    National News Brief
    Friday, June 12

    Anthropic’s Claude Fable 5 plays it too safe on safety, developers say

    By Team_NationalNewsBrief | June 12, 2026 | Business | 4 Mins Read

    Anthropic on Tuesday launched Claude Fable 5, its most capable public model. But within two days, users began reporting that its safety system was blocking benign or legitimate prompts.

    Fable 5 is the first public model derived from Anthropic’s Mythos family, whose original iteration showed unusual skill during training at finding software bugs and exploiting them to disrupt or take control of systems. That raised enough concern inside Anthropic that the company grouped cybersecurity with other high-risk domains, including biology and chemistry, when setting limits on Mythos-derived public models.

    For Fable 5, that means prompts flagged as sensitive in those areas are routed to Claude Opus 4.8, a less capable model with its own guardrails. Anthropic says the fallback affects about 0.05% of queries and notifies users when it happens.

    But reports of false positives quickly mounted. That’s because Anthropic erred on the side of caution when it designed the classifiers used to detect and downgrade potentially dangerous uses of its model. It also struggled to balance accuracy with transparency.

    Try telling that to developers. Across social media, people have complained about Claude Fable 5 rejecting queries about everything from RNA sequencing data for sheep to résumé editing to shopping lists.

    “The word ‘cancer’ is flagged as a biosecurity risk by Claude Fable 5!” said scientist Derya Unutmaz on X. “Our Anthropic overlords deciding which prompts the peasants are allowed to use,” added founder and developer Bojan Tunguz on X.

    Anthropic now says it’s working on the problem. “A hidden safeguard is harder to probe and work around,” Anthropic says in a statement emailed to Fast Company. “This means the safeguards can be targeted much more narrowly. A visible safeguard needs to cast a wider net to be more robust, resulting in more requests being incorrectly flagged.”

    “We made the wrong tradeoff and we apologize for not getting the balance right,” the company adds. 

    Anthropic says it’s refining the classifiers so that fewer queries trigger false positives. For Claude subscribers, query downgrades (to Opus 4.8) will be more obvious. Developers accessing Fable 5 via the Claude API will see a reason for the model’s refusal of a prompt, the company says.

    Meanwhile, at least one AI researcher appears to have coerced Fable 5 into responding to a banned prompt. Pliny the Liberator claimed on X to have bypassed Fable 5’s filters roughly 24 to 48 hours after launch. Pliny described using a multi-agent approach involving a previously jailbroken Claude Opus 4.8, along with techniques including query decomposition, long-context framing, fiction and narrative structures, and academic taxonomies.

    “We got some cyber, some chem, some psychological manipulation, and some good ol’ fashioned explosives!” Pliny tweeted, describing some of the banned output he was able to elicit from Fable 5.

    Anthropic believes Pliny’s claims are much ado about not much. “In our review of the circulating screenshots, two of the four were not generated by Fable at all, and the Fable outputs contained only general information already available in public sources and did not provide meaningful uplift toward real-world harm,” a company spokesperson said in an email to Fast Company after our story had published. “A broader review of recent activity found no instance in which our safeguards were bypassed to produce genuinely harmful content,” the spokesperson added.

    Before launch, Anthropic said more than 1,000 hours of internal and external red-teaming, including bug bounty efforts, had identified no universal jailbreaks. The company has acknowledged that preventing all sophisticated, multi-turn, or agentic attacks is likely not possible and says it continues to refine its classifiers.

    Updated 6/11 8PM ET: Added Anthropic comments about Pliny the Liberator’s claims.


