I've had a chance to test Fable and I can confirm that it's as good as the model card implies. As I speculated a while back in some thread or another, it's a step change similar to Sonnet -> Opus. When you consider what Opus 4.8 is capable of, that's significant.
There are going to be a lot of hot takes, just like there are with every frontier model release, and most of them are going to be wrong, just as they've been with every model release.
One of the most popular hot takes was that project glasswing was disingenuous marketing. Glasswing was the program under which Anthropic released Mythos to a limited selection of companies and government groups deemed critical to digital infrastructure. The takes continued even after people who had access started saying that this thing was legit. Not surprising: there are big feelings and huge interest in this topic, and hot takes generate a lot of clicks, engagement and catharsis.
Well, having tested Fable (the name for the Mythos public release) on a large and fairly complex codebase, I can say it wasn't empty hype. Its ability to connect the dots between disparate functionality, reason about it, and follow the chain of cause and effect to impressive depth would almost certainly translate to unprecedented cyber capability. All the people who said Mythos was legit weren't lying.
Some things that stand out to me so far. First, Fable has much broader/deeper knowledge of software architecture, which translates to better "taste". It's still far from perfect, but a significant step up. Second, it catches things that were functionally invisible to smaller models (because Fable is almost certainly larger than anything previously released; fine-tuning alone couldn't accomplish this). Because it "sees" more, it can do more things that previously required human expertise.
Another popular set of hot takes in the other direction is that these models are now so good that they'll kill software engineering as a career, replace all knowledge workers, advance to AGI sooner rather than later, and so on. I don't think that's true. Fable is good but it's still a dumbass in a lot of ways. That's baked into the core of LLM technology. It's better at everything but it's still only mimicking reasoning and understanding, which leads to all sorts of mistakes. There's no actual awareness anywhere in the loop. The illusion thereof is remarkable sometimes though.
The last bit is concerning: Fable is another step up in the ability of these models to fool people who can't fluently read the code themselves into thinking the code quality is far better than it is. And it will lead to a new wave of engineers convincing themselves they don't need to examine the output anymore. Who knows exactly how that will ultimately play out? I just know it makes me uncomfortable.
Yet another set of hot takes centers on the idea that advancement has plateaued or soon will, at which point the bubble will burst and it will all come crashing down. Adjacent to this is the idea that the frontier companies will never be able to make a profit on inference, which is the key area where profitability will matter in the long run, and that this is evidence of the impending fall. But there is no credible public information about inference profit and loss at these companies at all. It's all wild speculation. What we do know is exactly how much inference costs for open models, and there is very clearly a lot of room for profit. Personally, I suspect that Anthropic and OpenAI are already making a profit on inference and have been for some time: Anthropic likely for most of a year, and OpenAI since GPT 5.5 at the latest, when they significantly raised prices. It's even possible that the current prices are fully sustainable, more so when you factor in that inference costs are almost certainly going to continue to come down (relative to utility). Again, that's something we can observe with open models, and there's still a lot of room for innovation and optimization in a tech this new. The open question is whether or not they'll ever be able to make enough profit to justify the astronomical investment and valuations.
As far as a plateau goes, it seems reasonable to assume that will happen at some point, but so far it keeps stubbornly not happening.
In case someone gets the wrong idea, I'm not attempting to defend these companies; I very much wish that LLM technology weren't currently led by trillion-dollar companies and their tech-elite executives. I wish all the success to the open models, which will be the fallback if (when) the big labs start to enshittify or the bubble bursts. The latter is something else that keeps stubbornly refusing to happen.
Fantastic comment. Great to hear from somebody using these models firsthand.
The open question is whether or not they'll ever be able to make enough profit to justify the astronomical investment and valuations.
To add a bit of nuance, I think a further open question is if they’ll be able to create a moat around model development. The issue around the profitability of LLMs is less about demand (it’s clear that’s high) and more about competition.
The highly profitable software companies have all had very high user lock-in. Many of the products are free, available by default, and benefit from network effects. It's not clear yet if any of these will apply to LLMs. Instead, the market gets a plethora of LLMs to choose from, and the concern is that the capabilities of Fable might be imitated by the open-source models in a couple of years or less. If they are, companies will switch vendors.
Without a moat, competition will drive margins so low that these AI companies never justify their valuations. Software VCs assume their game is winner-take-all, an assumption that has consistently been borne out. If that assumption fails, that's when the bubble will burst.
Yes, thanks for adding that. Currently the moat is velocity: none of the cheap/open alternatives have been able to get close enough to the frontier to tempt the majority of users. And velocity is expensive... but it can work as a moat as long as there's plenty of frontier available. If they hit a plateau, everyone will probably catch up.
I've been wondering if Anthropic started their IPO process when they did in order to go public when they had a clear lead. Opus 4.8 put them pretty far ahead, Fable/Mythos just makes it undeniable. For now at least.
One of the most popular hot takes was that project glasswing was disingenuous marketing. ... The takes continued even after people who had access started saying that this thing was legit.
My exposure to AI related media is admittedly pretty sporadic and narrow, but to me it seemed like after the dust had settled, the general sentiment (Krebs/Schneier on Security, Daniel Stenberg and other high profile open source maintainers) was that Glasswing was fine, but not necessarily any better than any of the other models that were already out there and available to everyone, and that the marketing was pretty over-hyped.
Now that Mythos is public in the security-hobbled form of Fable, we don't have to speculate. In my testing yesterday, Fable found two legitimate vulnerabilities that previous models (and I) had missed. And that was in non-security-focused scans (because most security-related prompts currently get downgraded to Opus 4.8). In both cases they were subtle issues that were easy to miss.
It's true that models like Opus 4.8 and GPT 5.5 can be wrangled to find a lot of security issues. In the hands of a decent engineer, with a good harness, you can use either of those models to find and patch or exploit all sorts of vulnerabilities. It's an iterative process though. According to Anthropic, the reason for the controlled release was that Mythos is better at chaining vulnerabilities into working exploits on its own. It would allow anyone, including non-engineers, to find and exploit holes in widely used software. Glasswing gave those companies a chance to patch many of the holes in advance.
I don't have access to unrestricted Mythos, just Fable, so I can't test the full extent of the capabilities Anthropic is claiming. But seeing Fable's capabilities in other areas of coding I have no doubt they're telling the truth. It's significantly better at putting pieces together into a working thesis and then following it through to a given conclusion, which would definitely generalize into security research.
That said, I don't think Project Glasswing was wholly altruistic. When they released Mythos they didn't have anywhere near enough available compute to handle a wide release. So while the safety angle was legitimate, it also served their purposes to do a limited release while they scrambled to find the compute for a full-scale release. And yeah, the hype didn't hurt either. But the aforementioned hot takes that it was all smoke and mirrors are now demonstrably false.
I don't have access to unrestricted Mythos, just Fable, so I can't test the full extent of the capabilities Anthropic is claiming. But seeing Fable's capabilities in other areas of coding I have no doubt they're telling the truth.
I work on a code base that has been scrutinized by Mythos. It identified three issues. One was real (but extremely unlikely to ever be exploited in any meaningful way). The other two were mostly bunk and based on misleading code comments. How code is written and how it's used don't always align.
I'm sure Mythos is finding more legitimate vulnerabilities, but it's also increasing my workload, because a 33% hit rate requires a lot of human scrutiny...
If anyone else in science is confused by how astonishingly broad the safety blocks are on Fable: while some summaries talk about safeguards being broader and against misuse, the model card and deeper within this linked page essentially explain that everything remotely related to biology or chemistry is blocked. It isn't like a typical model, where messages deemed high risk are blocked: the model is simply completely useless in those fields.
Every single query I tried with Fable has been blocked so far; even something as simple as the convex optimization routine in a chemical equilibrium solver or as abstract as a tile assembly model simulator. It triggers early enough that it seems to be triggering on even individual words like "concentration", "equilibrium", or "dimer".
Though not related to my use: for safety reasons too, of course, Anthropic also discloses that they will silently block, degrade, or secretly modify prompts for usage that appears to be trying to develop any competition to them.
This is particularly frustrating because I expect that a number of other people, particularly at large corporations, or groups connected to the US government, do have access to Mythos. They may be people submitting to the same journals and conferences. But as a scientist in academia, I doubt my group will have access any time soon, if at all.
That's interesting hands-on experience, thanks for sharing. As soon as I saw the 5% refusal rate in the model card I knew it was going to be a problem. 5% refusal across all queries translates to near-total refusal for the subset they've targeted (cybersecurity, microbiology and chemistry).
I can confirm that the refusal rate (or actually the Opus downgrade rate) for cyber is extremely high, and sometimes it falsely flags things that have nothing to do with security.
My read is that they erred dramatically in favor of refusal just to get the release out the door and then will relax restrictions incrementally once they have more usage data and testing.
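The arithmetic behind the 5% point can be sketched as a weighted average: the overall refusal rate is the share of queries in the targeted fields times their refusal rate, plus everything else times a background false-positive rate. The shares used below are hypothetical placeholders, not figures from the model card; the sketch only shows that if the targeted fields are a few percent of traffic, a 5% overall rate leaves little room for anything but near-total refusal within them.

```python
def implied_subset_refusal(overall_rate: float,
                           subset_share: float,
                           background_rate: float = 0.0) -> float:
    """Refusal rate inside a targeted subset implied by an overall rate.

    Solves overall = share * subset + (1 - share) * background for `subset`,
    clamped to [0, 1]. All inputs are fractions, e.g. 0.05 for 5%.
    """
    if not 0.0 < subset_share <= 1.0:
        raise ValueError("subset_share must be in (0, 1]")
    implied = (overall_rate - (1.0 - subset_share) * background_rate) / subset_share
    return min(max(implied, 0.0), 1.0)


if __name__ == "__main__":
    # Hypothetical: targeted fields are 5% of all queries, no false positives elsewhere.
    print(implied_subset_refusal(0.05, 0.05))  # 1.0
    # Hypothetical: targeted fields are 10% of all queries.
    print(implied_subset_refusal(0.05, 0.10))  # 0.5
```

Any nonzero background false-positive rate (the cyber over-flagging mentioned above) pushes the implied in-field rate down slightly, but the conclusion holds as long as the targeted share is small.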
My read is that they erred dramatically in favor of refusal just to get the release out the door and then will relax restrictions incrementally once they have more usage data and testing.
I'm guessing so. I was surprised initially because the error message is not well written (it's missing chemistry, and suggests that 'normal, safe' content might be falsely flagged, when the model card makes clear they're just blocking all content in the banned fields, not making any classification about safety). And it's a very different sort of refusal than I've experienced with essentially all other models, which was both unexpected and makes me think it is much simpler, making it significantly harder to trick and easier to implement at the cost of being broader. I wouldn't be enormously surprised if it's doing something like keyword searches across input+output+thinking.
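To make the keyword-search guess concrete, here is a minimal sketch of a context-blind filter that scans input, output and thinking together. It is purely hypothetical: the blocklist holds only the words reported in this thread ("concentration", "equilibrium", "dimer"), and nothing here reflects how Anthropic's classifier actually works. It shows why such a filter would block a pure numerics request as soon as project vocabulary appears anywhere in the context.

```python
import re

# Hypothetical blocklist built only from words reported in this thread;
# the real classifier's terms and mechanism are unknown.
BLOCKED_TERMS = frozenset({"concentration", "equilibrium", "dimer"})


def flagged_terms(*texts: str) -> set[str]:
    """Return every blocklisted word found in any of the given texts.

    Splits on non-letters, so identifiers like `solve_equilibrium`
    are caught too. There is no notion of context or intent.
    """
    found: set[str] = set()
    for text in texts:
        words = set(re.findall(r"[a-z]+", text.lower()))
        found |= words & BLOCKED_TERMS
    return found


def is_blocked(prompt: str, output: str = "", thinking: str = "") -> bool:
    """Block if any blocklisted word appears in prompt, output, or thinking."""
    return bool(flagged_terms(prompt, output, thinking))


if __name__ == "__main__":
    # A pure optimization request still trips the filter via a function name.
    print(is_blocked("Speed up the convex optimizer in solve_equilibrium()"))  # True
    print(is_blocked("Improve the optimizer for my tycoon game"))              # False
```

A filter this simple would be cheap and hard to talk around, but it cannot tell a chemistry question from a numerical-methods question that merely lives in a chemistry codebase.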
everything remotely related to biology or chemistry is blocked.
I've been sorta waiting for this, and suspect some of it isn't anything to do with the new model so much as "people in the companies are being asked uncomfortable questions by authorities".
Search engines/websites basically went through the same process of "oh this is neat" to "oh cool someone explained how to make wildly dangerous thing" to "okay we need some attempt at limiting access to wildly dangerous thing"
Chemistry was the big one I was waiting for, as I've had it explained before that there are materials that the average person could make if given a simple step by step guide and some very common precursors, so they do their best to make sure such guides are not just easily available (not that they stop everyone but at least some curious/angry teenager).
If you go through X years of chemistry and get the right clearances/jobs it becomes obvious how easy some of this stuff could be to make, but that's at least somewhat "controlled" by weeding out people who don't get across that line, and the general hope such people wouldn't do such a thing.
Obviously you can't stop everything (meth is an easy example), but I've at least heard some anecdotes of "huh it probably shouldn't be allowed to give that answer, god knows a google search wouldn't".
I heard certain parts of physics and GPS were blocked from the start, even obliquely, but I believe some of that is because it's also easier to identify problematic questions there (and they're of more interest to state actors than individuals).
That said, I'm a little surprised by biology and wonder how much that wanders into the other side of the coin, which is less about public safety and more about copyright. We can spin theories all day (could just be you can't block chemistry without blocking biology), but I wouldn't be totally shocked if some pharma exec screwed around with it and got "yeah it just guessed proprietary drug X's formulation, get my senator on the phone". I think that's more dramatic than I intend (these things don't really need to be secret as they have legal enforcement), but I do wonder if there were conversations in that realm.
Back to copyright, it's a big question when dealing with countries that don't give a fuck: if an AI can just figure out the few things you didn't get stolen/leaked, then how do you compete?
On the other hand, I have an optimizer applet for a tycoon game that Claude coded, and I asked Fable to improve it and it gave me a bunch of improvements and no blocking. I turned off switching to another model if progress is blocked. In fact, I was switching to Sonnet for some of the simpler prompts and they were still getting submitted to Fable.
Certainly not something as exploitable, and it's in the realm (programming) that Fable is nominally intended for, but it has worked fine for me.
It's a problem specifically with scientific work vaguely connected to chemistry or biology. I use Claude Code for programming scientific packages, and all these blocks were caused by prompts around coding. That's part of the frustrating thing: it appears that the model is unusable for any of my projects, even for parts that aren't connected to the science directly, because the projects themselves use the terminology of molecular interactions. A friend of mine is using it for some CS theory work right now without too much trouble.
A few of us have been having some discussions about whether, if I changed the terminology used in some of the packages, they might not be blocked. It's unlikely to be worthwhile, but it's possible it would work: the classifier seems both very separate from the model and much simpler than the model itself, unlike some other safeguards; it seems to just be tripped by the terminology itself.
The problem tends to be that the classifiers are so broad that, in some cases, what I'm trying to do would not be considered chemistry, but the mere presence of chemistry in a project is enough to block any work.
In one case, my problem is with the implementation of a numerical optimization algorithm for a mostly-convex optimization problem. That the project more broadly uses the words "equilibrium", "concentration", and "energy" seems to be enough to trigger the classifier, even when prompted to look directly at the optimization problem.
A public LLM of some kind would be amazing, not just for science, for the whole range of applications.
Don't underestimate the size of the task though. It would need nation state level funding, and success would hinge on convincing the right experts to get on board.
The EU could maybe pull it off, it would make a lot of sense for them given their recent distaste for US tech.
I imagine various people in government and academia have at least talked about it by now.
I was thinking in terms of something homegrown. Anyone could of course use an open model as a starting point, but they'd then be reliant on that provider, since they wouldn't have their own training pipeline. For a government or academic solution they'd ideally start from scratch.
But yeah, the Chinese open-weights models are pretty good, and it's great that they exist for all sorts of reasons.
It’s an interesting thought, but with what funding? Even excluding the knowledge and skills necessary, the hardware requirements alone probably already halt this idea.
Edit: It’s not necessarily just the GPUs that are a bottleneck; even (ideally redundant) storage for the raw training data would be prohibitively expensive for most entities or orgs entertaining this idea…
You would be shocked at the amount of hardware most large universities already have in their research closets. Internet2 is about cutting-edge internet research. The people in the industry now were...
Mea culpa, that’s a great point.
If the goal was a very large, general-purpose model, I imagine research that’d come out of potentially resource-constrained environments [in comparison to the “AI” big players] would focus on algorithmic and size optimization, regardless. It’s still cutting-edge after all, just potentially at a smaller scale and/or helping achieve smaller hardware requirements.
(As an aside, I’m a little embarrassed, but I just now realize the same is actually true for my alma mater: they have both the huge data/computing center as well as a nuclear reactor, too – although I’m not sure how much free computing capacity could be scheduled there towards an intensive task like (very) large language model training.)
If any given university had been given a grant or donation to the tune of 1/10th of any one of Anthropic's funding rounds, they'd be able to scale quickly too.
Anytime people tell me universities can't do what industry does because of funding, I remind them that if we properly taxed wealth and income, they too would have access to multiple billions at a whim.
That's somewhat annoying; for me the most intriguing parts of the announcement are the ones about biology and drug design, so it's a shame there's no way for anyone to independently test those.
I'm in a similar situation. The worst failures of Opus, for me, have been from its tendency to hallucinate completely bogus scientific ideas while working on my code, and then silently put them throughout implementations. I was hopeful that the improvements in scientific understanding would be helpful.
$10/MTok in, $50/MTok out
I'm sure all these enterprises will be thrilled to see their spend double overnight. Probably more, since Fable likely churns through tokens twice as fast as well.
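The "double, probably more" estimate is easy to check with back-of-the-envelope arithmetic. A minimal sketch, assuming the predecessor was priced at half the quoted rates (which is what "double" implies) and using made-up token counts for one task: if Fable also burns twice the tokens, the combined effect is closer to 4x than 2x.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 price_in_per_mtok: float, price_out_per_mtok: float) -> float:
    """USD cost of one request, given per-million-token prices."""
    return (input_tokens * price_in_per_mtok
            + output_tokens * price_out_per_mtok) / 1_000_000


# Quoted Fable pricing: $10/MTok in, $50/MTok out.
FABLE = (10.0, 50.0)
# Assumption: the previous model cost half as much per token.
PREVIOUS = (5.0, 25.0)

if __name__ == "__main__":
    # Hypothetical task: 200k input tokens, 20k output tokens on the old model.
    old = request_cost(200_000, 20_000, *PREVIOUS)
    same_tokens = request_cost(200_000, 20_000, *FABLE)
    # Hypothetical: Fable uses twice as many tokens for the same task.
    double_tokens = request_cost(400_000, 40_000, *FABLE)
    print(old, same_tokens, double_tokens)  # 1.5 3.0 6.0
    print(double_tokens / old)              # 4.0
```

Price and token volume multiply, so any increase in tokens per task compounds the per-token price increase rather than adding to it.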
The amount of usage that this thing burns is… Something else. And how they’re dealing with it is a head scratcher for me:
From today through June 22, Fable 5 is included on Pro, Max, Team, and seat-based Enterprise plans at no extra cost.
On June 23, we’ll remove Fable 5 from those plans. Using it after that will require usage credits. If capacity allows, we’ll extend the included window.
After this point—when sufficient capacity allows us to do so—we aim to restore Fable 5 as a standard part of subscription plans. We intend to do this as quickly as we can.
I understand the idea of them being in an IPO and needing to create hype. But in an environment where we’re starting to notice cold feet around AI, where people like Scam Altman are starting to backtrack on what they said, where these companies are starting to accept the reality of how their business model was unsustainable and raise prices… doesn’t this just make them look worse?
Wouldn’t working on optimizations and deliver the same quality for lower compute be a more sensible choice? I’m not even talking about pragmatism per se - there’s nothing pragmatic about the current market situation - I’m talking about how an AI company like Anthropic would create more investor confidence if they delivered a model just as good as sonnet but runs in cheaper hardware, and so they could promise that they have a path to profitability.
But instead - even though they aren’t profitable as afaik don’t have a clear plan - they deliver an even more expensive model…?
But… Meh, what do I know.
Edit: Yeah, I don’t see anyone paying these prices
Pricing for both models is $10 per million input tokens and $50 per million output tokens. Developers can use claude-fable-5 via the Claude API.
Wouldn’t working on optimizations and deliver the same quality for lower compute be a more sensible choice?
I expect that's exactly what we see with the updates to lower-tier models. For example, distillation of Mythos/Fable might well be used to make Opus or Sonnet 4.9.
AI company like Anthropic would create more investor confidence if they delivered a model just as good as sonnet but runs in cheaper hardware
It would be the opposite. Models that run 'on cheaper hardware' are exactly the ones whose price on openrouter approaches the bare electricity cost. Anthropic and OpenAI need to support their valuations with seemingly unique frontier capabilities, sold at a price that at least recovers operational costs. The implied business model is that if you come for the frontier models you'll stay for everything else; Claude Code or Claude Whizbang Automation will be happy to spend tokens on other, less-frontiery tasks at more profitable price points.
Edit: Yeah, I don’t see anyone paying these prices
With enough reputation and reliability, people will. If you could truly double the productivity of an engineer or developer, even $10k/month in token cost would be worth it to many large companies.
The problem is that doubling developer productivity isn’t going to result in doubling of actual output or value creation in many circumstances. It’s often not the developers that are the bottleneck, even if they are perceived as the bottleneck.
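The two positions can be put in one line of arithmetic. A minimal sketch, with every number hypothetical: the monthly net benefit is an engineer's fully loaded cost times the productivity gain, scaled by how much of that gain actually turns into output given other bottlenecks, minus the token spend. Doubling productivity covers $10k/month only if enough of the gain is realized.

```python
def net_monthly_benefit(loaded_monthly_cost: float,
                        productivity_multiplier: float,
                        realized_fraction: float,
                        token_spend: float) -> float:
    """Value of extra output minus AI spend, per engineer per month (USD).

    loaded_monthly_cost: salary plus benefits, overhead, etc.
    productivity_multiplier: 2.0 means "twice as productive".
    realized_fraction: share of the extra productivity that becomes
        actual output once other bottlenecks are accounted for (0.0 to 1.0).
    """
    extra_value = loaded_monthly_cost * (productivity_multiplier - 1.0) * realized_fraction
    return extra_value - token_spend


if __name__ == "__main__":
    # Hypothetical: $20k/month loaded cost, 2x productivity, $10k/month in tokens.
    print(net_monthly_benefit(20_000, 2.0, 1.0, 10_000))  # 10000.0
    # Same, but only 40% of the gain survives downstream bottlenecks.
    print(net_monthly_benefit(20_000, 2.0, 0.4, 10_000))
```

With these made-up figures the break-even point is a realized fraction of 0.5; below that, the token bill exceeds the value of the extra output.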
Who gets the blame, in the eyes of the people who cut the checks:
A: Functional people who didn't start testing until day before a deadline.
B: Programmer and ops mad dashing to fix critical error functional person found at 4:50 day of deadline
C: Manager of A
D: Manager of B
While nobody is blameless in this scenario, I'm fairly certain that everybody will point their fingers at B, "Because B is a business expense and A brings in the money."
As long as they're supply-constrained due to datacenter capacity and there are customers willing to pay, they make more money by charging more. It's economics 101.
This is expense-account pricing, but I assume that there will be businesses willing to pay more than I will as a hobbyist. Meanwhile, I'll stick with the Chinese models I've been using.
I imagine prices will eventually come down due to increased supply, algorithmic improvements and competition.
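The supply-constraint argument can be sketched with a toy model: a hypothetical linear demand curve q = a - b*p, a fixed serving capacity, and marginal cost ignored. When demand at the unconstrained revenue-maximizing price exceeds capacity, the best the seller can do is raise the price until demand just fits; as capacity grows, the price falls back toward the unconstrained optimum, which lines up with the expectation that prices come down as supply increases.

```python
def revenue_max_price(a: float, b: float, capacity: float) -> float:
    """Revenue-maximizing price for linear demand q = a - b*p with a capacity cap.

    Ignores marginal cost. Without the cap, revenue p*(a - b*p) peaks at
    p = a / (2b), where q = a / 2. If that exceeds capacity, the seller can
    only serve `capacity` units, so it raises the price until demand falls
    to exactly capacity: p = (a - capacity) / b.
    """
    if a <= 0 or b <= 0 or capacity <= 0:
        raise ValueError("a, b and capacity must be positive")
    unconstrained = a / (2 * b)
    if a / 2 <= capacity:
        return unconstrained
    return (a - capacity) / b


if __name__ == "__main__":
    # Hypothetical demand q = 100 - 2p; prints 20 40.0, 35 32.5, 50 25.0, 80 25.0
    for cap in (20, 35, 50, 80):
        print(cap, revenue_max_price(100, 2, cap))
```

Real pricing also has to account for competitors and marginal cost; the toy model only isolates the capacity effect.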
As long as they're supply-constrained due to datacenter capacity and there are customers willing to pay, they make more money by charging more. It's economics 101.
The variable you're missing is how many customers were sold on "and this will replace X number of workers for Y spend a year".
If the AI spend is > Y, well it's back to the horrors of human labor, and I'm already aware of conversations around that.
It's worth noting that to a company an employee is more expensive than their yearly take-home, as there are things like benefits and health care, but I'm seeing a lot of reassessment of whether any of this is going to pay for itself.
Personally, it's what I expected, as I still see this all as if you just launched Photoshop yesterday for free. Everyone would use it. Now we're at the point where they say "okay it's $100 a user a month" and suddenly it's clicking for management that's just not cheaper than using the clip art they were before.
The other thing I'm seeing a lot of is that AI isn't always a sustained need? A lot of coding frustration comes from when you're learning something new or dealing with something you haven't dealt with before (especially when rtfm leads to outdated docs.....). Once you're over that hurdle and have your patterns, if you're remotely good at documentation/abstraction/libraries....well now you've got your reusable code.
I'm also seeing cases where "oh we'll just have the AI do that" became "Yeah what did that nerd say about just writing down the code and running it ourselves".
They have to be aware of all of this and pricing accordingly, but like Photoshop I think it'll go from "oh everyone uses that" to "yeah I know someone whose job/expensive hobby requires that".
Edit-
I will add another variable that's very important to all of this is WHERE it hits the expenses. Companies are often laser focused on payroll and a hell of a lot less concerned about other expenses, especially technical ones which upper management sometimes doesn't even understand. I've frequently seen very high consultant bills get rubber stamped instead of a hire that's a fraction of the cost because of what reports the expenses go on and how they're rolled up.
Obviously bottom line is bottom line, but I do think that if AI can be "close" to employee cost it'll stay hidden elsewhere at some companies because payroll down and profits up! will be the message, even if digging into the numbers you likely would've had higher profits and lower expenses had you kept payroll up, and profits went up due to other factors.
Wasn't there recently an article about AI companies starting to be profitable because of different agent pricing models for end customers (subscriptions) and enterprises (usage tokens only)? I'm...
But instead - even though they aren’t profitable as afaik don’t have a clear plan - they deliver an even more expensive model…?
Wasn't there recently an article about AI companies starting to be profitable because of different agent pricing models for end customers (subscriptions) and enterprises (usage tokens only)? I'm an end user and don't have a Claude subscription, so I never checked.
Also I think there's a good chance that in the background they're doing both.
Yes, subscriptions are significantly cheaper than "API" prices, and it looks like they'll be closing that loophole starting with Fable, after a temporary preview.
Yes, subscriptions are significantly cheaper than "API" prices, and it looks like they'll be closing that loophole starting with Fable, after a temporary preview.
snort that's great. It's relevant too: in my experience so far, Fable (I'm having a hard time getting used to that name for some reason) is significantly better at extrapolating intent with less...
snort that's great. It's relevant too: in my experience so far, Fable (I'm having a hard time getting used to that name for some reason) is significantly better at extrapolating intent with less prompting and then making mostly reasonable calls on how to proceed without handholding.
It looks like there was a heavy focus on long running autonomous tasks during fine tuning. I have mixed feelings about this.
I've had a chance to test Fable and I can confirm that it's as good as the model card implies. As I speculated a while back in some thread or another, it's a step change similar to Sonnet -> Opus. When you consider what Opus 4.8 is capable of, that's significant.
There are going to be a lot of hot takes, just like there are with every frontier model release, and most of them are going to be wrong, just as they've been with every model release.
One of the most popular hot takes was that Project Glasswing was disingenuous marketing. Glasswing was when Anthropic released Mythos to a limited selection of companies and government groups that were deemed critical to digital infrastructure. The takes continued even after people who had access started saying that this thing was legit. Not surprising: there are big feelings and huge interest in this topic, and hot takes generate a lot of clicks, engagement and catharsis.
Well, having tested Fable (the name for the Mythos public release) on a large and fairly complex codebase, it wasn't empty hype. Its ability to connect the dots between disparate functionality, reason about it, and follow the chain of cause and effect to impressive depth, would almost definitely translate to unprecedented cyber capability. All the people who said Mythos was legit weren't lying.
Some things that stand out to me so far: First, Fable has much broader/deeper knowledge of software architecture, which translates to better "taste". Still far from perfect, but a significant step up. Second, it catches things that were functionally invisible to smaller models (because Fable is almost definitely larger than anything previously released; fine-tuning alone couldn't accomplish this). Because it "sees" more, it can do more things that previously required human expertise.
Another popular set of hot takes in the other direction is that these models are now so good that they'll kill software engineering as a career, replace all knowledge workers, advance to AGI sooner rather than later, and so on. I don't think that's true. Fable is good but it's still a dumbass in a lot of ways. That's baked into the core of LLM technology. It's better at everything but it's still only mimicking reasoning and understanding, which leads to all sorts of mistakes. There's no actual awareness anywhere in the loop. The illusion thereof is remarkable sometimes though.
The last bit is concerning: Fable is another step up in the ability of these models to fool people who can't fluently read the code themselves into thinking the code quality is far better than it is. And it will lead to a new wave of engineers convincing themselves they don't need to examine the output anymore. Who knows exactly how that will ultimately play out. I just know it makes me uncomfortable.
Yet another set of hot takes centers around the idea that advancement has or will soon plateau, at which point the bubble will burst and it will all come crashing down. Adjacent to this is the idea that the frontier companies will never be able to make a profit on inference, which is the key area where profitability will matter in the long run, and this is taken as evidence of the impending fall. But the thing is that there is no credible public information about inference profit and loss at these companies at all. It's all wild speculation. Except we do know exactly how much inference costs for open models, and there is very clearly a lot of room for profit. Personally I suspect that Anthropic and OpenAI are already making a profit on inference and have been for some time: Anthropic likely for most of a year, and OpenAI since (at the latest) GPT 5.5, when they significantly raised prices. It's even possible that the current prices are fully sustainable, more so when you factor in that inference costs are almost definitely going to continue to come down (relative to utility). Again, that's something we can observe with open models, and there's still a lot of room for innovation and optimization in a new tech like this. The open question is whether or not they'll ever be able to make enough profit to justify the astronomical investment and valuations.
As far as a plateau goes... Seems reasonable to assume that will happen at some point but so far it keeps stubbornly not happening.
In case someone gets the wrong idea, I'm not attempting to defend these companies, I very much wish that LLM technology wasn't currently led by trillion dollar companies and their tech elite executives. I wish all the success to the open models, which will be the fallback if (when) the big labs start to enshittify or the bubble bursts. The latter being something else that keeps stubbornly refusing to happen.
Fantastic comment. Great to hear from somebody using these models firsthand.
To add a bit of nuance, I think a further open question is if they’ll be able to create a moat around model development. The issue around the profitability of LLMs is less about demand (it’s clear that’s high) and more about competition.
The highly profitable software companies have all had very high user lock-in. Many of the products are free, available by default, and benefit from network effects. It's not clear yet if any of these will apply to LLMs. Instead, the market gets a plethora of LLMs to choose from, and the concern is that the capabilities of Fable might be imitated by the open-source models in a couple years or less. If they are, companies will switch vendors.
Without a moat, competition will drive margins low enough so these AI companies never justify their valuations. Software VCs assume their game is winner-take-all, which has consistently been justified. If that assumption fails, that’s when the bubble will burst.
Yes, thanks for adding that. Currently the moat is velocity, none of the cheap/open alternatives have been able to get close enough to the frontier to tempt the majority of users. And velocity is expensive... but it can work as a moat as long as there's plenty of frontier available. If they hit a plateau everyone will probably catch up.
I've been wondering if Anthropic started their IPO process when they did in order to go public when they had a clear lead. Opus 4.8 put them pretty far ahead, Fable/Mythos just makes it undeniable. For now at least.
My exposure to AI related media is admittedly pretty sporadic and narrow, but to me it seemed like after the dust had settled, the general sentiment (Krebs/Schneier on Security, Daniel Stenberg and other high profile open source maintainers) was that Glasswing was fine, but not necessarily any better than any of the other models that were already out there and available to everyone, and that the marketing was pretty over-hyped.
Now that Mythos is public in the security hobbled form of Fable, we don't have to speculate. In my testing yesterday Fable found two legitimate vulnerabilities that previous models (and I) had missed. And that was in non security focused scans (because most security related prompts currently get downgraded to Opus 4.8). In both cases they were subtle issues that were easy to miss.
It's true that models like Opus 4.8 and GPT 5.5 can be wrangled to find a lot of security issues. In the hands of a decent engineer, with a good harness, you can use either of those models to find and patch or exploit all sorts of vulnerabilities. It's an iterative process though. According to Anthropic, the reason for the controlled release was because Mythos is better at chaining vulnerabilities into working exploits on its own. It would allow anyone, including non-engineers, to find and exploit holes in widely used software. Glasswing gave those companies a chance to patch many of the holes in advance.
I don't have access to unrestricted Mythos, just Fable, so I can't test the full extent of the capabilities Anthropic is claiming. But seeing Fable's capabilities in other areas of coding I have no doubt they're telling the truth. It's significantly better at putting pieces together into a working thesis and then following it through to a given conclusion, which would definitely generalize into security research.
That said, I don't think Project Glasswing was wholly altruistic. When they released Mythos they didn't have anywhere near enough available compute to handle a wide release. So while the safety angle was legitimate, it also served their purposes to do a limited release while they scrambled to find the compute for a full-scale release. And yeah, the hype didn't hurt either. But the aforementioned hot takes that it was all smoke and mirrors are now demonstrably false.
I work on a code base that has been scrutinized by Mythos. It identified three issues. One was real (but extremely unlikely to ever be exploited in any meaningful way). The other two were mostly bunk and based on misleading code comments. How code is written and how it's used don't always align.
I'm sure Mythos is finding more legitimate vulnerabilities, but it's also increasing my workload because a 33% hit-rate requires a lot of human scrutiny...
I think you were really clear, for what it’s worth!
If anyone else in science is confused by how astonishingly broad the safety blocks are on Fable: while some summaries talk about safeguards being broader and against misuse, the model card and deeper within this linked page essentially explain that everything remotely related to biology or chemistry is blocked. It isn't like a typical model, where messages deemed high risk are blocked: the model is simply completely useless in those fields.
Every single query I tried with Fable has been blocked so far; even something as simple as the convex optimization routine in a chemical equilibrium solver or as abstract as a tile assembly model simulator. It triggers early enough that it seems to be triggering on even individual words like "concentration", "equilibrium", or "dimer".
While not related to my use, for safety reasons, too, of course, Anthropic also discloses that they will silently block, degrade, or secretly modify prompts for usage that appears to be trying to develop any competition to them.
This is particularly frustrating because I expect that a number of other people, particularly at large corporations, or groups connected to the US government, do have access to Mythos. They may be people submitting to the same journals and conferences. But as a scientist in academia, I doubt my group will have access any time soon, if at all.
That's interesting hands-on experience, thanks for sharing. As soon as I saw the 5% refusal rate in the model card I knew it was going to be a problem. 5% refusal across all queries translates to near-total refusal for the subset they've targeted (cybersecurity, microbiology and chemistry).
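Rough arithmetic behind that intuition, as a minimal Python sketch. It assumes (hypothetically) that refusals happen only inside the targeted fields, so the overall rate is just the targeted share of traffic times the refusal rate within it; the traffic shares below are made-up illustrations, not figures from the model card.

```python
def subset_refusal_rate(overall_rate: float, subset_share: float,
                        outside_rate: float = 0.0) -> float:
    """Refusal rate inside the targeted subset implied by an overall rate.

    Assumes overall = share * r_subset + (1 - share) * outside_rate,
    i.e. traffic splits into "targeted fields" and "everything else".
    """
    if not 0.0 < subset_share <= 1.0:
        raise ValueError("subset_share must be in (0, 1]")
    implied = (overall_rate - (1.0 - subset_share) * outside_rate) / subset_share
    # An implied rate above 100% means the overall rate can't come from the
    # subset alone; cap it so the result stays a valid probability.
    return min(max(implied, 0.0), 1.0)


# Hypothetical shares of all traffic that fall in the targeted fields.
for share in (0.05, 0.10, 0.25):
    rate = subset_refusal_rate(0.05, share)
    print(f"targeted share {share:.0%}: refusal inside subset ~{rate:.0%}")
```

If the targeted fields are only around 5% of traffic, a 5% overall refusal rate is consistent with blocking essentially all of it.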
I can confirm that the refusal rate (or actually Opus downgrade rate) for cyber is extremely high, and sometimes it falsely flags things that have nothing to do with security.
My read is that they erred dramatically in favor of refusal just to get the release out the door and then will relax restrictions incrementally once they have more usage data and testing.
I'm guessing so. I was surprised initially because the error message is not well written (it's missing chemistry, and suggests that 'normal, safe' content might be falsely flagged, when the model card makes clear they're just blocking all content in the banned fields, not making any classification about safety). And it's a very different sort of refusal than I've experienced with essentially all other models, which was both unexpected and makes me think it is much simpler, making it significantly harder to trick and easier to implement at the cost of being broader. I wouldn't be enormously surprised if it's doing something like keyword searches across input+output+thinking.
I've been sorta waiting for this, and suspect some of it isn't anything to do with the new model so much as "people in the companies are being asked uncomfortable questions by authorities".
Search engines/websites basically went through the same process of "oh this is neat" to "oh cool someone explained how to make wildly dangerous thing" to "okay we need some attempt at limiting access to wildly dangerous thing"
Chemistry was the big one I was waiting for, as I've had it explained before that there are materials that the average person could make if given a simple step by step guide and some very common precursors, so they do their best to make sure such guides are not just easily available (not that they stop everyone but at least some curious/angry teenager).
If you go through X years of chemistry and get the right clearances/jobs it becomes obvious how easy some of this stuff could be to make, but that's at least somewhat "controlled" by weeding out people who don't get across that line, and the general hope such people wouldn't do such a thing.
Obviously you can't stop everything (meth is an easy example), but I've at least heard some anecdotes of "huh it probably shouldn't be allowed to give that answer, god knows a google search wouldn't".
I heard certain parts of physics and GPS were blocked from the start, even obliquely, but I believe some of that is because it's also easier to identify problematic questions there (and they're more of interest to state actors than individuals).
That said, I'm a little surprised by biology and wonder how much that wanders into the other side of the coin, which is less about public safety and more about copyright. We can spin theories all day (could just be you can't block chemistry without blocking biology), but I wouldn't be totally shocked if some pharma exec screwed around with it and got "yeah it just guessed proprietary drug X's formulation, get my senator on the phone". I think that's more dramatic than I intend (these things don't really need to be secret as they have legal enforcement), but I do wonder if there were conversations in that realm.
It's a big question back on the copyright block when dealing with countries that don't give a fuck. If an AI can just figure out the few things you didn't get stolen/leaked, then how do you compete?
On the other hand, I have an optimizer applet for a tycoon game that Claude coded, and I asked Fable to improve it and it gave me a bunch of improvements and no blocking. I turned off switching to another model if progress is blocked. In fact, I was switching to sonnet for some of the simpler prompts and they were still getting submitted to Fable.
Certainly not something as exploitable, and it's in the realm (programming) that Fable is nominally intended for, but it has worked fine for me.
It's a problem specifically with scientific work vaguely connected to chemistry or biology. I use Claude Code for programming scientific packages, and all these blocks were caused by prompts around coding. That's part of the frustrating thing: it appears that the model is unusable for any of my projects, even for parts that aren't connected to the science directly, because the projects themselves use the terminology of molecular interactions. A friend of mine is using it for some CS theory work right now without too much trouble.
A few of us have been having some discussions about whether, if I changed the terminology used in some of the packages, they might not be blocked. It's unlikely to be worthwhile, but it's possible it would work: the classifier seems both very separate from the model and much simpler than the model itself, unlike some other safeguards; it seems to just be tripped by the terminology itself.
Yeah, I guess the question for you is if it's smart enough to figure out from the formulas alone what you're trying to do.
The problem tends to be that the classifiers are so broad that, in some cases, what I'm trying to do would not be considered chemistry, but the mere presence of chemistry in a project is enough to block any work.
In one case, my problem is with the implementation of a numerical optimization algorithm for a mostly-convex optimization problem. That the project more broadly uses the words "equilibrium", "concentration", and "energy" seems to be enough to trigger the classifier, even when prompted to look directly at the optimization problem.
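For illustration only, here is a minimal Python sketch of the kind of crude keyword filter speculated about above, run over the prompt plus any project context. Nobody outside Anthropic knows how the real classifier works; the blocklist is just the three words reported in this thread as apparent triggers, and `is_blocked` is a hypothetical name.

```python
# Hypothetical blocklist: the words reported in this thread as apparent triggers.
BLOCKED_TERMS = ("concentration", "equilibrium", "dimer")


def is_blocked(*texts: str) -> bool:
    """Return True if any blocked term appears anywhere in any of the texts.

    Plain case-insensitive substring matching, so "concentrations" and
    "solve_equilibrium" both trip it; the intent of the request is never
    considered.
    """
    return any(term in text.lower() for text in texts for term in BLOCKED_TERMS)


prompt = "Tighten the line search in this mostly-convex optimization routine."
project_context = "def solve_equilibrium(concentrations, energies): ..."

print(is_blocked(prompt))                   # False: the prompt alone is fine
print(is_blocked(prompt, project_context))  # True: the surrounding project trips it
```

If the real filter were anywhere near this simple, renaming the terminology in the packages would get past it, which is why that experiment might be worth a try.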
TBH I'm surprised the Internet2 crowd hasn't gotten together to start an academia equivalent.
A public LLM of some kind would be amazing, not just for science, for the whole range of applications.
Don't underestimate the size of the task though. It would need nation state level funding, and success would hinge on convincing the right experts to get on board.
The EU could maybe pull it off, it would make a lot of sense for them given their recent distaste for US tech.
I imagine various people in government and academia have at least talked about it by now.
If you just mean “of some kind,” there are open-weights LLM’s already and some of them are fairly decent for writing code.
I was thinking in terms of something homegrown, anyone could of course use an open model as a starting point, but they'd then be reliant on that provider since they wouldn't have their own training pipeline. For a government or academic solution they'd ideally start from scratch.
But yeah, the Chinese open-weights models are pretty good and it's great that they exist for all sorts of reasons.
It’s an interesting thought, but with what funding? Even excluding the knowledge and skills necessary, the hardware requirements alone probably already halt this idea.
Edit: The GPUs aren't necessarily the only bottleneck, either; just the (ideally redundant) storage for the raw training data would already be prohibitively expensive for most entities or orgs entertaining this idea…
You would be shocked at the amount of hardware most large universities already have in their research closets.
Internet2 is about cutting edge internet research
The people in the industry now were training on cutting edge gear in Uni.
Fun facts: Penn State has a nuclear reactor.
If you can dodge a wrench, you can dodge a ball, if you will.
Mea culpa, that’s a great point.
If the goal was a very large, general-purpose model, I imagine research that’d come out of potentially resource-constrained environments [in comparison to the “AI” big players] would focus on algorithmic and size optimization, regardless. It’s still cutting-edge after all, just potentially at a smaller scale and/or helping achieve smaller hardware requirements.
(As an aside, I’m a little embarrassed, but I just now realize the same is actually true for my alma mater: they have both the huge data/computing center as well as a nuclear reactor, too – although I’m not sure how much free computing capacity could be scheduled there towards an intensive task like (very) large language model training.)
If any given university had been given a grant or donation to the tune of 1/10th of any one of Anthropic's funding rounds, they'd be able to scale quickly too.
Anytime people tell me universities can't do what industry does because of funding, I remind them that if we properly taxed wealth and income, they too would have access to multiple billions at a whim.
That's somewhat annoying, for me the most intriguing parts in the announcement are the ones about biology and drug design, so it's a shame there's no way for anyone to independently test those
I'm in a similar situation. The worst failures of Opus, for me, have been from its tendency to hallucinate completely bogus scientific ideas while working on my code, and then silently put them throughout implementations. I was hopeful that the improvements in scientific understanding would be helpful.
$10/MTok in, $50/MTok out
I'm sure all these enterprises will be thrilled to see their spend double overnight. Probably more, because Fable likely churns through tokens twice as fast as well.
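To put the quoted $10/MTok in, $50/MTok out pricing in concrete terms, a minimal Python sketch; the token counts in the example are hypothetical.

```python
PRICE_PER_MTOK_IN = 10.0   # USD per million input tokens (quoted above)
PRICE_PER_MTOK_OUT = 50.0  # USD per million output tokens (quoted above)


def cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Token cost in USD at the quoted per-million-token prices."""
    return (input_tokens * PRICE_PER_MTOK_IN
            + output_tokens * PRICE_PER_MTOK_OUT) / 1_000_000


# Hypothetical agentic session: 2M tokens of context read, 300k tokens generated.
print(cost_usd(2_000_000, 300_000))  # 35.0
# Cost is linear in tokens, so a model that burns twice as many tokens costs twice as much.
print(cost_usd(4_000_000, 600_000))  # 70.0
```

Since output tokens cost five times as much as input tokens, output-heavy workloads get expensive fastest.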
The amount of usage that this thing burns is… Something else. And how they’re dealing with it is a head scratcher for me:
I understand the idea of them being in an IPO and needing to create hype. But in an environment where we’re starting to notice cold feet around AI, where people like Scam Altman are starting to backtrack on what they said, where these companies are starting to accept the reality of how their business model was unsustainable and raise prices… Doesn’t this just make them look worse?
Wouldn’t working on optimizations and delivering the same quality for lower compute be a more sensible choice? I’m not even talking about pragmatism per se - there’s nothing pragmatic about the current market situation - I’m talking about how an AI company like Anthropic would create more investor confidence if they delivered a model just as good as Sonnet that runs on cheaper hardware, so they could promise that they have a path to profitability.
But instead - even though they aren’t profitable and afaik don’t have a clear plan - they deliver an even more expensive model…?
But… Meh, what do I know.
Edit: Yeah, I don’t see anyone paying these prices
I expect that's exactly what we see with the updates to lower-tier models. For example, distillation of Mythos/Fable might well be used to make Opus or Sonnet 4.9.
It would be the opposite. Models that run 'on cheaper hardware' are exactly the ones whose price on openrouter approaches the bare electricity cost. Anthropic and OpenAI need to support their valuations with seemingly unique frontier capabilities, sold at a price that at least recovers operational costs. The implied business model is that if you come for the frontier models you'll stay for everything else; Claude Code or Claude Whizbang Automation will be happy to spend tokens on other, less-frontiery tasks at more profitable price points.
With enough reputation and reliability, people will. If you could truly double the productivity of an engineer or developer, even $10k/month in token cost would be worth it to many large companies.
The problem is that doubling developer productivity isn’t going to result in doubling of actual output or value creation in many circumstances. It’s often not the developers that are the bottleneck, even if they are perceived as the bottleneck.
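That point is essentially Amdahl's law. A minimal Python sketch, assuming (hypothetically) that development is some fraction of the end-to-end time to ship and that only that fraction gets faster:

```python
def overall_speedup(dev_share: float, dev_speedup: float) -> float:
    """Amdahl's law: end-to-end speedup when only the development share gets faster.

    dev_share   -- fraction of total delivery time spent on development (0..1)
    dev_speedup -- how much faster that development work becomes
    """
    if not 0.0 <= dev_share <= 1.0 or dev_speedup <= 0.0:
        raise ValueError("need 0 <= dev_share <= 1 and dev_speedup > 0")
    return 1.0 / ((1.0 - dev_share) + dev_share / dev_speedup)


# Doubling developer productivity when development is 40% of the timeline:
print(round(overall_speedup(0.4, 2.0), 2))  # 1.25
```

Under that assumption a 2x developer yields only 1.25x end to end, and even infinitely fast development caps out at 1 / (1 - 0.4), about 1.67x; the rest is waiting on testing, sign-off, and everyone else.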
Who gets the blame, in the eyes of the people who cut the checks:
A: Functional people who didn't start testing until the day before a deadline.
B: Programmers and ops mad-dashing to fix a critical error a functional person found at 4:50 on the day of the deadline
C: Manager of A
D: Manager of B
While nobody is blameless in this scenario, I'm fairly certain that everybody will point their fingers at B, "Because B is a business expense and A brings in the money."
As long as they're supply-constrained due to datacenter capacity and there are customers willing to pay, they make more money by charging more. It's economics 101.
This is expense-account pricing, but I assume that there will be businesses willing to pay more than I will as a hobbyist. Meanwhile, I'll stick with the Chinese models I've been using.
I imagine prices will eventually come down due to increased supply, algorithmic improvements and competition.
The variable you're missing is how many customers were sold on "and this will replace X number of workers for Y spend a year".
If the AI spend is > Y, well it's back to the horrors of human labor, and I'm already aware of conversations around that.
It's worth noting that to a company an employee is more expensive than their yearly take-home, as there are things like benefits and health care, but I'm seeing a lot of reassessment of whether any of this is going to pay for itself.
Personally, it's what I expected, as I still see this all as if you just launched Photoshop yesterday for free. Everyone would use it. Now we're at the point where they say "okay it's $100 a user a month" and suddenly it's clicking for management that it's just not cheaper than the clip art they were using before.
The other thing I'm seeing a lot of is that AI isn't always a sustained need. A lot of coding frustration comes from learning something new or dealing with something you haven't dealt with before (especially when rtfm leads to outdated docs...). Once you're over that hurdle and have your patterns, if you're remotely good at documentation/abstraction/libraries... well, now you've got your reusable code.
I'm also seeing cases where "oh we'll just have the AI do that" became "Yeah what did that nerd say about just writing down the code and running it ourselves".
They have to be aware of all of this and be pricing accordingly, but like Photoshop I think it'll go from "oh everyone uses that" to "yeah I know someone whose job/expensive hobby requires that".
Edit-
Another variable that's very important to all of this is WHERE it hits the expenses. Companies are often laser focused on payroll and a hell of a lot less concerned about other expenses, especially technical ones which upper management sometimes doesn't even understand. I've frequently seen very high consultant bills get rubber stamped instead of a hire that's a fraction of the cost because of what reports the expenses go on and how they're rolled up.
Obviously bottom line is bottom line, but I do think that if AI can be "close" to employee cost it'll stay hidden elsewhere at some companies because payroll down and profits up! will be the message, even if digging into the numbers you likely would've had higher profits and lower expenses had you kept payroll up, and profits went up due to other factors.
Wasn't there recently an article about AI companies starting to be profitable because of different agent pricing models for end customers (subscriptions) and enterprises (usage tokens only)? I'm an end user and don't have a Claude subscription, so I never checked.
Also I think there's a good chance that in the background they're doing both.
Yes, subscriptions are significantly cheaper than "API" prices, and it looks like they'll be closing that loophole starting with Fable, after a temporary preview.
I've had only great experiences with Fable 5 so far. It found a serious vulnerability in a FOSS website I develop, and a few less critical ones.
Did you create a fancy harness to scan for vulnerabilities?
I used this advanced prompt:
It's a small codebase.
snort that's great. It's relevant too: in my experience so far, Fable (I'm having a hard time getting used to that name for some reason) is significantly better at extrapolating intent with less prompting and then making mostly reasonable calls on how to proceed without handholding.
It looks like there was a heavy focus on long running autonomous tasks during fine tuning. I have mixed feelings about this.
:)
Is this your first time asking AI to harden your code?
Okay I re-ran with Opus 4.8 using the repo state from the previous commit and the exact same prompt as I gave to Fable.
Fable results: Auth bypass [critical], Input validation [minor], Rate limiting...
Opus results:
In those words, yes. But I have asked Opus many many times to find issues in these files.
I only dabble in AI from free tiers, shame on me. I'm curious though, because it feels like they'll eventually narrow down the pricing to be slightly higher than what a human would cost fulfilling a similar function, right? That's the cut-off point when deciding the price; that's where their profits start working. Or am I missing something?
They wouldn't offer a cheaper service because that would A) put a ton of people out of work and B) they'd miss out on profit.
Why would they care?
Increasing prices does not increase profit. They’ll price it based on the inflection point of marginal revenue, which would depend on both how useful the product is and the competition.
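A textbook illustration of why a higher price doesn't automatically mean higher profit, as a minimal Python sketch. It assumes a hypothetical linear demand curve (quantity = a - b * price) and a constant cost per unit; profit then peaks where marginal revenue equals marginal cost, and a hard capacity limit (e.g. not enough datacenter supply) pushes the best price up. All numbers are made up.

```python
from typing import Optional


def profit(price: float, a: float, b: float, unit_cost: float) -> float:
    """Profit under linear demand: quantity = a - b * price (floored at zero)."""
    quantity = max(a - b * price, 0.0)
    return (price - unit_cost) * quantity


def optimal_price(a: float, b: float, unit_cost: float,
                  capacity: Optional[float] = None) -> float:
    """Profit-maximizing price under linear demand and constant unit cost."""
    # Setting d/dP [(P - c)(a - bP)] = a - 2bP + bc to zero gives P* = (a/b + c) / 2,
    # the point where marginal revenue equals marginal cost.
    best = (a / b + unit_cost) / 2
    if capacity is not None and a - b * best > capacity:
        # Demand at P* exceeds what can be served: raise the price until demand == capacity.
        best = (a - capacity) / b
    return best


a, b, c = 1000.0, 10.0, 20.0                   # hypothetical demand and cost figures
p_star = optimal_price(a, b, c)
print(p_star, profit(p_star, a, b, c))         # 60.0 16000.0
print(profit(80.0, a, b, c))                   # 12000.0: pricing higher earns less
print(optimal_price(a, b, c, capacity=200.0))  # 80.0 when capacity is tight
```

Without a capacity limit, raising the price past the optimum loses more volume than it gains in margin; with a tight limit, charging more is simply the best use of scarce supply.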
Nobody really knows what's going to happen, but there's a strong argument that humans and AI are (and will be) good at different stuff and fulfill different functions. So from that standpoint, it won't be quite as simple as "price it juuuust more than a human would cost."
But even if that's not true, think of how hard it is to say how much a human costs - even in one singular role. In what city? At what company? What about benefits?
The safety filter clearly needs a ton of refinement. I’ve been blocked several times now for very innocuous code. Some people have also been complaining that the contents of their memory have basically locked them out of using Fable at all, because they work in chemistry. Also, there was a period today when their safety filter was simply offline and I couldn’t use Fable at all.
Oh! That might explain why I've been completely unable to use it...
Possibly try an anonymous session so it doesn’t fetch your memory?
Aaaaaaand it's already been jailbroken.
Oh look, more stupid bullshit toys for billionaires. I miss the days when normal people could buy computers and tech companies shipped software that didn't suck.