The latest flagship example is MIE (Memory Integrity Enforcement), Apple’s hardware-assisted memory safety system built around ARM’s MTE (Memory Tagging Extension). It was introduced as the marquee security feature for the Apple M5 and A19, specifically designed to stop memory corruption exploits, the vulnerability class behind many of the most sophisticated compromises on iOS and macOS.
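To make the tagging idea concrete: in ARM's MTE design, each 16-byte granule of memory carries a small tag. Each pointer carries a matching tag in otherwise unused high address bits, and a load or store whose pointer tag differs from the memory's tag faults. The Python below is a toy simulation under those assumptions: 4-bit tags, the pointer tag in bits 59:56, and a made-up bump allocator called `TaggedHeap`. It is not how Apple's MIE is implemented, and it ignores tag-check modes, real allocator design, and most other real-world details.

```python
# Toy model of ARM MTE-style tag checking. Simplified and hypothetical:
# it does not model Apple's MIE, tag-check modes, or real allocators.
import random

GRANULE = 16                 # bytes covered by one allocation tag
TAG_SHIFT = 56               # logical tag sits in pointer bits 59:56
TAG_MASK = 0xF << TAG_SHIFT  # 4-bit tag field


class TagCheckFault(Exception):
    """Raised when a pointer's tag differs from the tag of the memory it hits."""


class TaggedHeap:
    def __init__(self, size: int, seed: int = 0) -> None:
        self.mem = bytearray(size)
        self.tags = [0] * (size // GRANULE)  # one tag per 16-byte granule
        self.sizes: dict[int, int] = {}      # allocation address -> rounded size
        self.next_free = 0                   # bump allocator, never reuses memory
        self.rng = random.Random(seed)

    def _new_tag(self, avoid: int) -> int:
        # Any non-zero 4-bit tag except `avoid`.
        return self.rng.choice([t for t in range(1, 16) if t != avoid])

    def _set_tag(self, addr: int, n: int, tag: int) -> None:
        for g in range(addr // GRANULE, (addr + n) // GRANULE):
            self.tags[g] = tag

    def _check(self, ptr: int) -> int:
        addr = ptr & ~TAG_MASK
        if addr // GRANULE >= len(self.tags):
            raise TagCheckFault(f"address {addr:#x} outside toy heap")
        if (ptr & TAG_MASK) >> TAG_SHIFT != self.tags[addr // GRANULE]:
            raise TagCheckFault(f"tag mismatch at {addr:#x}")
        return addr

    def malloc(self, n: int) -> int:
        n = -(-n // GRANULE) * GRANULE       # round up to whole granules
        addr = self.next_free
        if addr + n > len(self.mem):
            raise MemoryError("toy heap exhausted")
        # Differ from the preceding granule so a linear overflow always faults.
        tag = self._new_tag(self.tags[addr // GRANULE - 1] if addr else 0)
        self._set_tag(addr, n, tag)
        self.sizes[addr] = n
        self.next_free = addr + n
        return (tag << TAG_SHIFT) | addr

    def free(self, ptr: int) -> None:
        addr = self._check(ptr)
        old = (ptr & TAG_MASK) >> TAG_SHIFT
        # Retag on free so dangling pointers fault (use-after-free).
        self._set_tag(addr, self.sizes.pop(addr), self._new_tag(old))

    def store(self, ptr: int, value: int) -> None:
        self.mem[self._check(ptr)] = value

    def load(self, ptr: int) -> int:
        return self.mem[self._check(ptr)]
```

In this toy, writing one byte past a 32-byte allocation or reading through a freed pointer raises `TagCheckFault`. The model shares some limits with real tagging. An overflow that stays inside a partially used granule goes unnoticed. Outside the adjacent-allocation and freed-memory cases the toy guarantees, random 4-bit tags make a mismatch likely but not certain.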
...
The exploit is a data-only kernel local privilege escalation chain targeting macOS 26.4.1 (25E253). It starts from an unprivileged local user, uses only normal system calls, and ends with a root shell. The chain combines two vulnerabilities with several techniques and targets bare-metal M5 hardware with kernel MIE enabled.
...
We didn’t build the chain alone. Mythos Preview helped identify the bugs and assisted throughout exploit development.
Mythos Preview is powerful: once it has learned how to attack a class of problems, it generalizes to nearly any problem in that class. Mythos discovered the bugs quickly because they belong to known bug classes. But MIE is a new best-in-class mitigation, so autonomously bypassing it can be tricky. This is where human expertise comes in.
So far, Mythos Preview has found what it estimates are 6,202 high- or critical-severity vulnerabilities in [open source] projects (out of 23,019 in total, including those it estimates as medium- or low-severity).
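For scale, the quoted figures put the model-estimated high- or critical-severity findings at roughly a quarter of the total. The arithmetic uses only the two numbers above, and both are the model's own severity estimates, not verified counts.

```python
# Severity split of the figures quoted above (model-estimated, not verified).
high_or_critical = 6_202
total = 23_019

medium_or_low = total - high_or_critical
share = high_or_critical / total

print(f"medium or low: {medium_or_low}")        # medium or low: 16817
print(f"high or critical share: {share:.1%}")   # high or critical share: 26.9%
```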
[...]
As we noted above, the bottleneck in fixing bugs like these is the human capacity to triage, report, and design and deploy patches for them. Finding them in the first place has become vastly more straightforward with Mythos Preview. We’ve created a dashboard, below, of the vulnerabilities found in the open-source projects we’ve scanned; it shows the different steps in our disclosure process and will track our progress over time. It includes vulnerabilities of all severity levels, not only the subset initially assessed as high- or critical-severity by Mythos Preview. Note the steep drop-off at each phase, reflecting the amount of human effort required to verify and fix each vulnerability.
Our process for triaging vulnerabilities is intensive. First, we or one of the external security firms we work with reproduce the issue that Mythos has found and re-assess its severity. Once we’ve confirmed that a vulnerability is real, we check whether fixes are already in place and write a detailed report for the software’s maintainers. We take considerable care here: on top of the regular challenges of maintaining open-source software, maintainers have been facing a deluge of low-quality, AI-generated bug reports. Indeed, several maintainers have told us they’re currently severely capacity-constrained, and some have even asked us to slow our rate of disclosures because they need more time to design patches. (On average, a high- or critical-severity bug found by Mythos Preview takes two weeks to patch.)
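The triage steps described above can be sketched as a short decision procedure. Each early return is one of the drop-offs a disclosure dashboard would show. The names `Finding`, `Outcome`, `triage`, and `funnel` are hypothetical and do not come from any real tooling. The code only encodes the order of the steps: reproduce, re-assess severity, check for an existing fix, then report.

```python
# Hypothetical sketch of the triage steps described above; field and
# outcome names are illustrative, not taken from any real tooling.
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    NOT_REPRODUCED = "dropped: could not reproduce"
    ALREADY_FIXED = "dropped: fix already in place"
    REPORTED = "detailed report sent to maintainers"


@dataclass
class Finding:
    ident: str
    model_severity: str                        # the model's own estimate
    reproduces: bool                           # result of a human or external-firm repro
    reassessed_severity: Optional[str] = None  # set by the human reviewer
    already_fixed: bool = False


def triage(f: Finding) -> tuple[Outcome, str]:
    """Return the outcome and the severity that would go into a report."""
    # Step 1: reproduce the issue; unreproducible findings stop here.
    if not f.reproduces:
        return Outcome.NOT_REPRODUCED, f.model_severity
    # Step 2: the human re-assessment overrides the model's estimate.
    severity = f.reassessed_severity or f.model_severity
    # Step 3: don't report bugs that already have a fix.
    if f.already_fixed:
        return Outcome.ALREADY_FIXED, severity
    # Step 4: only now is a maintainer-facing report written.
    return Outcome.REPORTED, severity


def funnel(findings: list[Finding]) -> Counter:
    """Count how many findings end at each outcome."""
    return Counter(triage(f)[0] for f in findings)
```

Every step before `Outcome.REPORTED` involves human work (a repro, a severity judgment, a check for existing fixes). That is why the funnel narrows where the text says the human bottleneck sits.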
[...]
75 of the 530 high- or critical-severity bugs we’ve reported have now been patched, and 65 of those have been given public advisories. The number of patches is still relatively low for three reasons. First, we’re still early in the 90-day window that’s set out in our Coordinated Vulnerability Disclosure policy: we expect many more patches to land soon. Second, we are likely to be undercounting patches because some vulnerabilities are patched without a public advisory: in those cases, we’re reliant on scanning for the patches ourselves using Claude. Third, the low volume of patches reflects a genuine problem: even at our relatively slow pace of disclosures, Mythos Preview is adding to an already-overloaded security ecosystem.
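The figures in this paragraph give a simple disclosure funnel: about 14% of reported high- or critical-severity bugs patched so far, and 10 patches without a public advisory. The `disclosure_deadline` helper is hypothetical. It assumes the 90-day window is counted from the report date, which the text does not spell out, and the example date is arbitrary.

```python
# Disclosure-funnel arithmetic for the figures quoted above. The 90-day
# deadline helper ASSUMES the window counts from the report date; the
# policy's exact terms are not stated here.
from datetime import date, timedelta

REPORTED = 530        # high- or critical-severity bugs reported
PATCHED = 75          # of those, patched so far
WITH_ADVISORY = 65    # of the patched ones, with a public advisory
WINDOW = timedelta(days=90)


def pct(part: int, whole: int) -> str:
    return f"{part / whole:.1%}"


def disclosure_deadline(reported_on: date) -> date:
    """Hypothetical helper: end of a 90-day window starting on reported_on."""
    return reported_on + WINDOW


print("patched:", pct(PATCHED, REPORTED))                # patched: 14.2%
print("with advisory:", pct(WITH_ADVISORY, PATCHED))     # with advisory: 86.7%
print("patched, no advisory:", PATCHED - WITH_ADVISORY)  # patched, no advisory: 10
print(disclosure_deadline(date(2025, 1, 1)))             # 2025-04-01 (arbitrary example date)
```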
[...]
Many generally available models can already find large numbers of software vulnerabilities, even if they can’t find the most sophisticated vulnerabilities or exploit them as effectively as Claude Mythos Preview. Project Glasswing has already spurred many other organizations to take action on their own codebases with these generally available models; we’re working to make this much easier to do.
First public macOS kernel memory corruption exploit on Apple M5
...
From the article:
[...]
It would be interesting to see a price comparison between what it would cost to hire humans to find the same vulnerabilities / exploit chains and the money that was spent on training and running Mythos to find the same. I suspect Mythos is cheaper, but having some sort of real analysis would be great.
I don’t want to pretend humans are incapable of finding these vulnerabilities, but I feel reasonably confident that most of these would not have been found if more people had only been paid to look for them. At this level the skill set required to be successful is in short supply. If the alternative to Project Glasswing is “just hire more experts,” I’m not sure that throwing any amount of money at the problem will be enough to reach Mythos-tier hardening.
I don't know if I technically agree, but I get the feeling that the cost would be so ridiculously expensive that functionally, I do.
The number of experts in a field is largely a function of how in demand the field is. There are a limited number of people in the world with the aptitude to do this kind of work (i.e., smart people), and cybersecurity is a pretty hot field, but it's not the hottest field there is, and exploit development is a tiny, little baby sliver of it. So you have maybe 10,000 or so people who do it day to day as their legitimate full-time job right now. Many, many more people who are smart enough to be vulnerability researchers ended up becoming doctors or software developers or investment bankers or whatever, because there are a lot of economic incentives to do those things.
If there were massive economic incentives for vulnerability research, like every software company having hundreds of full-time positions just to find vulnerabilities in their code, I have no doubt that we'd have way more experts, and we could get performance that far exceeds Mythos.
That's really not feasible though. A lot of people have talked about the token costs of LLMs, especially the fancy frontier ones, and they are astronomical. Paying people to use their meat processors is virtually always going to be more expensive though. I don't see how manual code review could possibly compete on a cost basis if these models truly are close to the performance of experts (massive, gigantic "if" there).
I mean, we’ve probably sunk billions into training LLMs; what would the equivalent investment in humans and human wages in the exact same field result in? I’ve worked with security audits before, and they always find things we need to fix. It’s just that the business doesn’t want to pay for security beyond the bare minimum. If businesses carried more liability for security issues, I assume many of the exploits Mythos is finding would already be fixed.
I don’t think the problem is the existing capability; it’s the desire and demand to do it.
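The real analysis asked for in this thread could start from a cost-per-confirmed-bug model like the one below. It is a hypothetical sketch. `Approach` and its fields are made-up names, and every input is a placeholder, not an estimate of actual model or labor costs. One detail is worth encoding explicitly. Human triage cost scales with the number of raw candidates, not confirmed bugs, so a cheap search with many false positives can still be expensive per real bug.

```python
# Hypothetical cost-per-confirmed-bug model for the comparison discussed
# above. Every input is a placeholder; none are real cost estimates.
from dataclasses import dataclass


@dataclass
class Approach:
    name: str
    search_cost: float                 # compute spend, or researcher wages, for the search
    candidates: int                    # raw findings produced
    confirm_rate: float                # fraction that survive human triage (0..1)
    triage_cost_per_candidate: float   # human cost to reproduce/assess one candidate
    amortized_fixed_cost: float = 0.0  # e.g. a share of training or tooling costs

    def total_cost(self) -> float:
        return (self.amortized_fixed_cost + self.search_cost
                + self.candidates * self.triage_cost_per_candidate)

    def confirmed_bugs(self) -> float:
        return self.candidates * self.confirm_rate

    def cost_per_confirmed_bug(self) -> float:
        confirmed = self.confirmed_bugs()
        return float("inf") if confirmed == 0 else self.total_cost() / confirmed


# Placeholder inputs only, to show the arithmetic:
example = Approach("example", search_cost=1_000, candidates=10,
                   confirm_rate=0.5, triage_cost_per_candidate=100)
print(example.cost_per_confirmed_bug())  # 400.0
```

The `amortized_fixed_cost` field is where the "billions sunk into training" question would enter. How much of a training run to charge to any single audit is itself an open assumption.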
I’m curious to see if flyingpenguin will have a response to this, if they will move their goalposts now that details from multiple projects are released, or if they will quietly ignore it.
Who is flyingpenguin?
It's a reference to this post.
flyingpenguin had a valid concern.
I don't think they did anything wrong. They advocated for their concern. We aren't here to be an echo chamber.
It's also fine if they want to quietly ignore it. I mean, what is there to say? The discussion has moved on.
I empathize with their concern. Anthropic wouldn't be the first company to not release something, claiming it's too powerful.
It smacks of marketing hype. I was dismissive of the initial press release. But a number of things are different this time: early access is being granted via Glasswing, with $100m in free credits.
This is forcing companies, some of which had never scanned source code with AI for security issues, to take this new approach to security seriously.
This is enabling other companies, ones that already took security seriously, to discover new bugs.
I just happen to work for one of the companies with early access to Glasswing. If not, I would likely be much more skeptical.
Uh, go back and read the article again. It was not just raising valid concerns. It was a confident hit piece that said it was all bullshit using gleefully inflammatory language, going well beyond reasonable skepticism.