skybrian's recent activity
-
Comment on How China built its vast natural gas stockpile in ~society
-
How China built its vast natural gas stockpile
1 vote -
Comment on US and Iran agree to provisional ceasefire with Tehran saying it will reopen Strait of Hormuz in ~society
skybrian
From the article:
The US and Iran agreed to a two-week conditional ceasefire on Tuesday evening after a last-minute diplomatic intervention led by Pakistan, canceling an ultimatum from Donald Trump for Iran to surrender or face widespread destruction.
[...]
But by Tuesday evening, Trump announced that a ceasefire agreement had been mediated through Pakistan, whose prime minister Shehbaz Sharif had requested the two-week peace in order to “allow diplomacy to run its course”.
[...]
Iran’s foreign minister, Abbas Araghchi, issued a statement shortly after Trump’s announcement saying Iran had agreed to the ceasefire.
“For a period of two weeks, safe passage through the Strait of Hormuz will be possible via coordinating with Iran’s Armed Forces,” he wrote.
[...]
Israel will also agree to the two-week ceasefire, Axios reported, citing an Israeli official, adding that the ceasefire would enter effect as soon as the blockade of the strait of Hormuz ceased.
-
US and Iran agree to provisional ceasefire with Tehran saying it will reopen Strait of Hormuz
19 votes -
Comment on Claude Mythos preview in ~tech
skybrian
From the article:
As we wrote in the Project Glasswing announcement, we do not plan to make Mythos Preview generally available. But there is still a lot that defenders without access to this model can do today.
Use generally-available frontier models to strengthen defenses now. Current frontier models, like Claude Opus 4.6 (and those of other companies), remain extremely competent at finding vulnerabilities, even if they are much less effective at creating exploits. With Opus 4.6, we found high- and critical-severity vulnerabilities almost everywhere we looked: in OSS-Fuzz, in webapps, in crypto libraries, and even in the Linux kernel. Mythos Preview finds more, higher-severity bugs, but companies and software projects that have not yet adopted language-model driven bugfinding tools could likely find many hundreds of vulnerabilities simply by running current frontier models.
[...]
Shorten patch cycles. The N-day exploits we walked through above were written fully autonomously, starting from just a CVE identifier and a git commit hash. The entire process from turning these public identifiers into functional exploits—which has historically taken a skilled researcher days to weeks per bug—now happens much faster, cheaper, and without intervention.
This means that software users and administrators will need to drive down the time-to-deploy for security updates, including by tightening the patching enforcement window, enabling auto-update wherever possible, and treating dependency bumps that carry CVE fixes as urgent, rather than routine maintenance.
[...]
Ultimately, it’s about to become very difficult for the security community. After navigating the transition to the Internet in the early 2000s, we have spent the last twenty years in a relatively stable security equilibrium. New attacks have emerged with new and more sophisticated techniques, but fundamentally, the attacks we see today are of the same shape as the attacks of 2006.
But language models that can automatically identify and then exploit security vulnerabilities at large scale could upend this tenuous equilibrium. The vulnerabilities that Mythos Preview finds and then exploits are the kind of findings that were previously only achievable by expert professionals.
[...]
We see no reason to think that Mythos Preview is where language models’ cybersecurity capabilities will plateau. The trajectory is clear. Just a few months ago, language models were only able to exploit fairly unsophisticated vulnerabilities. Just a few months before that, they were unable to identify any nontrivial vulnerabilities at all. Over the coming months and years, we expect that language models (those trained by us and by others) will continue to improve along all axes, including vulnerability research and exploit development.
In the long run, we expect that defense capabilities will dominate: that the world will emerge more secure, with software better hardened—in large part by code written by these models. But the transitional period will be fraught. We therefore need to begin taking action now.
-
Claude Mythos preview
17 votes -
Comment on Project Glasswing: securing critical software for the AI era in ~tech
skybrian
From the article:
Mythos Preview has already found thousands of high-severity vulnerabilities, including some in every major operating system and web browser. Given the rate of AI progress, it will not be long before such capabilities proliferate, potentially beyond actors who are committed to deploying them safely. The fallout—for economies, public safety, and national security—could be severe. Project Glasswing is an urgent attempt to put these capabilities to work for defensive purposes.
As part of Project Glasswing, the launch partners listed above will use Mythos Preview as part of their defensive security work; Anthropic will share what we learn so the whole industry can benefit. We have also extended access to a group of over 40 additional organizations that build or maintain critical software infrastructure so they can use the model to scan and secure both first-party and open-source systems. Anthropic is committing up to $100M in usage credits for Mythos Preview across these efforts, as well as $4M in direct donations to open-source security organizations.
...
Over the past few weeks, we have used Claude Mythos Preview to identify thousands of zero-day vulnerabilities (that is, flaws that were previously unknown to the software’s developers), many of them critical, in every major operating system and every major web browser, along with a range of other important pieces of software.
...
We do not plan to make Claude Mythos Preview generally available, but our eventual goal is to enable our users to safely deploy Mythos-class models at scale—for cybersecurity purposes, but also for the myriad other benefits that such highly capable models will bring. To do so, we need to make progress in developing cybersecurity (and other) safeguards that detect and block the model’s most dangerous outputs. We plan to launch new safeguards with an upcoming Claude Opus model, allowing us to improve and refine them with a model that does not pose the same level of risk as Mythos Preview.
-
Project Glasswing: securing critical software for the AI era
10 votes -
Comment on Ramen, K-beauty, clothes face shortages as Iran war hits Asia’s supplies in ~society
skybrian
From the article:
At the root of the supply chain issues, along with soaring oil prices, is a shortage of key petrochemicals used in nearly every manufactured item. Companies aren’t sure how much longer they can absorb rising production costs. As the war grinds on, supply problems are expected to get worse, analysts say.
Asia is more dependent on the Middle East for crude oil and liquefied natural gas imports than any other region. Countries including Japan, Thailand, South Korea and Sri Lanka have already moved to control oil prices to blunt the blow to consumers, or have asked employees to change work schedules to conserve fuel. So far, three South Korean airlines entered emergency management mode to deal with high jet fuel costs.
In Malaysia, the rubber glove industry is facing higher costs due to a shortage in a petroleum-based material used to produce the personal protective equipment.
[...]
In Indonesia, bottled water producers are planning to cut spending to deal with the price increase in raw materials.
And in Singapore and Taiwan, it’s becoming more expensive for companies to produce medical devices such as syringes and catheters, largely made of petroleum-related products.
The supply chain disruptions have reached a scale that many manufacturers say is reminiscent of the coronavirus pandemic.
Iran’s effective shutdown of the Strait of Hormuz, a choke point through which a large portion of the world’s oil passes, has rattled many Asian economies and driven up the cost of raw materials — especially naphtha, which is so critical that South Koreans call it the “rice of the petrochemical industry.”
Derived from crude oil, it is vital in the production of chemicals such as ethylene and propylene, which are used to make everyday products including plastic bottles, food packaging, car parts, rubber and more. As much as 70 percent of Asia’s naphtha passed through the Strait of Hormuz last year.
Compared with much of Asia, the United States is less exposed because it relies on ethane, a cheaper alternative to naphtha.
[...]
In South Korea, the government has rolled out a 12-point energy austerity plan, which recommends taking shorter showers and charging phones and electric vehicles during the day.
Shoppers are panic-buying government-regulated garbage bags out of fears that a naphtha shortage could make them and other plastic goods scarce. Some stores are limiting the number of bags sold to each customer to deter hoarding.
“There is no need to worry about the supply of standard garbage bags,” South Korean Energy Minister Kim Sung-whan posted on X last week. “You will never be in a situation where you are forced to let garbage pile up at home.”
South Korea imports 45 percent of its naphtha, with 77 percent of that share coming from the Middle East, according to industry figures. It imports about 70 percent of crude oil through the Strait of Hormuz, according to lawmakers.
[...]
Supply chain disruptions have hit the country’s plastics makers, including those making packaging materials for instant ramen noodles and plastic containers for “K-beauty” skin care products.
-
Ramen, K-beauty, clothes face shortages as Iran war hits Asia’s supplies
11 votes -
Comment on A cryptography engineer’s perspective on quantum computing timelines in ~comp
skybrian
From the article:
My position on the urgency of rolling out quantum-resistant cryptography has changed compared to just a few months ago. You might have heard this privately from me in the past weeks, but it’s time to signal and justify this change of mind publicly.
There had been rumors for a while of expected and unexpected progress towards cryptographically-relevant quantum computers, but over the last week we got two public instances of it.
First, Google published a paper revising down dramatically the estimated number of logical qubits and gates required to break 256-bit elliptic curves like NIST P-256 and secp256k1, which makes the attack doable in minutes on fast-clock architectures like superconducting qubits. They weirdly frame it around cryptocurrencies and mempools and salvaged goods or something, but the far more important implication are practical WebPKI MitM attacks.
Shortly after, a different paper came out from Oratomic showing 256-bit elliptic curves can be broken in as few as 10,000 physical qubits if you have non-local connectivity, like neutral atoms seem to offer, thanks to better error correction. This attack would be slower, but even a single broken key per month can be catastrophic.
[...]
Overall, it looks like everything is moving: the hardware is getting better, the algorithms are getting cheaper, the requirements for error correction are getting lower.
I’ll be honest, I don’t actually know what all the physics in those papers means. That’s not my job and not my expertise. My job includes risk assessment on behalf of the users that entrusted me with their safety. What I know is what at least some actual experts are telling us.
[...]
Scott Aaronson tells us that the “clearest warning that [he] can offer in public right now about the urgency of migrating to post-quantum cryptosystems” is a vague parallel with how nuclear fission research stopped happening in public between 1939 and 1940.
The timelines presented at RWPQC 2026, just a few weeks ago, were much tighter than a couple years ago, and are already partially obsolete. The joke used to be that quantum computers have been 10 years out for 30 years now. Well, not true anymore, the timelines have started progressing.
[...]
The job is not to be skeptical of things we’re not experts in, the job is to mitigate credible threats, and there are credible experts that are telling us about an imminent threat.
In summary, it might be that in 10 years the predictions will turn out to be wrong, but at this point they might also be right soon, and that risk is now unacceptable.
[...]
Concretely, what does this mean? It means we need to ship.
Regrettably, we’ve got to roll out what we have. That means large ML-DSA signatures shoved in places designed for small ECDSA signatures, like X.509, with the exception of Merkle Tree Certificates for the WebPKI, which is thankfully far enough along.
-
A cryptography engineer’s perspective on quantum computing timelines
16 votes -
Comment on Sam Altman may control our future—can he be trusted? in ~tech
skybrian
One of the authors is answering questions on Hacker News: https://news.ycombinator.com/item?id=47659135
-
Comment on Nearly half of the US data centers planned for 2026 are getting delayed or canceled because nobody stockpiled enough transformers and circuit breakers in ~tech
skybrian
We're not going to get dramatic efficiency improvements for airplanes or cars because they're limited by the laws of physics. However, a Prius does have pretty good gas mileage and many electric cars have pretty good range nowadays.
Meanwhile, there are no laws of physics or mathematical proofs preventing some researcher from coming up with a new AI algorithm that's either much more capable or equally capable but much cheaper to run. For all we know there's some undiscovered algorithm like the Fast Fourier Transform that will dramatically improve things. More likely, it will be a slog with steady improvements coming over the years.
-
Comment on Designing an agent reading test in ~comp
skybrian
From the article:
Agents are unreliable self-reporters. Not because they're dishonest, but because LLMs are agreeable pattern-matchers. Show them a rubric and ask if they met it, and they'll find a way to say yes. Self-assessment requires the model to contradict its own prior output, which goes against the grain of how these systems work.
[...]
The observer changes the observation. Every time we found a way to get more accurate reporting, we discovered a new layer where accuracy was lost. Score inflation led to qualitative reporting led to reframing as diagnostic led to task-based design led to the discovery of summarization layers led to human-side comparison. Each fix revealed the next problem. At some point you have to accept that agent self-reporting has a fundamental accuracy ceiling and design around it rather than through it.
Evaluation framing changes agent effort, not just agent reporting. The Hawthorne effect is different from score inflation or unreliable narration. Those are reporting problems; the agent claims something it didn't do. The Hawthorne effect is a behavioral problem: the agent actually does something it wouldn't normally do (retry harder, try fallback approaches, scan more carefully) because it recognizes the evaluation context. You can't fix this by improving the reporting mechanism. You fix it by making the evaluation context less visible.
Usability testing principles apply directly. The framing "we're testing the documentation, not you" is borrowed from human usability testing, where it reduces performance anxiety and produces more natural behavior. For agents, it serves a similar purpose: an agent that thinks the documentation is being evaluated has less reason to try heroic workarounds than one that thinks its own capabilities are being measured.
[...]
Separation of concerns is the most reliable pattern. The final design works because each participant does what they're good at. The agent does what agents do: read web pages and answer questions. The scoring form does what static analysis does: compare strings against a known set. The human does what humans do: judge whether "pool_size = 10" in the agent's summary matches "pool_size = 10" in the reference. No single participant is asked to do something it's structurally bad at.
[...]
The Agent Reading Test has 20 points across 10 tasks: 16 from canary tokens and 4 from qualitative assessment of task responses. A perfect score is unlikely for any current agent. The tasks are calibrated so that each failure mode affects at least some agents in practice.
[...]
The most interesting results come not from the score itself but from the pattern of what's missing. An agent that scores 12 with all truncation canaries but no tabbed content tells a different story than one that scores 12 with tabs but no content past 40K. The scoring form's implications text tries to make these patterns readable for anyone running the benchmark, not just people who designed it.
-
Designing an agent reading test
10 votes -
Comment on Nearly half of the US data centers planned for 2026 are getting delayed or canceled because nobody stockpiled enough transformers and circuit breakers in ~tech
skybrian
We don't have a Moore's law for LLM's like we do for chips, but my guess is that something similar will happen: it's going to get more efficient and usage will grow at the same time.
-
Comment on Nearly half of the US data centers planned for 2026 are getting delayed or canceled because nobody stockpiled enough transformers and circuit breakers in ~tech
skybrian
Moving a bit slower doesn't seem like a bad thing since it will encourage efficiency improvements. Just since the start of the year there have been some interesting research papers about improving inference efficiency.
-
Comment on Can a country get too rich? Norway shows the potential pitfalls of uncommon prosperity. in ~finance
skybrian
The article talks about inefficiency in terms of wasted money. The question is what are the real-world impacts of that? It might be increased environmental impact or wasting people's time on "bullshit jobs" compared to a more efficient system.
A more in-depth investigation would answer questions like that.
-
Comment on AI Coding agents are the opposite of what I want in ~comp
skybrian
Yes. The coding agent I use is more like a command-line interface, so that’s “traditional” in an old-school sense. It’s not “as you type.” One way to think of it is that I can imagine a command that does what I want, and then I can ask the AI to do that (in English). It will use the appropriate tools.
But I can turn the tables by asking it if it has any questions or suggestions for improvements. I commonly ask it to review design docs.