skybrian's recent activity

  1. Comment on Project Glasswing: securing critical software for the AI era in ~tech

    skybrian
    Link Parent
    The rate at which new security bugs are found by Mythos seems measurable. It seems like it would slow down a lot after they fix the ones it finds easily. After it hasn't found anything new for a while, it would be safer to release it. (At least for major OSes and browsers.)

    The tricky bit: what if Mythos gets better when someone improves the harness that runs it? The LLM is an important part of the system, but the other parts matter too.

    The worst case is something like "eternal September" where reports of new security bugs never slow down. I imagine that would only happen for a project where the software development process is somehow cursed, not for code that's reasonably well-engineered.
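
    A rough sketch of how that rate could be tracked, in Python. The function names, the 30-day window, and the dates are all made up for illustration; none of them come from the Glasswing announcement.

```python
from datetime import date, timedelta


def findings_per_window(finding_dates, start, end, window_days=30):
    """Count new findings in consecutive windows covering [start, end)."""
    counts = []
    window_start = start
    while window_start < end:
        window_end = min(window_start + timedelta(days=window_days), end)
        counts.append(sum(window_start <= d < window_end for d in finding_dates))
        window_start = window_end
    return counts


def days_since_last_finding(finding_dates, today):
    """Days since the most recent finding, or None if there are none."""
    if not finding_dates:
        return None
    return (today - max(finding_dates)).days


# Hypothetical dates: several early findings, then a long quiet stretch.
findings = [date(2026, 1, 2), date(2026, 1, 5), date(2026, 1, 20), date(2026, 2, 10)]
counts = findings_per_window(findings, date(2026, 1, 1), date(2026, 5, 1))
quiet_days = days_since_last_finding(findings, date(2026, 5, 1))
```

    A falling count per window plus a growing quiet period is the "slowing down" signal. It would reset whenever a better harness starts turning up new bugs.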

    1 vote
  2. Comment on What programming/technical projects have you been working on? in ~comp

    skybrian
    Link
    Still tinkering with my personal links website. I decided that it would be handy to be able to import images for charts.

    3 votes
  3. Comment on Project Glasswing: securing critical software for the AI era in ~tech

    skybrian
    Link Parent
    I expect they'll open it up more after major browsers and OSes are hardened. Also, people at these companies can contribute patches to open source projects.

    4 votes
  4. Comment on Project Glasswing: securing critical software for the AI era in ~tech

    skybrian
    Link Parent
    Yes, it seems more like gossip than conventional wisdom at that point. A lot of people are building their own infrastructure and coming up with their own best practices based on what seems to work and by trying things they've heard about.

    But I expect that the industry will learn things and they will be written down, codified as open source software and become more widely known. It seems a little odd to bet against people learning how to use tools better.

    It might take a while, like what happened with JavaScript frameworks.

    1 vote
  5. Comment on Project Glasswing: securing critical software for the AI era in ~tech

    skybrian
    Link Parent
    There’s a lot of enterprise legacy code out there that was written without AI, so that situation doesn’t seem new. It’s not easy to refactor your way out, but it’s possible, and coding agents might help?

    I think as long as you spend some time asking the AI to clean up the code, you’re less likely to get stuck like that.

    4 votes
  6. Comment on How China built its vast natural gas stockpile in ~society

    skybrian
    Link
    From the article:

    China is the world’s largest importer of natural gas and the largest consumer of fertilizer, much of which is made from natural gas. China also has the biggest chemicals industry, much of which requires natural gas as well.

    The country has other supply options besides the storage tanks, which hold liquefied natural gas brought in by sea. It has constructed pipelines to gas fields in Central Asia and Russia. China has developed coal-based processes that can replace natural gas in making some kinds of chemicals. And China has more than doubled its domestic production of natural gas in the past decade through fracking and other technologies.

    While the United States now leads the world in oil and natural gas production by a wide margin, China is the fourth-largest producer of natural gas, trailing only the United States, Russia and Iran. China is the fifth-largest oil producer, behind the United States, Saudi Arabia, Russia and Canada.

    [...]

    Beijing’s top leaders have long been preoccupied with their country’s vulnerability to pressure from the American or Indian Navy on their seaborne supply of oil and natural gas from the Middle East. The country’s programs to develop solar and wind power and electric cars as alternatives to oil all moved into high gear 20 years ago.

    [...]

    China’s expansion of strategic stockpiles of fossil fuels is more recent, an effort pushed by its top leader, Xi Jinping. He has uttered dark warnings about the challenges facing the globe and the need for China to depend on commodities and technologies found within its borders.

    [...]

    China’s natural gas reserves, together with imports from places that are not affected by fighting in the Mideast, like Australia, Turkmenistan and Russia, are ample for home heating and cooking. Households, including residential electricity use, represent less than 15 percent of China’s natural gas consumption. China is also finishing its second consecutive warm winter, and residential gas demand has dropped steeply with the end of the heating season last month.

    [...]

    Natural gas is particularly difficult to store. The easiest approach is to keep it underground by pumping it into salt caverns or into previously exhausted underground natural gas fields near big cities. But China has few of these caverns and fields relative to its enormous population.

    That has prompted it to pursue a technologically audacious strategy: storing enormous quantities of supercooled gas as a liquid in aboveground storage tanks. The state-owned China National Offshore Oil Corporation disclosed in December that it had built 18 of its largest size of storage tanks for liquefied natural gas — more than twice as many as the rest of the world combined.

    South Korea is constructing seven equally large L.N.G. tanks about 75 miles south of Seoul, to be completed in stages by the end of 2029. Japan has also begun building slightly smaller storage tanks.

    [...]

    The row of enormous storage tanks in Yancheng holds liquefied natural gas at a temperature of minus 260 degrees Fahrenheit, or minus 162 degrees Celsius. When the gas is allowed to warm gradually to room temperature through a system of pipes, it expands 600-fold. The tanks in Yancheng are connected to a long pier into the Yellow Sea to unload L.N.G. from ships.

    [...]

    China has one more tool to make sure it has enough natural gas: making less fertilizer for export. Analysts say that since the start of the war in Iran, China has already halted most of its overseas fertilizer sales.

    3 votes
  7. Comment on US and Iran agree to provisional ceasefire with Tehran saying it will reopen Strait of Hormuz in ~society

    skybrian
    Link
    From the article:

    The US and Iran agreed to a two-week conditional ceasefire on Tuesday evening after a last-minute diplomatic intervention led by Pakistan, canceling an ultimatum from Donald Trump for Iran to surrender or face widespread destruction.

    [...]

    But by Tuesday evening, Trump announced that a ceasefire agreement had been mediated through Pakistan, whose prime minister Shehbaz Sharif had requested the two-week peace in order to “allow diplomacy to run its course”.

    [...]

    Iran’s foreign minister, Abbas Araghchi, issued a statement shortly after Trump’s announcement saying Iran had agreed to the ceasefire.

    “For a period of two weeks, safe passage through the Strait of Hormuz will be possible via coordinating with Iran’s Armed Forces,” he wrote.

    [...]

    Israel will also agree to the two-week ceasefire, Axios reported, citing an Israeli official, adding that the ceasefire would enter effect as soon as the blockade of the strait of Hormuz ceased.

    5 votes
  8. Comment on Claude Mythos preview in ~tech

    skybrian
    Link
    From the article:

    As we wrote in the Project Glasswing announcement, we do not plan to make Mythos Preview generally available. But there is still a lot that defenders without access to this model can do today.

    Use generally-available frontier models to strengthen defenses now. Current frontier models, like Claude Opus 4.6 (and those of other companies), remain extremely competent at finding vulnerabilities, even if they are much less effective at creating exploits. With Opus 4.6, we found high- and critical-severity vulnerabilities almost everywhere we looked: in OSS-Fuzz, in webapps, in crypto libraries, and even in the Linux kernel. Mythos Preview finds more, higher-severity bugs, but companies and software projects that have not yet adopted language-model driven bugfinding tools could likely find many hundreds of vulnerabilities simply by running current frontier models.

    [...]

    Shorten patch cycles. The N-day exploits we walked through above were written fully autonomously, starting from just a CVE identifier and a git commit hash. The entire process from turning these public identifiers into functional exploits—which has historically taken a skilled researcher days to weeks per bug—now happens much faster, cheaper, and without intervention.

    This means that software users and administrators will need to drive down the time-to-deploy for security updates, including by tightening the patching enforcement window, enabling auto-update wherever possible, and treating dependency bumps that carry CVE fixes as urgent, rather than routine maintenance.

    [...]

    Ultimately, it’s about to become very difficult for the security community. After navigating the transition to the Internet in the early 2000s, we have spent the last twenty years in a relatively stable security equilibrium. New attacks have emerged with new and more sophisticated techniques, but fundamentally, the attacks we see today are of the same shape as the attacks of 2006.

    But language models that can automatically identify and then exploit security vulnerabilities at large scale could upend this tenuous equilibrium. The vulnerabilities that Mythos Preview finds and then exploits are the kind of findings that were previously only achievable by expert professionals.

    [...]

    We see no reason to think that Mythos Preview is where language models’ cybersecurity capabilities will plateau. The trajectory is clear. Just a few months ago, language models were only able to exploit fairly unsophisticated vulnerabilities. Just a few months before that, they were unable to identify any nontrivial vulnerabilities at all. Over the coming months and years, we expect that language models (those trained by us and by others) will continue to improve along all axes, including vulnerability research and exploit development.

    In the long run, we expect that defense capabilities will dominate: that the world will emerge more secure, with software better hardened—in large part by code written by these models. But the transitional period will be fraught. We therefore need to begin taking action now.

    4 votes
  9. Comment on Project Glasswing: securing critical software for the AI era in ~tech

    skybrian
    Link
    From the article:

    Mythos Preview has already found thousands of high-severity vulnerabilities, including some in every major operating system and web browser. Given the rate of AI progress, it will not be long before such capabilities proliferate, potentially beyond actors who are committed to deploying them safely. The fallout—for economies, public safety, and national security—could be severe. Project Glasswing is an urgent attempt to put these capabilities to work for defensive purposes.

    As part of Project Glasswing, the launch partners listed above will use Mythos Preview as part of their defensive security work; Anthropic will share what we learn so the whole industry can benefit. We have also extended access to a group of over 40 additional organizations that build or maintain critical software infrastructure so they can use the model to scan and secure both first-party and open-source systems. Anthropic is committing up to $100M in usage credits for Mythos Preview across these efforts, as well as $4M in direct donations to open-source security organizations.

    ...

    Over the past few weeks, we have used Claude Mythos Preview to identify thousands of zero-day vulnerabilities (that is, flaws that were previously unknown to the software’s developers), many of them critical, in every major operating system and every major web browser, along with a range of other important pieces of software.

    ...

    We do not plan to make Claude Mythos Preview generally available, but our eventual goal is to enable our users to safely deploy Mythos-class models at scale—for cybersecurity purposes, but also for the myriad other benefits that such highly capable models will bring. To do so, we need to make progress in developing cybersecurity (and other) safeguards that detect and block the model’s most dangerous outputs. We plan to launch new safeguards with an upcoming Claude Opus model, allowing us to improve and refine them with a model that does not pose the same level of risk as Mythos Preview.

    4 votes
  10. Comment on Ramen, K-beauty, clothes face shortages as Iran war hits Asia’s supplies in ~society

    skybrian
    Link
    From the article:

    At the root of the supply chain issues, along with soaring oil prices, is a shortage of key petrochemicals used in nearly every manufactured item. Companies aren’t sure how much longer they can absorb rising production costs. As the war grinds on, supply problems are expected to get worse, analysts say.

    Asia is more dependent on the Middle East for crude oil and liquefied natural gas imports than any other region. Countries including Japan, Thailand, South Korea and Sri Lanka have already moved to control oil prices to blunt the blow to consumers, or have asked employees to change work schedules to conserve fuel. So far, three South Korean airlines entered emergency management mode to deal with high jet fuel costs.

    In Malaysia, the rubber glove industry is facing higher costs due to a shortage in a petroleum-based material used to produce the personal protective equipment.

    [...]

    In Indonesia, bottled water producers are planning to cut spending to deal with the price increase in raw materials.

    And in Singapore and Taiwan, it’s becoming more expensive for companies to produce medical devices such as syringes and catheters, largely made of petroleum-related products.

    The supply chain disruptions have reached a scale that many manufacturers say is reminiscent of the coronavirus pandemic.

    Iran’s effective shutdown of the Strait of Hormuz, a choke point through which a large portion of the world’s oil passes, has rattled many Asian economies and driven up the cost of raw materials — especially naphtha, which is so critical that South Koreans call it the “rice of the petrochemical industry.”

    Derived from crude oil, it is vital in the production of chemicals such as ethylene and propylene, which are used to make everyday products including plastic bottles, food packaging, car parts, rubber and more. As much as 70 percent of Asia’s naphtha passed through the Strait of Hormuz last year.

    Compared with much of Asia, the United States is less exposed because it relies on ethane, a cheaper alternative to naphtha.

    [...]

    In South Korea, the government has rolled out 12-point energy austerity plan, which recommends taking shorter showers and charging phones and electric vehicles during the day.

    Shoppers are panic-buying government-regulated garbage bags out of fears that a naphtha shortage could make them and other plastic goods scarce. Some stores are limiting the number of bags sold to each customer to deter hoarding.

    “There is no need to worry about the supply of standard garbage bags,” South Korean Energy Minister Kim Sung-whan posted on X last week. “You will never be in a situation where you are forced to let garbage pile up at home.”

    South Korea imports 45 percent of its naphtha, with 77 percent of that share coming from the Middle East, according to industry figures. It imports about 70 percent of crude oil through the Strait of Hormuz, according to lawmakers.

    [...]

    Supply chain disruptions have hit the country’s plastics makers, including those making packaging materials for instant ramen noodles and plastic containers for “K-beauty” skin care products.

    5 votes
  11. Comment on A cryptography engineer’s perspective on quantum computing timelines in ~comp

    skybrian
    Link
    From the article:

    My position on the urgency of rolling out quantum-resistant cryptography has changed compared to just a few months ago. You might have heard this privately from me in the past weeks, but it’s time to signal and justify this change of mind publicly.

    There had been rumors for a while of expected and unexpected progress towards cryptographically-relevant quantum computers, but over the last week we got two public instances of it.

    First, Google published a paper revising down dramatically the estimated number of logical qubits and gates required to break 256-bit elliptic curves like NIST P-256 and secp256k1, which makes the attack doable in minutes on fast-clock architectures like superconducting qubits. They weirdly frame it around cryptocurrencies and mempools and salvaged goods or something, but the far more important implication are practical WebPKI MitM attacks.

    Shortly after, a different paper came out from Oratomic showing 256-bit elliptic curves can be broken in as few as 10,000 physical qubits if you have non-local connectivity, like neutral atoms seem to offer, thanks to better error correction. This attack would be slower, but even a single broken key per month can be catastrophic.

    [...]

    Overall, it looks like everything is moving: the hardware is getting better, the algorithms are getting cheaper, the requirements for error correction are getting lower.

    I’ll be honest, I don’t actually know what all the physics in those papers means. That’s not my job and not my expertise. My job includes risk assessment on behalf of the users that entrusted me with their safety. What I know is what at least some actual experts are telling us.

    [...]

    Scott Aaronson tells us that the “clearest warning that [he] can offer in public right now about the urgency of migrating to post-quantum cryptosystems” is a vague parallel with how nuclear fission research stopped happening in public between 1939 and 1940.

    The timelines presented at RWPQC 2026, just a few weeks ago, were much tighter than a couple years ago, and are already partially obsolete. The joke used to be that quantum computers have been 10 years out for 30 years now. Well, not true anymore, the timelines have started progressing.

    [...]

    The job is not to be skeptical of things we’re not experts in, the job is to mitigate credible threats, and there are credible experts that are telling us about an imminent threat.

    In summary, it might be that in 10 years the predictions will turn out to be wrong, but at this point they might also be right soon, and that risk is now unacceptable.

    [...]

    Concretely, what does this mean? It means we need to ship.

    Regrettably, we’ve got to roll out what we have. That means large ML-DSA signatures shoved in places designed for small ECDSA signatures, like X.509, with the exception of Merkle Tree Certificates for the WebPKI, which is thankfully far enough along.

    10 votes
  12. Comment on Sam Altman may control our future—can he be trusted? in ~tech

  13. Comment on Nearly half of the US data centers planned for 2026 are getting delayed or canceled because nobody stockpiled enough transformers and circuit breakers in ~tech

    skybrian
    Link Parent
    We're not going to get dramatic efficiency improvements for airplanes or cars because they're limited by the laws of physics. However, a Prius does have pretty good gas mileage and many electric cars have pretty good range nowadays.

    Meanwhile, there are no laws of physics or mathematical proofs preventing some researcher from coming up with a new AI algorithm that's either much more capable or equally capable but much cheaper to run. For all we know, there's some undiscovered algorithm like the Fast Fourier Transform that will dramatically improve things. More likely, it will be a slog with steady improvements coming over the years.

    8 votes
  14. Comment on Designing an agent reading test in ~comp

    skybrian
    Link
    From the article:

    Agents are unreliable self-reporters. Not because they're dishonest, but because LLMs are agreeable pattern-matchers. Show them a rubric and ask if they met it, and they'll find a way to say yes. Self-assessment requires the model to contradict its own prior output, which goes against the grain of how these systems work.

    [...]

    The observer changes the observation. Every time we found a way to get more accurate reporting, we discovered a new layer where accuracy was lost. Score inflation led to qualitative reporting led to reframing as diagnostic led to task-based design led to the discovery of summarization layers led to human-side comparison. Each fix revealed the next problem. At some point you have to accept that agent self-reporting has a fundamental accuracy ceiling and design around it rather than through it.

    Evaluation framing changes agent effort, not just agent reporting. The Hawthorne effect is different from score inflation or unreliable narration. Those are reporting problems; the agent claims something it didn't do. The Hawthorne effect is a behavioral problem: the agent actually does something it wouldn't normally do (retry harder, try fallback approaches, scan more carefully) because it recognizes the evaluation context. You can't fix this by improving the reporting mechanism. You fix it by making the evaluation context less visible.

    Usability testing principles apply directly. The framing "we're testing the documentation, not you" is borrowed from human usability testing, where it reduces performance anxiety and produces more natural behavior. For agents, it serves a similar purpose: an agent that thinks the documentation is being evaluated has less reason to try heroic workarounds than one that thinks its own capabilities are being measured.

    [...]

    Separation of concerns is the most reliable pattern. The final design works because each participant does what they're good at. The agent does what agents do: read web pages and answer questions. The scoring form does what static analysis does: compare strings against a known set. The human does what humans do: judge whether "pool_size = 10" in the agent's summary matches "pool_size = 10" in the reference. No single participant is asked to do something it's structurally bad at.

    [...]

    The Agent Reading Test has 20 points across 10 tasks: 16 from canary tokens and 4 from qualitative assessment of task responses. A perfect score is unlikely for any current agent. The tasks are calibrated so that each failure mode affects at least some agents in practice.

    [...]

    The most interesting results come not from the score itself but from the pattern of what's missing. An agent that scores 12 with all truncation canaries but no tabbed content tells a different story than one that scores 12 with tabs but no content past 40K. The scoring form's implications text tries to make these patterns readable for anyone running the benchmark, not just people who designed it.
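
    The canary side of that scoring could be sketched like this in Python. This is only a guess at the shape of it: the token strings, the failure-mode names, and the one-point-per-canary rule are assumptions, since the excerpt doesn't describe the real scoring form's format.

```python
def score_canaries(agent_output, canaries):
    """Return (found, missing) canary names using plain substring matching."""
    found = sorted(name for name, token in canaries.items() if token in agent_output)
    missing = sorted(set(canaries) - set(found))
    return found, missing


# Hypothetical canaries, keyed by the failure mode each one probes.
CANARIES = {
    "truncation_40k": "CANARY-TRUNC-7F3A",
    "tabbed_content": "CANARY-TABS-19C2",
    "plain_text": "CANARY-BASE-0D41",
}

report = "Saw CANARY-BASE-0D41 in the intro and CANARY-TRUNC-7F3A near the end."
found, missing = score_canaries(report, CANARIES)
canary_points = len(found)  # assumes one point per canary
```

    Here the list of missing names, not the point total, is what shows the pattern: this made-up agent read past the truncation point but never saw the tabbed content.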

    4 votes