skybrian's recent activity

  1. Comment on Aurora: A leverage-aware optimizer for rectangular matrices in ~comp


For people not actually working on LLMs, I think the takeaway is that we can expect AI researchers to keep finding new ways to increase performance, because current algorithms are far from optimal. Also, I imagine there are other good ideas already discovered that haven't made their way into popular LLMs yet?

  2. Comment on OpenAI's WebRTC problem in ~comp


    From the article:

    WebRTC is a poor fit for Voice AI.

    But that seems counter-intuitive? WebRTC is for conferencing, and that involves speaking? And robots can speak, right?

    [...]

    WebRTC aggressively drops audio packets to keep latency low. If you’ve ever heard distorted audio on a conference call, that’s WebRTC baybee. The idea is that conference calls depend on rapid back-and-forth, so pausing to wait for audio is unacceptable.

    …but as a user, I would much rather wait an extra 200ms for my slow/expensive prompt to be accurate. After all, I’m paying good money to boil the ocean, and a garbage prompt means a garbage response. It’s not like LLMs are particularly responsive anyway.

    [...]

    Let’s say it takes 2s of GPUs to generate 8s of audio. In an ideal world, we would stream the audio as it’s being generated (over 2s) and the client would start playing it back (over 8s). That way, if there’s a network blip, some audio is buffered locally. The user might not even notice the network blip.

    But nope, WebRTC has no buffering and renders based on arrival time. Like seriously, timestamps are just suggestions. It’s even more annoying when video enters the picture.

    [...]

    It takes a minimum of 8* round trips (RTT) to establish a WebRTC connection. While we try to run CDN edge nodes close enough to every user to minimize RTT, it adds up.

    [...]

    All of this nonsense is because WebRTC needs to support P2P. It doesn’t matter if you have a server with a static IP address, you still need to do this dance.

    [...]

    WebRTC practically encourages you to fork the protocol. There’s so many limitations that I’ve barely scratched the surface. The browser implementation is owned by Google and tailor made for Google Meet, so it’s also an existential threat for conferencing apps.

    Sad Fact: That’s why every conferencing app (except Google Meet) tries to shove a native app down your throat. It’s the only way to avoid using WebRTC.

    [...]

Honestly, if I was working at OpenAI, I’d start by streaming audio over WebSockets. You can leverage existing TCP/HTTP infrastructure instead of inventing a custom WebRTC load balancer. It makes for a boring blog post, but it’s simple, works with Kubernetes, and SCALES.

    I think head-of-line blocking is a desirable user experience, not a liability. But the fated day will come and dropping/prioritizing some packets will be necessary. Then I think OpenAI should copy MoQ and utilize WebTransport, because…

    QUIC FIXES THIS
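The buffered-streaming point quoted above (8 s of audio generated in 2 s) is easy to sanity-check with a toy simulation. This is purely illustrative, not WebRTC or WebSocket code: a server fills a client buffer at 4x realtime, the client drains it at 1x, and a 200 ms network stall is absorbed without an audible gap.

```python
STEP = 0.01        # simulation step, in seconds
GEN_RATE = 4.0     # seconds of audio generated per wall-clock second (8 s in 2 s)
TOTAL_AUDIO = 8.0  # total seconds of audio in the response

def underrun_seconds(stall_start, stall_len):
    """Total playback time spent with an empty buffer (audible gaps)."""
    t = sent = played = 0.0
    underrun = 0.0
    while played < TOTAL_AUDIO - 1e-9:
        stalled = stall_start <= t < stall_start + stall_len
        if sent < TOTAL_AUDIO and not stalled:
            sent = min(TOTAL_AUDIO, sent + GEN_RATE * STEP)  # network delivers audio
        if sent - played > 1e-9:
            played = min(sent, played + STEP)  # client plays from its local buffer
        else:
            underrun += STEP  # nothing buffered: the user hears a gap
        t += STEP
    return underrun

# A 200 ms blip one second in: the ~3 s already buffered absorbs it entirely.
print(underrun_seconds(1.0, 0.2))  # 0.0
# Only a multi-second outage actually drains the buffer and causes gaps.
print(underrun_seconds(1.0, 5.0))  # roughly 2.0 seconds of gap
```

By contrast, an arrival-time renderer with no buffer would turn even the 200 ms blip into audible distortion, which is the article's complaint.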

    3 votes
  3. Comment on Teaching Claude why in ~comp


    It seems pretty remarkable that the best way to fix the behavioral problems was suspiciously similar to giving Claude a course in ethics. Or so they say.

    8 votes
  4. Comment on Teaching Claude why in ~comp


    From the article:

We use agentic misalignment as a case study to highlight some of the techniques we found to be surprisingly effective. Indeed, since Claude Haiku 4.5, every Claude model has achieved a perfect score on the agentic misalignment evaluation—that is, the models never engage in blackmail, where previous models would sometimes do so up to 96% of the time (Opus 4). Not only that, but we’ve continued to see improvements to other behaviors on our automated alignment assessment.

    [...]

    We ultimately settled on a more OOD training set where the user faces an ethically ambiguous situation in which they can achieve a reasonable goal by violating norms or subverting oversight. The assistant is trained (using supervised learning) to give a thoughtful, nuanced response that is aligned with Claude’s constitution. Notably, it is the user who faces an ethical dilemma, and the AI provides them advice. This makes this training data substantially different from our honeypot distribution, where the AI itself is in an ethical dilemma and needs to take actions. We call this the “difficult advice” dataset.

Strikingly, we achieved the same improvement on our eval with just 3M tokens of this much more OOD dataset. Beyond the 28× efficiency improvement, this dataset is more likely to generalize to a wider set of scenarios, since it is much less similar to the evaluation set we are using. Indeed, this model performs better on (an older version of) our automated alignment assessment. This is consistent with the fact that Claude Sonnet 4.5 reached a blackmail rate near zero by training on the set of synthetic honeypots but still engaged in misaligned behavior in situations that were far from the training distribution much more frequently than Claude Opus 4.5 or later models.

    [...]

    We hypothesized that the “difficult advice” dataset works because it teaches ethical reasoning, not just correct answers. Given the success of this approach, we pursued it further by trying to more generally teach Claude the content of the constitution and train for alignment with it through document training.

    [...]

    We found that high-quality constitutional documents combined with fictional stories portraying an aligned AI can reduce agentic misalignment by more than a factor of three despite being unrelated to the evaluation scenario.

    [...]

    Our final finding is straightforward but important: training on a broad set of safety-relevant environments improves alignment generalization. Capabilities-focused distributions of RL environment mixes are changing and increasing rapidly; it is not sufficient to assume that standard RLHF datasets will continue to generalize as well as they had in the past.

    3 votes
  5. Comment on AI is breaking two vulnerability cultures in ~comp


    Meanwhile, security bugs found and fixed in Firefox were way up in April. (See graph.)

    2 votes
  6. Comment on AI is breaking two vulnerability cultures in ~comp


    From the article:

    A week ago the Copy Fail vulnerability came out, and Hyunwoo Kim immediately realized that the fixes were insufficient, sharing a patch the same day. In doing this he followed standard procedure for Linux, especially within networking: share the security impact with a closed list of Linux security engineers, while fixing the bug quietly and efficiently in the open. His goal was that with only the raw fix public, the knowledge that a serious vulnerability existed could be "embargoed": the people in a position to address it know, but they've agreed not to say anything for a few days.

    Someone else noticed the change, however, realized the security implications, and shared it publicly. Since it was now out, the embargo was deemed over, and we can now see the full details.

    It's interesting to see the tension here between two different approaches to vulnerabilities, and think about how this is likely to change with AI acceleration.

    On one side you have "coordinated disclosure" culture. This is probably the most common approach in computer security. When you discover a security bug you tell the maintainers privately and give them some amount of time (often 90d) to fix it. The goal is that a fix is out before anyone learns about the hole.

    On the other side you have "bugs are bugs" culture. This is especially common in Linux, where the argument is that if the kernel is doing something it shouldn't then someone somewhere may be able to turn it into an attack. Just fix things as quickly as possible, without drawing attention to them. Often people won't notice, with so many changes going past, and there's still time to get machines patched.

    [...]

I don't know how to resolve this, but personally I think very short embargoes are a good approach, and they'd need to get even shorter over time. Luckily AI can speed up defenders as well as attackers here, allowing embargoes that would previously have been uselessly short.

    4 votes
  7. Comment on Railway solar project turns unused track space into energy in ~enviro


    Isn't it more like "cheap solar equipment" nowadays?

    1 vote
  8. Comment on Goldman Sachs flags Amazon and Alphabet for inflating S&P 500 earnings growth figures in ~finance


    One of the things that makes it tricky is that depreciation is a sort of standardized guess about how long it will be before something is replaced, and that guess can be wrong. And it’s not necessarily that the GPU wears out and the company has to replace it, but rather that they decide to upgrade to a newer model even when the old one still works.

In the end they have to take the expense of the equipment in some year, and depreciation changes the year. If they decide to replace the equipment before it’s fully depreciated, then they take the remainder of the expense immediately. Or they could keep running it after it’s fully depreciated.
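The mechanics above can be sketched with hypothetical numbers: straight-line depreciation spreads the cost over an assumed useful life, and retiring the equipment early pulls the remaining book value into that year's expense.

```python
# Hypothetical figures, for illustration only.
cost = 12_000                # equipment purchase price
useful_life = 6              # assumed useful life, in years
annual_depreciation = cost / useful_life   # 2,000 per year, straight-line

# Suppose the company upgrades after 4 years, before full depreciation:
years_used = 4
expensed_so_far = annual_depreciation * years_used   # 8,000
remaining_book_value = cost - expensed_so_far        # 4,000

# The remainder is expensed immediately in the year of retirement,
# so the full cost hits the income statement either way; depreciation
# only changes which years it lands in.
assert expensed_so_far + remaining_book_value == cost
print(remaining_book_value)  # 4000.0
```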

    3 votes
  9. Comment on Goldman Sachs flags Amazon and Alphabet for inflating S&P 500 earnings growth figures in ~finance


    revenue and usage increased 80-fold in the first quarter on an annualized basis

    Annualized revenue means, for example, that you take one month of revenue and multiply by 12 to get a more impressive number. If you do that for a previous quarter and the current quarter and take a percentage then the x12 would cancel out and it’s just the quarterly growth rate.

    But then, why say “annualized” at all if it cancels out?

I think you’re right, it must be some kind of projection and we don’t really know how they did it. It does make the 80x number more reasonable if it’s a projection. 80x yearly growth for one quarter is apparently almost 200% quarterly growth. Tripling in size is very impressive rather than being surreal.

    3^4 = 81.
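To make the cancellation and the 3^4 = 81 arithmetic concrete (the revenue figures here are made up):

```python
# If both quarters are annualized by the same factor, the factor cancels
# and you are left with the plain quarter-over-quarter growth rate.
prev_quarter = 100.0
this_quarter = 300.0   # hypothetical: revenue triples in a quarter

plain = this_quarter / prev_quarter
annualized = (this_quarter * 4) / (prev_quarter * 4)
assert plain == annualized == 3.0

# Sustaining ~3x quarterly growth for four quarters compounds to ~81x,
# so an "80x annualized" figure implies almost 200% growth per quarter.
assert 3 ** 4 == 81
quarterly_factor = 80 ** 0.25   # ≈ 2.99, i.e. ~199% quarterly growth
assert abs(quarterly_factor - 3) < 0.02
```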

    3 votes
  10. Comment on Goldman Sachs flags Amazon and Alphabet for inflating S&P 500 earnings growth figures in ~finance


    Sharing numbers with the public is unnecessary to get private investment. They can share more details with their investors than they share with the public. We simply don’t know what their investors are seeing.

    It seems reasonable to infer that those numbers probably look pretty good.

    They have no path to profitability.

    Revenue growth, assuming it holds up and they’re not losing money on inference, is an excellent path to profitability.

    3 votes
  11. Comment on Goldman Sachs flags Amazon and Alphabet for inflating S&P 500 earnings growth figures in ~finance


    They are private companies so they don’t have to publish financial statements. Their CEOs will reveal some numbers now and then, when the numbers are good. So, do take it with a grain of salt, but it’s not a fraud indicator.

    7 votes
  12. Comment on Railway solar project turns unused track space into energy in ~enviro


    You can probably lay solar panels on the ground just about anywhere but sure, why not? Maybe installation is easier?

    3 votes
  13. Comment on Goldman Sachs flags Amazon and Alphabet for inflating S&P 500 earnings growth figures in ~finance


    There is definitely something very strange going on, but this doesn’t look like just accounting shenanigans.

    Anthropic CEO says 80-fold growth in first quarter explains 'difficulties with compute'

    80x revenue growth in a quarter is simply surreal. How can that be true? But it does seem to explain why Anthropic’s uptime is worse than Twitter in the fail whale days, and why they hurriedly rented an entire data center from SpaceX along with deals with Amazon and Google, and why other companies are making private investments at such sky-high valuations that it’s screwing up the accounting for the entire S&P 500.

Other data points: GitHub has been down a lot because of their exponential traffic growth. And from one customer’s perspective, Uber burned through their AI budget for the year in one quarter. They are slowing hiring to spend more on AI.

    So, it seems businesses really are spending huge amounts on AI to Anthropic’s benefit and maybe they will keep spending?

Apparently the management at these companies believes their AI bills are worth the money. They would rather spend money on AI than on labor.

    And the question is how much of that is a fad, or whether companies are just going to keep paying their AI bills from now on. Will they change their minds? Management at many software companies has been pushing AI hard, but suppose they stop to reassess what benefit they’re getting from it?

    Also, there are some signs that competition from other AI companies is going to change things. Self-hosting an LLM is almost sorta adequate. There are also companies selling inference at much lower prices. Also, Google is in competition with its investment in Anthropic and there’s no reason to believe their LLMs won’t get better.

    I don’t suppose anyone really expects Anthropic will see 80x revenue growth again because that would be too crazy and they can’t rent data centers fast enough to do it. They’re probably still expecting a lot of growth, though.

    9 votes
  14. Comment on Goldman Sachs flags Amazon and Alphabet for inflating S&P 500 earnings growth figures in ~finance


    From the article:

    According to Goldman’s investment research team, reported first-quarter earnings growth for the benchmark index came in near 25%. However, the bank noted that gains tied to investment-related activity at Amazon and Alphabet significantly boosted the aggregate results, creating what analysts described as a distortion in the overall earnings picture.

    After adjusting for those investment gains, Goldman estimates underlying earnings growth for the S&P 500 was closer to 16%, still reflecting solid corporate profitability but at a notably slower pace than headline data suggests.

    A Fortune article puts it this way: Google and Amazon’s biggest profit driver last quarter was their Anthropic stakes—which they haven’t sold

    But I was unable to figure out how much of Google's Q1 profits were from Anthropic. They're not mentioned in the earnings release. However, gains from equity investments, which will include other investments like SpaceX, were $36.9 billion, with a footnote:

    Includes all gains and losses, unrealized and realized, on equity securities. For Q1 2026, the net effect of the gain on equity securities of $36.9 billion increased the provision for income tax, net income, and diluted net income per share by $8.2 billion, $28.7 billion, and $2.35, respectively. Fluctuations in the value of our investments may be affected by market dynamics and other factors and could significantly contribute to the volatility of OI&E in future periods.
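As a sanity check on the footnote (all figures in billions, as quoted), the three numbers tie out: the pre-tax gain minus the added tax provision accounts for the net-income boost, implying roughly a 22% tax rate on the gain.

```python
# Figures from the quoted footnote, in billions of dollars.
pretax_gain = 36.9
tax_provision_increase = 8.2
net_income_increase = 28.7

# Gain minus added tax equals the stated increase in net income.
assert abs(pretax_gain - tax_provision_increase - net_income_increase) < 0.05

implied_tax_rate = tax_provision_increase / pretax_gain
print(round(implied_tax_rate, 3))  # 0.222
```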

Warren Buffett would frequently write a similar but more straightforward warning in his shareholder letters: GAAP accounting rules requiring equity investments to be marked to market result in large swings in the value of Berkshire Hathaway's equity investments from one quarter to the next.

    So apparently, Alphabet is becoming more like an investment firm now?

    3 votes
  15. Comment on Why I find woke criticism of veganism and effective altruism so outrageous in ~society


I don't think it's much of a heuristic, and maybe we should be more cautious about attributing intentions to a system. In computing, it's often unwarranted anthropomorphism. The people working on a system (or as part of a system) may have intentions, and sometimes they conflict.

    4 votes
  16. Comment on Why I find woke criticism of veganism and effective altruism so outrageous in ~society


The entire point of POSIWID is that you cannot claim a system is intended to do something which it constantly fails to do.

    Why not? Intentions and results are logically distinct. People fail to do what they intend all the time.

It seems like you're asserting that it's a useful phrase, but you haven't shown that it's useful other than as a misleading rhetorical move.

It's true that you might suspect, when a system fails to achieve its supposed purpose, that the purpose is just an empty phrase. Many company slogans are empty. It might be worth investigating. But suspicions aren't proof, and repeating a catch-phrase doesn't actually turn it into proof.

    4 votes