48 votes

DeepSeek R1 reproduced for $30: University of California Berkeley researchers replicate DeepSeek R1 for $30—casting doubt on H100 claims and controversy

9 comments

  1. [8]
    Wes
    Link

    To clarify, they reproduced chain-of-thought reasoning in a narrow context on a 3B model for under $30. This effectively proves that the research methods published by DeepSeek do work, though the headline's claim that they reproduced R1 is definitely not true.

    It's still fascinating to see reasoning that resembles a human's internal monologue develop in AI models entirely through self-guided reinforcement learning. Once again we're finding that a hands-off approach is best when it comes to effective training. The AI is better at self-learning than we are at teaching.

    59 votes
    1. [7]
      nic
      Link Parent
      • Exemplary

      To summarize my understanding...

      1. OpenAI applied massive compute to ingest all the textual knowledge in the world and create a predictive model that was amazing at regurgitating existing human-written knowledge, but terrible at things not commonly written down. It lacked the fundamental ability to discern when an answer was correct, versus simply sounding correct. The more specialized the area and the more recent the knowledge, the more the engine tended to simply sound confidently incorrect.

      2. OpenAI created a reasoning engine that could break a problem down into steps, tackle each step one at a time, evaluate the response, and maybe try a different approach if the first one didn't seem correct. They also created the ability to invoke tools, like searching the internet or executing code. Which meant that instead of being bad at math, the reasoning engine would simply write a bit of code to do the math, execute the code to get the answer, and return the result. And instead of hallucinating about current events, the reasoning engine knew when to just go search the internet. (A sketch of this kind of tool loop follows the list.)

      3. The hedge fund High-Flyer created DeepSeek because China was cracking down on hedge funds, so they had spare H800 chips and some quants who were great at code. They had to optimize LLM tech heavily, because Biden restricted China from easily acquiring the more powerful and more expensive H100 chips, which were until then mandatory for training and often needed for inference as well. Part of the optimization might have been to "steal" knowledge from existing LLMs, much like existing LLMs "stole" this knowledge from copyrighted materials. But part of the optimizations was genuine IP, which allowed DeepSeek to train and run LLMs on H800s at a fraction of the expense versus everyone else, who just threw more money at the problem of scale. So Biden's attempt to retain a competitive advantage by restricting trade could have completely backfired. But oddly enough, the hedge fund decided to open source everything, basically telling everyone else how to replicate their success. It didn't cause much of a splash, because there were so many innovations in the LLM space.

      4. DeepSeek came out with a reasoning engine. They open sourced this as well. It was built on top of their own LLM. Up until this point, only OpenAI had cracked the reasoning engine nut. But OpenAI was no longer open sourcing anything. (In fact, OpenAI was kinda pissed that DeepSeek was open sourcing stuff.) This caught everyone's attention, especially when Nvidia crashed. Did the hedge fund do all this hard work and open source everything just to cause Nvidia to crash? Was this some incredibly elaborate short of the US markets? I mean, it sounded just plain weird. A small hedge fund, with 200 employees, completely crushing the US with incredible innovations? Or were they faking FUD to short Nvidia? But they open sourced their code and weights, and they published papers explaining how they did what they did, so anyone could replicate their success. In addition to optimizing low-level code to improve generative AI training and inference efficiencies 10x, DeepSeek figured out how to train a reasoning engine in an incredibly efficient manner. They basically created huge numbers of quantifiable, verifiable problems, then trained the reasoning agent to keep trying to solve each problem until the solution checked out. E.g., create some code to do X; if it didn't compile, run, and execute correctly, try again. Or solve math problem Y; if the answer is incorrect, try again. What the Berkeley researchers verified is that if you limit Y to one very specific math game, you can train the reasoning engine for less than $50. (See the reward-function sketch below.)
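
      To make item 2 concrete, here is a minimal sketch of that kind of tool loop, in Python. It is an illustration only, not OpenAI's actual implementation: llm and web_search are hypothetical stand-ins for a model API and a search API, and the action format is invented for the example.

      ```python
      import subprocess
      import sys
      import tempfile

      # Hypothetical stand-ins: a real system would call a model API and a search API.
      def llm(messages: list[dict]) -> dict: ...
      def web_search(query: str) -> str: ...

      def run_python(code: str) -> str:
          """Run model-written code in a subprocess and capture the output."""
          with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
              f.write(code)
          result = subprocess.run([sys.executable, f.name],
                                  capture_output=True, text=True, timeout=10)
          return result.stdout if result.returncode == 0 else result.stderr

      def reasoning_loop(question: str, max_steps: int = 8) -> str:
          messages = [{"role": "user", "content": question}]
          for _ in range(max_steps):
              step = llm(messages)  # assumed to return {"action": ..., "content": ...}
              if step["action"] == "search":
                  # Rather than hallucinate about current events, go look them up.
                  messages.append({"role": "tool", "content": web_search(step["content"])})
              elif step["action"] == "run_code":
                  # Rather than do arithmetic "in its head", the model writes code;
                  # the loop executes it and feeds the result back in.
                  messages.append({"role": "tool", "content": run_python(step["content"])})
              else:  # "answer": the model is satisfied with its result
                  return step["content"]
          return "no answer within max_steps"
      ```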
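
      And to make item 4 concrete, here is a sketch of the kind of automatically checkable reward that makes the recipe cheap: the grader is a few lines of ordinary code, not another model. It assumes the Countdown arithmetic game (reportedly the "one very specific math game" in the Berkeley run) and an <answer> tag convention; the tags and function names are assumptions for illustration, not DeepSeek's exact format.

      ```python
      import ast
      import operator
      import re

      # The binary operators the arithmetic checker will accept.
      OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
             ast.Mult: operator.mul, ast.Div: operator.truediv}

      def safe_eval(expr: str) -> float:
          """Evaluate a pure-arithmetic expression without eval() on raw model text."""
          def walk(node: ast.AST) -> float:
              if isinstance(node, ast.BinOp) and type(node.op) in OPS:
                  return OPS[type(node.op)](walk(node.left), walk(node.right))
              if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
                  return node.value
              raise ValueError("disallowed syntax")
          return walk(ast.parse(expr, mode="eval").body)

      def countdown_reward(completion: str, numbers: list[int], target: int) -> float:
          """Return 1.0 if the expression uses exactly the given numbers and
          hits the target, else 0.0. Training needs nothing more than this
          binary signal: the model keeps trying until its solutions check out."""
          try:
              expr = completion.split("<answer>")[1].split("</answer>")[0].strip()
              if sorted(int(t) for t in re.findall(r"\d+", expr)) != sorted(numbers):
                  return 0.0
              return 1.0 if abs(safe_eval(expr) - target) < 1e-6 else 0.0
          except Exception:  # malformed output earns no reward
              return 0.0

      # e.g. countdown_reward("... <answer>(6 + 4) * 2</answer>", [2, 4, 6], 20) -> 1.0
      ```

      Because the check is exact, the reward can't be gamed by answers that merely sound right, which is exactly the failure mode described in item 1.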

      TLDR: A small Chinese hedge fund really did outfox the US, and they really did open source all their IP. Which makes OpenAI look bad on both counts. And while I hope High-Flyer shorted the shit out of Nvidia, that is a side note.

      China is now a heavyweight in generative AI... in spite of Biden's bans... and because they embraced open source.

      77 votes
      1. Akir
        Link Parent

        It feels like there is a lesson to be learned about the free exchange of ideas somewhere in this story....

        38 votes
      2. [2]
        skybrian
        Link Parent

        Perhaps the export restrictions were unintentionally a good thing because they spurred innovation?

        9 votes
        1. nic
          Link Parent

          Perhaps it was also better for the planet: fewer chips required, less electricity required.

          What's funny to me is that I don't think Biden intended to spur innovation in China. And I don't think China intended hedge funds to become such huge proponents of sharing incredibly valuable IP. Yet here we are. The law of unintended consequences.

          24 votes
      3. [3]
        RobotOverlord525
        Link Parent

        China is now a heavyweight in generative AI... in spite of Biden's bans... and because they embraced open source.

        I'm not sure if this conclusion is supported by these events. Particularly if DeepSeek developed their model based on closed-source model weights obtained by some shady means. Now, if they based their work off of another open source model, like Meta's Llama, then I could get behind the idea that they advanced the state-of-the-art "because they embraced open source."

        Personally, I'm very sympathetic to the AI safety concerns about proliferating powerful LLMs. Because it's not just China that is a heavyweight in generative AI after this; anyone is. The Kremlin has as much access to DeepSeek's source code as anyone in China does.

        The New York Times' Hard Fork podcast discussed this topic last week and they had this to say on the safety implications.

        Kevin Roose

        So the third group of people that I would say are freaking out about DeepSeek are AI safety experts, people who worry about the growing capabilities of AI systems and the potential that they could very soon achieve something like general intelligence or possibly superintelligence, and that that could end badly for all of humanity.

        And the reason that they’re spooked about DeepSeek is this technology is open source. DeepSeek released R1 to the public. It’s an open weights model, meaning that anyone can download it and run their own versions of it or tweak it to suit their own purposes.

        And that goes to one of the main fears that AI safety experts have been sounding the alarms on for years, which is that just that this technology, once it is invented, is very hard to control. It is not as easy as stopping something like nuclear weapons from proliferating. And if future versions of this are quite dangerous, it suggests that it’s going to be very hard to keep that contained to one country or one set of companies.

        Casey Newton

        Yeah, I mean, say what you will about the American AI labs, but they do have safety researchers. They do at least have an ethos around how they’re going to try to make these models safe. It’s not clear to me that DeepSeek has a safety researcher. Certainly, they have not said anything about their approach to safety, right? As far as we can tell, their approach is, yeah, let’s just build AGI, give it to as many people as possible, maybe for free, and see what happens. And that is not a very safety-forward way of thinking.

        Maybe I'm just paranoid, but that does resonate with me. I don't like the idea of authoritarian regimes having access to powerful LLM AIs.

        4 votes
        1. RoyalHenOil
          Link Parent
          • Exemplary

          Maybe I'm just paranoid, but that does resonate with me. I don't like the idea of authoritarian regimes having access to powerful LLM AIs.

          I completely agree with you. Unfortunately, China is an authoritarian regime, and the US seems to be rapidly turning into one. Access to this technology might be the best shot that smaller democratic nations have at protecting themselves. I really don't know.

          8 votes
        2. rahmad
          Link Parent

          Counterpoint:

          A regime you don't like will likely find a way to acquire the things you don't want them to have -- or find effective workarounds. The thing being open source or closed source doesn't particularly change that.

          5 votes
  2. zptc
    Link

    I can't judge the quality of the following but it seems relevant:

    DeepSeek Debates: Chinese Leadership On Cost, True Training Cost, Closed Model Margin Impacts H100 Pricing Soaring, Subsidized Inference Pricing, Export Controls, MLA

    We believe they have access to around 50,000 Hopper GPUs, which is not the same as 50,000 H100s, as some have claimed. There are different variations of the H100 that Nvidia made in compliance with different regulations (H800, H20), with only the H20 being currently available to Chinese model providers today. Note that H800s have the same computational power as H100s, but lower network bandwidth.

    We believe DeepSeek has access to around 10,000 of these H800s and about 10,000 H100s. Furthermore they have orders for many more H20’s, with Nvidia having produced over 1 million of the China specific GPU in the last 9 months. These GPUs are shared between High-Flyer and DeepSeek and geographically distributed to an extent. They are used for trading, inference, training, and research. For more specific detailed analysis, please refer to our Accelerator Model.

    Our analysis shows that the total server CapEx for DeepSeek is ~$1.6B, with a considerable cost of $944M associated with operating such clusters. Similarly, all AI Labs and Hyperscalers have many more GPUs for various tasks including research and training than they commit to an individual training run, due to centralization of resources being a challenge. X.AI is unique as an AI lab with all their GPUs in 1 location.

    10 votes