47 votes

What is China’s DeepSeek and why is it freaking out the AI world?

72 comments

  1. [52]
    dustylungs
    Link

    DeepSeek, a Chinese AI startup that’s just over a year old, has stirred awe and consternation in Silicon Valley after demonstrating breakthrough artificial-intelligence models that offer comparable performance to the world’s best chatbots at seemingly a fraction of the cost.

    Though not fully detailed by the company, the cost of training and developing DeepSeek’s models appears to be only a fraction of what’s required for OpenAI or Meta Platforms Inc.’s best products. The much better efficiency of the model puts into question the need for vast expenditures of capital to acquire the latest and most powerful AI accelerators from the likes of Nvidia Corp.

    DeepSeek says R1 is near or better than rival models in several leading benchmarks such as AIME 2024 for mathematical tasks, MMLU for general knowledge and AlpacaEval 2.0 for question-and-answer performance. It also ranks among the top performers on a UC Berkeley-affiliated leaderboard called Chatbot Arena.

    Not only does DeepSeek put into question conventional ideas about the economics of AI technology development, it also appears to illustrate that US restrictions on technology exports to China may be far more ineffective than thought — a topic also discussed in the article.

    42 votes
    1. [51]
      vord
      Link Parent

      Time to short NVIDIA.

      American exceptionalism WRT technology is no different than other exceptionalism narratives: It's mostly a tale we tell ourselves to justify abhorrent behavior.

      28 votes
      1. [15]
        arch
        Link Parent

        Time to short NVIDIA.

        I'm going to be blunt in my reply: as a gamer I really hope so. Their pricing has been absurd because of this, and it was absurd before AI due to Bitcoin and mining. We will need these chips for the foreseeable future, and there's no viable alternative in sight, but the last few years have seemed like an "AI Bubble" reminiscent of the dot-com boom of the 1990s.

        36 votes
        1. [9]
          ButteredToast
          Link Parent

          Data centers and workstation users being Nvidia’s biggest customers has also no doubt shaped their upper-end consumer card offerings. The number of gamers and other general consumers that have a PSU capable of keeping a 5090 or 5080 fed is tiny.

          The 800W PSU I currently use to comfortably power a 5950X and 3080Ti, which is already much higher-end than what the majority have, wouldn't cut it, and that's just absurd.
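
          For a rough sense of the numbers, here's a back-of-envelope sketch; the wattages are assumptions pulled from published spec sheets, not measurements:

          # Python sketch: rough PSU sizing. Assumed figures: ~575W board power
          # for a 5090, ~350W for a 3080Ti, ~140W for a 5950X under boost, plus
          # some overhead for the rest of the system.
          def recommended_psu_watts(gpu_w, cpu_w, other_w=75, headroom=1.3):
              """Peak system draw times a headroom factor for transient spikes."""
              return (gpu_w + cpu_w + other_w) * headroom

          print(recommended_psu_watts(gpu_w=350, cpu_w=140))  # ~735W: an 800W PSU is already snug
          print(recommended_psu_watts(gpu_w=575, cpu_w=140))  # ~1027W: an 800W PSU won't cut it

          The 1.3 headroom factor is itself a judgment call, but modern GPUs are known for transient spikes well above rated draw, so erring high is the usual advice.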

          14 votes
          1. [7]
            babypuncher
            Link Parent

            I really wanted to argue with this, but I don't actually have any statistics on the power supply capacity of the average gaming PC. All I can really say is that 800 watts, while more than most people have historically needed, has been pretty typical for recommended gaming PC builds for close to a decade now, largely because it's much better to err on the side of over-provisioning your power supply than under-provisioning it.

            High end PC builds have always come with hefty power supply requirements. The biggest difference today is that high end systems have gone from being built around shitty multi-GPU solutions like SLI to being built around monster single GPUs. Plenty of quad-SLI systems from back in the day drew a lot more power than a 5080.

            4 votes
            1. [6]
              ButteredToast
              Link Parent

              800W might’ve been recommended back then, but certainly wasn’t necessary if one wasn’t cheaping out on their PSU. My old, currently-decommissioned i7-6700k build originally had an EVGA 980Ti Classified in it, and though I never overclocked, its high-quality 650W PSU had no trouble at all handling that pairing. These days a 650W PSU isn’t going to get you much further than an AMD APU, or maybe an entry-level dedicated GPU paired with a midrange CPU. There’s definitely been some requirements creep.

              2 votes
              1. [5]
                phoenixrises
                Link Parent

                Have you ever tested your power usage? Not saying some of the modern requirements aren't getting higher, but I'm also running a 3080 with a Ryzen 7 and an AIO on 650W for the last 4 years with no issues.

                4 votes
                1. [2]
                  derekiscool
                  Link Parent

                  +1 to this. I ran a 3080 and 7800X3D on 700W with no issues for a while.

                  I did run into issues when upgrading to a 4080S and 9800X3D. Upgrading the PSU from 700W to 1000W solved that problem.

                  3 votes
                  1. phoenixrises
                    Link Parent

                    Yeah, to be clear, I'm not saying that specs haven't ballooned!

                2. [2]
                  ButteredToast
                  Link Parent

                  I haven’t tested. If I recall correctly, when I built that machine I wanted to guard against spikes in power draw from the 3080Ti exceeding what the PSU was capable of delivering. Also just wanted some headroom for future upgrades, but now it’s looking like it might not have been enough to make a difference.

                  1 vote
                  1. phoenixrises
                    Link Parent

                    If it's any consolation, I play a lot of high-graphics games and have hit zero issues in the last 5 years, knock on wood!

                    2 votes
          2. raze2012
            Link Parent

            And yet we'll hear about more and more AAA games not properly optimizing and instead relying on DLSS 4 if you want anything higher than 20-30 FPS, even on medium settings. We already felt some small effects this decade with the much less effective DLSS 3, so this will only get worse for PC gaming.

        2. Asinine
          Link Parent

          I'm so friggin happy I swapped over to AMD just before covid. That being said, I'm a huge gamer but I don't do anything really high-end. I used to be a huge Nvidia fangirl, but have been shaking my head over all this AI stuff.

          I hate to admit it, but I am glad China has shown us that the US corporate method may not be necessary. Feels kinda like it could be a big pharma blowout...

          5 votes
        3. [3]
          elight
          Link Parent

          Amen. It's one company and a few associated ones. If the cost to create and operate LLMs can be made orders of magnitude cheaper, I'm generally of the mind that everyone wins except those few companies: more and better AI more cheaply.

          Side benefit: just maybe us PC gamers get to afford top of the line graphics cards once again. It's been, oh, I don't know, damn near a decade now of GPUs being insanely expensive.

          And I'd love to have me a 5090... or whatever replaces NVIDIA if they tank over this.

          Anyone remember 3DFX? Those cards were a lot more affordable!

          3 votes
          1. [2]
            Tannhauser
            Link Parent

            From my understanding, the graphics cards that LLMs and other AI/ML companies/programs use are completely different beasts than the 4xxx/5xxx etc. consumer cards. Instead they're A100s/H100s etc., and cost in the five-figure range.

            1. Greg
              Link Parent

              Depends on the workload, but the consumer cards are excellent for AI/ML dev work if you can live with the VRAM limitations. If you’re a large company buying thousands for a datacenter then yeah, you’re getting the five figure cards, but in terms of raw mathematical performance per dollar the xx90 cards are actually hard to beat. Anecdotally I know several smaller companies using them in their standard dev workstations - honestly I see the 5090 release as a good value CUDA card rather than an expensive gaming card.

              That said, it doesn’t apply so much to the xx60 and xx70s that people are more realistically going to buy for gaming, and for those I think it’s more just a matter of where NVIDIA are allocating their resources. Everything that goes into gaming cards has to be justified as money/staff/manufacturing capacity that isn’t allocated to those five figure datacenter chips that are already so in demand they have a waitlist, and the consumer card price goes up until it’s a worthwhile trade off.

              4 votes
        4. Eji1700
          Link Parent

          as a gamer I really hope so.

          This has basically nothing to do with the cost gamers currently pay, and will continue to pay, for GPUs.

          There's a lot going on right now, but GPUs for gamers are an "oh yeah, that" line item for NVIDIA at this point, and seeing as they've got almost no competition (ESPECIALLY when it comes to drivers) and strong network effects when it comes to development, it's not about to change anytime soon.

          The age of getting last year's top of the line for half off is over. The good news is that you probably won't need top of the line anymore either. There's an interesting plateau being hit in several areas, and chasing the hardware isn't worth it.

          1 vote
      2. [2]
        CptBluebear
        Link Parent

        Time to short NVIDIA.

        I hope you did because it's down 17% in two hours.

        9 votes
        1. vord
          Link Parent

          Lol wish I had that kind of free cash to gamble.

          3 votes
      3. [17]
        PendingKetchup
        Link Parent

        Why should the news that NVIDIA's products are able to be used more effectively make them less valuable? Do we think people are likely to prefer "good-enough" models and that we are already there with current hardware?

        If one 5090 or whatever can now replace an OpenAI subscription, I feel like people then would want it more.

        8 votes
        1. [15]
          ButteredToast
          (edited )
          Link Parent

          Because it means that demand for more cards, and in the future for new generations of cards, will be diminished. Much of Nvidia’s valuation was pricing in ever-increasing demand for cards, which isn’t possible if major improvements in model efficiency become a persistent trend — it might prop up sales of upper-end 5000 series cards in the short term, but in the long term, if efficiency continues to improve, cards people already own will become more and more effective. Depending on how much optimization remains to be tapped, it could get to the point where used upper-tier 4000 or even 3000 series cards are adequate for most LLM needs, which is Nvidia’s worst nightmare.

          Increases in efficiency also degrade Nvidia’s moat, which has mostly existed due to their cards’ sheer muscle. With better efficiency, competing solutions from AMD and Apple among others become viable alternatives, which means that Nvidia suddenly has to compete and loses pricing power.

          4 votes
          1. [12]
            tesseractcat
            Link Parent

            This is only true if people decide that LLMs at the current level are good enough. Otherwise, more efficient training/inference will just result in bigger models with more capabilities (assuming scaling laws hold).

            8 votes
            1. [9]
              ButteredToast
              Link Parent

              Even that situation doesn’t bode well for Nvidia in the long term, because there is likely a point of diminishing returns for model size, which will move lower and lower as efficiency improves. Even if models haven’t hit the point of “good enough” yet, they will in the not-so-far future, and the need for increased power (and thus the market for cards) will stagnate.

              4 votes
              1. [8]
                tesseractcat
                Link Parent

                Why do you think models will hit the point of "good enough" in the not-so-far future? OpenAI and co explicitly state their goal is AGI/ASI, which seems like an ambitious goal. Also, o1/o3/deepseek r1 are all starting to use RL techniques, which are very compute intensive.

                4 votes
                1. [6]
                  ButteredToast
                  Link Parent

                  OpenAI’s ability to achieve their stated goals is increasingly questionable. It’s possible that they’re incubating a massive leap forward, but everything that’s public and rumored has pointed towards a serious plateau and a mostly failing mad scramble to keep up the appearance of continued progress.

                  That includes the new reasoning stuff. In my usage it’s improved output little to none while dragging down performance. That could change of course, but I’ve not seen that yet.

                  I’m also generally dubious that the current iteration of “AI” will lead to AGI. That’s not to say that the key to AGI won’t be stumbled upon in the coming years but I’m not expecting it to have much in common with today’s LLM-based systems, and if that’s true I don’t think OpenAI is going to be the one to discover it. They seem more and more like a one-trick pony.

                  10 votes
                  1. [5]
                    tesseractcat
                    Link Parent

                    Maybe. From my perspective at least, it seems like OpenAI spends the money to make the expensive breakthroughs, and everyone else copies them 1-2 years later. After all, the hype around deepseek is entirely because they replicated the new reasoning stuff that OpenAI was working on.

                    I would be surprised if AGI (not my favorite terminology) wasn't some derivative of LLMs, although probably a more multi-modal model (audio/video/robotics/language/etc), with RL used heavily in training.

                    2 votes
                    1. [3]
                      stu2b50
                      Link Parent

                      After all, the hype around deepseek is entirely because they replicated the new reasoning stuff that OpenAI was working on.

                      No, the hype is that they did it with $5m worth of training. Which is insanely cheap. That's like how much Google's farts cost.

                      4 votes
                      1. [2]
                        tesseractcat
                        Link Parent

                        Fair, although to defend my point a bit, the hype is still because they replicated OpenAI's o1-style reasoning. DeepSeek V3 had been around for a month or so with little hype. Although maybe that's just because it took a while to accrue publicity, so who knows /shrug.

                        It would be really interesting to compare with how much money it's costing OpenAI to train their models; unfortunately, to the best of my knowledge, they're not open with that information.

                        1. stu2b50
                          Link Parent

                          Meta is public about it, so we can get an idea of the US state of the art. On GPU hours, Llama 3 cost $250m to train. I would assume OpenAI's flagship models cost even more, since they have far more parameters than Llama does.
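
                          As a sanity check on that figure, here's the arithmetic; both inputs are assumptions (Meta reported GPU-hours, not dollars, and the hourly rate is a guess at an all-in cost):

                          # Python: back-of-envelope training cost from GPU-hours
                          gpu_hours = 31e6         # assumed: roughly what Meta reported for its largest Llama 3 model
                          usd_per_gpu_hour = 8.0   # assumed all-in rate (hardware amortization, power, staff)
                          print(f"${gpu_hours * usd_per_gpu_hour / 1e6:.0f}M")  # ~$248M

                          The dollar figure is very sensitive to the assumed hourly rate, which is why estimates for these training runs vary so widely.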

                    2. ButteredToast
                      Link Parent

                      It’s true that OpenAI has generally been the leader that everybody else then follows, but the time gap to catch up has been closing, and in some cases the models of other companies are better than OpenAI’s. Claude for example in my experience has been better for the “talking documentation” and “semi-intelligent rubber duck” programmer use case than ChatGPT has.

                      3 votes
                2. PuddleOfKittens
                  Link Parent

                  OpenAI and co explicitly state their goal is AGI/ASI, which seems like an ambitious goal.

                  If they achieve that then congratulations to them, they have officially won the economy. The whole thing. They'll put almost-literally everyone out of a job (assuming the AGI can fit on a chip that costs e.g. less than $50k). At that point their net worth will be at least several hundred trillion dollars, because that's the cost of the entire world economy for a couple of years.

                  However, they aren't going to achieve that. The stock market clearly agrees with me, because even if they only have a 1% chance of achieving that, 1% of $100T is 1 trillion dollars net worth, which OpenAI has not been valued at.

            2. tauon
              (edited )
              Link Parent

              I’m gonna go out on a limb here and say that’s already been priced in for about the last year.

              Nvidia’s stock price-to-sales ratio has been $TSLA levels of comically absurd for a while; maybe this is the expected correction now?
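
              For scale, price-to-sales is just market cap divided by trailing revenue. These are rough values from memory around the time of the crash, so treat them as assumptions:

              # Python: price-to-sales ratio, rough figures
              market_cap = 3.4e12    # assumed: Nvidia near its recent peak
              ttm_revenue = 1.1e11   # assumed: trailing-twelve-month revenue
              print(market_cap / ttm_revenue)  # ~31x sales; mature hardware firms tend to sit in the low single digits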

              3 votes
            3. babypuncher
              Link Parent

              In my definitely non-expert opinion: a very real possibility is that LLMs have started to peak in their capabilities, and simply throwing more complexity and hardware at the models isn't necessarily going to keep leading to better results going forward.

              If that's the case, then the real innovation going forward will be in improving model efficiency, and as a result reducing the demand for all this hardware.

              2 votes
          2. vord
            Link Parent

            This is in essence what I was getting at. News which disrupts the current hype bubble (NVIDIA in dominating AI position with huge margins with infinite demand), even if the effect of this disruption is overstated, will tank stock prices as previously-ignored criticism is brought to the spotlight.

            2 votes
          3. PendingKetchup
            Link Parent

            I guess this makes sense if you think there is a set level of capability beyond which you don't really want more improvements (or you don't have the data to drive them), and it isn't just "how much VRAM will Nvidia sell me before they start raising the price-to-performance curve to tell me to stop". Maybe I need a model equivalent to Llama 3 70B for some reason, but the same sort of thing at what was previously 405B's capability level won't help?

            I guess I don't really understand what people think they're going to do with these in general, so I think the same vague hype-based justifications can be used to sell models of arbitrary power as a thing worth buying for some reason.

            1 vote
      4. [15]
        Plik
        Link Parent

        You're too late; it's down ~12%, as is $TSM. I think it's an overreaction: DeepSeek is hilariously bad (yes, I tried it).

        6 votes
        1. [2]
          CptBluebear
          Link Parent

          I don't think the efficacy of the LLM in question is what's causing the dip, but rather its computational efficiency, which puts Nvidia's strategy in the crosshairs: do we need more powerful server farms for this, or do we just need to optimize the models instead?

          Notably, it isn't just Nvidia; on the periphery you see energy companies such as GE Vernova dip too. Nuclear and/or SMR companies were betting big on AI needing ever more power, and this model may just prove the opposite.

          Whether or not that's true is anyone's guess, but the market always responds in a rather panicky way.

          7 votes
          1. Plik
            Link Parent

            Yeah, it just seems a bit much. Wouldn't a more efficient AI also mean you could use the newer chips with that same AI, but with more/better results?

            Either way, I got a $TSM LEAP at a $2000 discount, so I'll know pretty soon if it was justified panic or not xD.

        2. arch
          Link Parent

          TSM is definitely an overreaction; it'll be back up very soon. Apple is their largest customer; they'd survive with a smaller drop than that even if NVIDIA went under.

          6 votes
        3. [9]
          raze2012
          Link Parent

          This is a crass metaphor, so don't take it too much to heart, but:

          It's like how the average American engineer is more trained than the average Chinese/Taiwanese engineer. But if the latter is 8 times cheaper, you know exactly who an American company will ideally want to hire.

          Now this magnitude is more on the order of thousands of times cheaper. Maybe millions. We've seen how easily companies will trade productivity for cost.

          Asked it about sensitive China topics and it started self-censoring.

          I haven't seen the source yet, but I wonder if it's open source to the point where we can fork that model and uncensor it for non-Chinese countries?

          2 votes
          1. [8]
            Plik
            Link Parent

            To your last point, another thing I was wondering was, does it matter if it's open source? China loves to sneak extra data mining stuff into their hardware and software (yes, corporate America does the same thing), are companies going to want to trust it until it's been fully vetted? If not, how long would that take?

            Overall I feel like the news and market reaction was a bit extreme. DeepSeek was available before yesterday, so why the sudden fear? And won't it take at least a few months before anyone really knowledgeable can tell if it's a western AI killer?

            I don't buy the NVidia panic either. You give me a super laser that runs off a 3V battery but can handle more, guess what I'm gonna do?...chain together 1000 9V batteries and shoot that thing into space just for fun. The battery manufacturer is still gonna make money off of me, both on 3V and 9V batteries.

            1 vote
            1. [5]
              Greg
              (edited )
              Link Parent
              • Exemplary

              Strongly agreed that the panic seems overblown. Anyone sufficiently close to the research side of the field has known for at least a year that DeepSeek are a serious competitor, and anyone who wasn’t expecting some level of significant gains in efficiency as the known-but-unsolved problems get solved isn’t someone I’d trust to make bets on the future, given the history of technology in general and ML research in particular.

              The big question is whether the price crash was a correction from a speculative bubble, meaning the current price sticks, or a group panic, meaning it’ll bounce back up soon enough. Given that the numbers are essentially made up in either case, I wouldn’t want to guess which it is.

              Open sourcing it matters a lot, though. It serves as a reference implementation for the techniques outlined in the paper, and as working proof of the efficacy of their mathematical approach. The model weights (the things that contain the actual “knowledge” from the few million spent on training) are just numbers, and can serve as a leg up for anyone who wants to train beyond that either with DeepSeek’s code or their own. The code serves as a template for anyone who wants to implement a similar transformer architecture elsewhere, potentially even in a completely different space (medical imaging, audio analysis, whatever).

              In short: we already know what they’ve changed, because anyone really knowledgeable can look at their paper and say “hey, that’s a really nice solution to this bottleneck we’ve all been struggling with, good job guys”. Reimplementing it from the paper wouldn’t be too onerous, but we don’t even have to do that because we’ve got a working reference implementation too. And experimenting with further advancements on top can start where they left off, because the weights are open too, rather than needing a few million to train from scratch.
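
              To make the "leg up" concrete, here's a minimal sketch of what open weights let you do, using Hugging Face transformers. The model ID is one of DeepSeek's published distills, but treat the specifics as illustrative rather than as their official workflow:

              # Python sketch: open weights are just tensors you can load, inspect,
              # and resume training from, instead of training from scratch.
              from transformers import AutoModelForCausalLM, AutoTokenizer

              model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
              tok = AutoTokenizer.from_pretrained(model_id)
              model = AutoModelForCausalLM.from_pretrained(model_id)

              # The "knowledge" really is just big arrays of numbers:
              for name, p in list(model.named_parameters())[:3]:
                  print(name, tuple(p.shape))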

              It’s a big deal, and I understand why it’s caught popular attention like this, but for anyone actually in the field I don’t think it should come as some foundation-shaking shock that someone figured out a particularly tricky problem.


              The article that @krellor posted below is a great technical overview, and it takes pretty much the tone I’d expect of someone who really knows the state of research here:

              I see many of the improvements made by DeepSeek as “obvious in retrospect”: they are the kind of innovations that, had someone asked me in advance about them, I would have said were good ideas. However, as I’ve said earlier, this doesn’t mean it’s easy to come up with the ideas in the first place.

              I’ve heard many people express the sentiment that the DeepSeek team has “good taste” in research. Based just on these architectural improvements I think that assessment is right. None of these improvements seem like they were found as a result of some brute-force search through possible ideas. Instead, they look like they were carefully devised by researchers who understood how a Transformer works and how its various architectural deficiencies can be addressed.

              None of us are saying “oh yeah, I could’ve thought of that”. It’s a legitimate breakthrough that the DeepSeek team should be proud of. But I’m suspicious of anyone presenting as an expert saying they didn’t expect anyone to work out meaningful improvements in these areas.

              [Edit] Expanded quote and fixed link

              9 votes
              1. krellor
                Link Parent

                My favorite phrase from an old math mentor of mine from grad school was "the definition of clever is 'seen it before.'"

                The improvements made here really are clever; the maths aren't new, but they're newly applied to this problem. It was an inspired effort to work it out and test it, but not quite on the same scale as the introduction of attention in 2017. Much respect for the team though!

                2 votes
              2. [3]
                Plik
                Link Parent

                Random question: when you say the weights are just numbers... are they n-dimensional vectory things? I don't understand LLMs at all well, but I sort of had this idea that it's a bunch of connected points in n-space, and so you would use vectors to move between those?

                I am used to [2-3]-space with maybe some time thrown in, so I'm just curious if I am imagining it the right way.

                1 vote
                1. [2]
                  Greg
                  Link Parent

                  Yeah, it sounds like you've got a solid intuition for it! If you really boil it down an ML model is more or less just a colossal mathematical function that turns one tensor (aka n-dimensional vectory thing, which is phrasing I like very much) into another, and the weights are the numeric values in that function.

                  For an LLM your input starts as a list of words, gets converted to a list of numeric IDs using a tokenizer (basically just a slightly fancier dictionary lookup), and then that gets fed into the actual model. Everything from there is a mathematical operation involving the input and some subset of the weights: first mapping the input into a vector embedding space - that's the bit you might be visualising if you've seen the big point cloud diagrams like this - and then a series of operations to transform that into meaningful output.

                  You're generally working with somewhere between 2D and 5D tensors depending on exactly what a given layer of the model is doing, and on the order of half a billion elements in that tensor to pass through each layer after the embedding step, so hopefully that gives you some insight on where all these tens or hundreds of billions of parameters in the weights are coming from: they're the fixed values you need in order to perform a handful of operations on an object of that size.
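
                  If it helps, here's the whole pipeline as a toy sketch in Python/numpy; the shapes are made up and the weights are random, but real models are the same idea with billions of parameters and far fancier layers:

                  # Toy pipeline: token IDs -> embedding -> transform -> vocabulary scores
                  import numpy as np

                  vocab = {"the": 0, "cat": 1, "sat": 2}           # tokenizer: fancy dictionary lookup
                  token_ids = np.array([vocab[w] for w in ["the", "cat", "sat"]])

                  rng = np.random.default_rng(0)
                  d_model = 8                                      # real models: thousands of dimensions
                  W_embed = rng.normal(size=(len(vocab), d_model)) # the weights: "just numbers"
                  W_layer = rng.normal(size=(d_model, d_model))
                  W_out = rng.normal(size=(d_model, len(vocab)))

                  x = W_embed[token_ids]      # each token becomes a point in n-space
                  x = np.tanh(x @ W_layer)    # stand-in for the (much fancier) transformer layers
                  logits = x @ W_out          # scores over the vocabulary at each position
                  print(logits.shape)         # (3, 3)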

                  2 votes
                  1. Plik
                    Link Parent

                    Awesome, thank you, this clarifies a lot.

                    1 vote
            2. [2]
              raze2012
              Link Parent

              does it matter if it's open source?

              Immensely, yes. Even if you truly don't touch the software, it means others can learn the techniques and algorithms and make their own software, even without forking it directly. But I'm of course assuming that they aren't hiding any particularly important techniques behind a server or something. I still haven't looked at the source myself.

              China loves to sneak extra data mining stuff into their hardware and software

              I'm not going to say it's impossible, especially if the codebase is in the hundred thousands to millions of LoC. But generally speaking, it is extremely hard to hide such spyware within an open source repository. Being on GitHub means some basic checks for that stuff happen to begin with, and with enough eyes on it, someone will catch something eventually.

              Overall I feel like the news and market reaction was a bit extreme, DeepSeek was available before yesterday, so why the sudden fear?

              1. We're undoubtedly in a huge bubble right now, so all actions and reactions will be overblown. You could say the great stock crash of 1929 was all overblown reactions. That's the nature of skittish investors who know nothing about the stuff they invest in seeing things that might make them pull out or invest.

              2. Based on the article

              The DeepSeek mobile app was downloaded 1.6 million times by Jan. 25 and ranked No. 1 in iPhone app stores in Australia, Canada, China, Singapore, the US and the UK, according to data from market tracker App Figures.

              So I assume it needed a bit of time, but it quickly trended on the app stores to give it its current fame. I also agree with several comments that the US is particularly blind to a lot of Chinese advances, even though they are "hiding" in plain sight. US news simply downplays or obscures so much stuff until it's way too late.

              The battery manufacturer is still gonna make money off of me, both on 3V and 9V batteries.

              Yes, but once again, the bubble. NVidia wants allllllll the profits; they have a very hot stock right now, and anything that can disrupt that will have it plummet back down to a normal (but still high) level. That will make a lot of already-rich people very mad, so it's a higher priority than any frivolous needs like cheaper groceries, adjusting modern worker wages, or rent control.

              5 votes
              1. Plik
                Link Parent

                Thank you. Good answers, you clarified a lot for me. I have done some programming before and used to main Linux, but have not really followed GitHub and open source development much recently.

                1 vote
        4. [2]
          elight
          Link Parent

          Huh. I tried it and had the opposite reaction. R1 is slow but it gave me a far more thoughtful response than o1.

          1 vote
          1. Plik
            Link Parent

            Took me multiple attempts to login, then kept erroring out when I asked it questions.

            Finally got it to answer one finance question, which was a good response.

            Asked it about sensitive China topics and it started self-censoring. At one point it printed out a ~4-screen thought process, and then promptly deleted it and asked to change the subject.

            1 vote
      5. unkz
        Link Parent

        Hardly. The next step is just going to be taking their methods and scaling them up to a supercluster.

  2. krellor
    Link
    • Exemplary

    For anyone interested, here is an article that gives some of the technical changes DeepSeek implemented:

    How has DeepSeek improved the Transformer architecture?

    18 votes
  3. [4]
    llehsadam
    Link

    I have a feeling we will have more and more breakthroughs that cast doubt on the centralized AI narrative OpenAI and X are pushing.

    When it can be easily run on my M2 Macbook Air, I will give whatever local model is best with open source software a try at doing tasks for me. We seem to be approaching that point this year.

    Weirdly enough, the Apple AI stuff everyone is crapping on is nice in the background; all in all, it's the best experience compared to Copilot. I haven't had it summarize messages, but that seems a little pointless in my experience. But I've been sorting 30 years of digital photos recently and it's nice to have it recognize objects, people, and places, no effort from me required. It's not too invasive and it works.

    All in all, exciting news.

    20 votes
    1. [2]
      Raspcoffee
      Link Parent

      It's going to be interesting when we see applications and improvements not necessarily correlated directly with investment. I also hope this'll start a trend, given how energy-expensive training LLMs is. If it's open source too, awesome.

      Also, looking at the article @krellor shared in this thread, it feels a lot like the development of electronics in a way. We're still figuring out tricks to make the machinery do what we want it to do, and coming up with unexpected ways to improve something else.

      It's funny to see investors being surprised by this, given that unexpected successes and failures are nothing new in cutting-edge development.

      9 votes
      1. krellor
        Link Parent

        Right, this doesn't feel much different than prior technology advances where a new capability is introduced followed by optimizations once limits were hit. The clever application of linear algebra here could be an isolated set of improvements, or one of many clever optimizations as theorists and practitioners develop better tools to work with the maths of transformer networks.

        9 votes
    2. Minori
      Link Parent

      You should be able to run a distilled (smaller) version of DeepSeek's latest model on your laptop right now. I've seen a lot of people talk about successfully running a mini version locally.
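
      If you want to try it, a minimal local-inference sketch with llama-cpp-python looks something like this; the GGUF filename is a placeholder, so grab whichever quantized distill fits your RAM:

      # Python: run a quantized DeepSeek-R1 distill locally
      # (pip install llama-cpp-python, download a GGUF file first)
      from llama_cpp import Llama

      llm = Llama(
          model_path="./DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf",  # placeholder path
          n_ctx=4096,  # context window; larger costs more memory
      )
      out = llm.create_chat_completion(
          messages=[{"role": "user", "content": "Why is the sky blue?"}],
          max_tokens=256,
      )
      print(out["choices"][0]["message"]["content"])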

      4 votes
  4. [9]
    Comment deleted by author
    Link
    1. Crestwave
      Link Parent

      Well, even if it were a data-harvesting scam site, at least it's open source and you can run the models locally for privacy, unlike OpenAI's models.

      17 votes
    2. [2]
      spidicaballero
      Link Parent

      Why do people have this kind of attitude towards Chinese products and not towards US products that are worse or the same? I created an account a few days ago without issues. Probably what you are experiencing is related to this.

      11 votes
      1. nukeman
        Link Parent

        Combination of nationalism and those who remember when Chinese products were junk. I say to people that the Chinese can build to spec, including to a high quality, but 90% of Western companies give a spec of “cheap” and that’s it.

        11 votes
    3. [2]
      ZeroGee
      Link Parent

      I also tried to register with email from my own domain, and I was told it's not good enough.

      Good indication that they're harvesting information as much as they're doing anything else.

      8 votes
      1. [2]
        Comment deleted by author
        Link Parent
        1. HeroesJourneyMadness
          Link Parent

          Last paragraph of the article - I’m guessing was an addendum - says they’ve experienced a brief outage on Jan 27 (today here in the US still), so I’m assuming they’ve got some scaling issues.

          DeepSeek editorial hit my Insta reels last night for the first time too, which makes me think there’s some virality ATM.

          8 votes
    4. Noox
      Link Parent

      As others said, it's a login issue; my colleagues were sharing screenshots of stuff they'd asked it earlier today (like 7.5 hours ago-ish). But now another colleague said they too had issues registering, so I think it's just been overloaded.

      I still don't trust it at ALL, to be frank, but that's because of other reasons, not because it's a scam or something hah

      7 votes
    5. creesch
      Link Parent

      You are probably right about data harvesting, though they get much more from you actually using the thing. What is interesting to me is that your experience might as well have been about the ChatGPT registration experience, or the one for Claude. With both of those I have had very buggy, outright broken experiences that keep happening. For all the money and energy they spend on their models, they seem to skimp out on the front-end experience.

      6 votes
    6. Dr_Amazing
      Link Parent

      Worked for me. I tried it a little and I'd say the results were pretty comparable to what ChatGPT gives me.

      4 votes
  5. [2]
    ButteredToast
    (edited )
    Link

    It’s felt like not only US LLM companies but also Nvidia have been operating in “brute force” mode to produce advancements for a while now. Don’t think, just pump more GPU power/electricity!

    With that in mind, it’s no surprise that a serious focus on efficiency is producing better results than cranking the knob up ad infinitum. I hope this serves as a wake-up call for Nvidia so future card models don’t suck down as much electricity as multiple whole computers. Maybe it won’t, and we’ll have a replay of efficient, cheap Japanese cars obliterating the obstinately gas-guzzling US competition in the GPU space, which at this point I’d honestly welcome (Nvidia needs to be knocked down a peg or two, and Intel and AMD aren’t doing that).

    I’ll have to see how Deepseek-R1 runs on my M1 Max MacBook. I wonder if its 64GB unified memory is enough for this model to perform well.

    16 votes
    1. Weldawadyathink
      Link Parent

      I haven’t used it extensively, but my M3 Max with 64 gb ram seemed to handle the 70b model just fine. I would imagine you would be fine up to 70b as well.

      5 votes
  6. Wulfsta
    Link

    I recall a leaked memo along the lines of “Google has no moat,” talking about the advances that the open source community was making towards individually accessible AI models. This feels similar - the technology is still so new that there is a ton of low-hanging fruit left in the field, mostly limited by funding the discovery of new techniques.

    13 votes
  7. Weldawadyathink
    Link

    Anecdotally DeepSeek Coder V1 was the absolute best model for me for anything programming related. I haven't used Coder V2 as much, but it seems quite similar to V1. I have DeepSeek R1 downloaded, but I haven't used it much, and can't compare it to other chain of reasoning models.

    9 votes
  8. [2]
    shinigami
    Link

    This has piqued my interest just from the sense of it being open source.

    I'm not a code monkey and I don't know enough about it, but I don't have an NVIDIA card, so I've always been a bit hamstrung in digging deep. If there are options for me to run this, I am beyond excited.

    6 votes
    1. stu2b50
      Link Parent

      I don't think you're going to be able to run the full DeepSeek models on consumer hardware, but you've been able to run LLMs in general on consumer hardware for ages. Usually variants or cut-down versions of Llama.

      6 votes
  9. pete_the_paper_boat
    Link

    I understand why it's a shock to Nvidia, but it really shouldn't be.

    Okay, so AI doesn't need never-ending amounts of compute. But what happens when you throw never-ending amounts of compute at it?

    Now that we've built all that compute, we may as well put it all to use!

    3 votes