31 votes

Nvidia RTX 50 graphics card family TDPs 'leaked' by Seasonic

32 comments

  1. [32]
    ButteredToast
    Link

    500W seems insane, even for a top-end card. There has to be an upper limit somewhere, right? If nothing else, at some point it becomes too much of a stretch to put such high-draw big iron into the same product line as "bare minimum" models like the 4050/5050. It's like putting a Threadripper or Epyc CPU in the same product line as a Ryzen 5100/5500.

    15 votes
    1. [11]
      Jambo
      Link Parent

      The problem they're facing, and the reason this is happening, is that they have to produce a stronger card than last year to keep investors happy, but they're hitting the limits of what silicon chips can accomplish without pushing more power through them. The issue is exacerbated by the high heat output as well; there are diminishing returns on how much "bang for the watt" we can achieve now that we've pushed these chips this far and transistors this small.

      This is also why they've been investing heavily in software like DLSS: they're somewhat cooked on hardware, so they're trying to find things they can do to the resultant image to reduce processing cost without the user seeing or feeling it (in quality or latency).

      I'm interested to see if they will eventually find a way to reduce their instruction set (like Apple did when they moved away from traditional CISC processors) to reduce the amount of work a GPU has to do without interfering with the capabilities of the card. I'm also interested to see what happens after silicon, like graphene. I'm sure whatever it is will be quite pricey for us consumers for a good long while.

      20 votes
      1. [2]
        teaearlgraycold
        Link Parent

        Their biggest datacenter cards consume 700W each. It's clear they're in a mode of desperation.

        7 votes
        1. cutmetal
          Link Parent

          Well, the data center GPUs are a completely different beast; those aren't "cards." Even if Nvidia tightened up their gamer GPU TDP situation, they would still design the next generation of A100/H100 (or whatever it's called) to maximize performance at the same or similar draw and footprint. The customer calculus is inference per watt per installed square foot.

          8 votes
      2. ButteredToast
        Link Parent

        Also mimicking Apple, they could always take the direction of piling on more silicon, which is expensive but at least helps keep power usage and thermal issues at bay. I'm sure there's a customer segment that would happily pay a premium for cards that are as powerful as current top-end cards, but don't have the crazy PSU and cooling requirements.

        4 votes
      3. [7]
        Minori
        (edited )
        Link Parent

        but they're hitting the limits of what silicon chips can accomplish without pushing more power through them.

        Could you provide any more info on this? Last I remember, this was a talking point before the Pascal series was introduced, which brought massive efficiency gains. I haven't heard much about TSMC peaking on silicon density. I know we're getting sorta close, which is why companies are investigating 3D designs (though the related thermal issues may be insurmountable).

        Edit: silicon, not silicone

        3 votes
        1. [6]
          vord
          (edited )
          Link Parent

          I found this video about Apple Silicon to be a good primer on the topic.

          Speed enhancements almost invariably come in these ways, with the following tradeoffs:

          • Increase clock speeds, generating more heat per transistor. We're at practical maximums here, around 5GHz, and have been since 2006ish.
          • Add more transistors by increasing die size. Adds more heat, consumes more power.
          • Shrink transistor size with process improvements to reduce power consumption. This lets you add more transistors without increasing die size; however, it means more heat in a smaller area, which needs to dissipate faster. We're also extremely close to (or at) theoretical/practical limits here.
          • Software improvements, via better algorithms and more efficient coding...this is mostly ignored in favor of ease of development. Also inventing new algorithms is very difficult.
          • Dedicated, not general purpose, hardware designed to optimize the software algorithms.

          Apple had their breakthrough with the M1 primarily because they were able to push that last one thanks to a tight ecosystem. However, since doing so, they've been 'stuck with' the same problems as the rest of the silicon industry, and other players are catching up.

          So yea, NVIDIA is essentially at the point where their primary dials are:

          • Increase power consumption
          • Improve software

          And one of those dials is much, much, much easier to turn. It won't be long before we need dedicated 20A breakers for top-end PCs, or perhaps a transition to 240V.
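
          Rough math, with purely illustrative numbers rather than measurements:

          ```python
          # Back-of-envelope: how close a hypothetical top-end build gets to a US 120V circuit.
          gpu_w   = 500    # rumored top-end board power
          cpu_w   = 250    # high-end desktop CPU under load
          rest_w  = 150    # board, RAM, drives, fans, monitor
          psu_eff = 0.90   # roughly Gold-rated PSU efficiency at this load

          wall_w = (gpu_w + cpu_w + rest_w) / psu_eff   # ~1000 W drawn from the wall
          amps   = wall_w / 120                         # ~8.3 A on a 120 V circuit

          # Under the 80% continuous-load rule, a 15 A breaker is good for ~12 A (~1440 W),
          # so one of these PCs plus a window AC or space heater is already pushing it.
          print(f"{wall_w:.0f} W at the wall, {amps:.1f} A at 120 V")
          ```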

          6 votes
          1. [2]
            ButteredToast
            Link Parent

            Worth noting that with the M-series, compared to traditional x86 CPUs and their integrated GPUs, Apple has also taken option #2 to a significant degree. Instead of brute forcing performance by pumping more power, they've made their SoCs huge with a crazy number of transistors.

            It's one of the reasons why Intel, AMD, and now even Qualcomm are struggling to match the M-series' performance per watt (Intel/AMD on the low end, Qualcomm on the high end): they simply can't afford to put as many transistors into their CPUs since they're mass-market offerings that can't have costs absorbed by a high-margin product built with them.

            4 votes
            1. vord
              (edited )
              Link Parent

              As the video gets to at one point, the M3 is already a step backwards from the M2 in many ways, hitting hard thermal throttling with any sort of sustained load.

              They managed to get a jumpstart there...but as the competition catches up they're kinda limited the same as everyone else, and there's not really much room to expand their die any further or sacrifice their performance/watt.

              Their 'magic leap' in performance per watt was mostly a one-time affair. We're not going to see the generational die shrinks of the previous three decades that let them keep ahead of their competition year over year.

              2 votes
          2. [3]
            Minori
            (edited )
            Link Parent

            Thanks for the reference! I did some additional research as well, and it seems like the current silicon limit is that FinFET transistors have basically hit maximum density. TSMC's next goal is gate-all-around (GAAFET) transistors, but those are a few years away (assuming they pan out).

            Edit: fixing autocorrect's mistakes

            2 votes
            1. [2]
              Omnicrola
              Link Parent

              the current silicone limit

              Since you spelled it this way twice, I'll assume it isn't a typo and offer a friendly FYI: "silicone" is a different substance than "silicon". One is good for making transistors, and the other is a good source of double entendres.

              4 votes
              1. Minori
                Link Parent

                Thanks for pointing that out. I'm aware of the difference but consistently mess up the spelling despite having worked with both!

                3 votes
    2. [2]
      teaearlgraycold
      Link Parent

      The power connectors support up to 600W, so unless they put two of those on one card, that's the limit.

      8 votes
      1. Minori
        Link Parent

        Quad power connectors aren't unheard of on GPUs, so I wouldn't be surprised at all if they went that route.

        2 votes
    3. [17]
      tinfoil
      Link Parent

      Even when the 40 series came out, I think a lot of people were shocked. Many pointed out that a system with a 4090 and a high-end CPU to match would draw enough power that you'd seriously have to watch what else you put on that breaker. In older houses you might need to shut stuff off before firing up a game.

      8 votes
      1. [9]
        teaearlgraycold
        Link Parent

        At work we needed to give our AI workstation its own 20A circuit. We put the largest available 120V power supply in it, 1650W, to power 3x L40S cards and a 7950X3D.

        7 votes
        1. [8]
          Greg
          Link Parent

          Bit of a tangent, but have you guys noticed any issues with PCIe bottlenecks on that setup?

          I’m speccing a similar machine and it’d be great to stick with AM5 rather than Threadripper if possible (aside from the big cost advantage, there’s always something that ends up benefitting from extra single-core performance), but I’m having trouble finding info on how much difference running the cards at x8/x8/x4 makes in reality.

          1 vote
          1. [7]
            teaearlgraycold
            Link Parent

            For our needs there's no bottleneck. The cards hover around 100MiB/s in bandwidth during training, well below the capacity of the slowest slot. Even if there is a bottleneck for a moment, it just delays the run a few minutes, which is inconsequential across a day-long run.
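
            If you want to sanity-check a setup yourself, here's a rough sketch of one way to read the negotiated link width and live PCIe traffic. It assumes the nvidia-ml-py (pynvml) bindings and is purely illustrative, not what we actually run:

            ```python
            # Print the negotiated PCIe link for GPU 0 and sample its live throughput.
            import time
            import pynvml

            pynvml.nvmlInit()
            handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU index is a placeholder

            gen   = pynvml.nvmlDeviceGetCurrPcieLinkGeneration(handle)
            width = pynvml.nvmlDeviceGetCurrPcieLinkWidth(handle)  # e.g. 16, 8, or 4 lanes
            print(f"PCIe gen {gen} x{width}")

            for _ in range(10):  # sample for ~10 seconds while a training job runs
                # Throughput counters are reported in KB/s over a short sampling window.
                tx = pynvml.nvmlDeviceGetPcieThroughput(handle, pynvml.NVML_PCIE_UTIL_TX_BYTES)
                rx = pynvml.nvmlDeviceGetPcieThroughput(handle, pynvml.NVML_PCIE_UTIL_RX_BYTES)
                print(f"TX {tx / 1024:.1f} MiB/s  RX {rx / 1024:.1f} MiB/s")
                time.sleep(1)

            pynvml.nvmlShutdown()
            ```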

            1 vote
            1. [6]
              Greg
              Link Parent

              Very good to know, thank you! I’m guessing no FSDP or similar in your scenario, in that case? If it’s just a case of ruling out the fancier kinds of parallelism that could definitely work for us.

              1 vote
              1. [5]
                teaearlgraycold
                Link Parent

                I'm not one of the ML guys so I'm not familiar enough with terms like FSDP to say if that's in use. We do train across all GPUs sometimes. We also will run different workloads on each card. It's on a machine that the 3 ML guys all SSH into.

                We actually swapped over to A6000 cards recently and moved our 3 L40S cards into our new cluster which totals 16 L40S GPUs now.

                1 vote
                1. [4]
                  Greg
                  Link Parent

                  Got you, and I really appreciate the info - it sounds like pretty much exactly the use case I’m looking at: enough to run a few different dev jobs in parallel, or one somewhat larger job more quickly, with anything significantly bigger going off to the actual servers. Probably means that worrying too much about bus bandwidth would be overkill, which is very much what I was hoping to hear.

                  1 vote
                  1. [3]
                    teaearlgraycold
                    Link Parent

                    It was so much cheaper this way than going with any of the commercial options. For example, Lambda Labs sells a 3x A6000 machine for $44,000! But it's got 96 cores and 512GB of RAM. Ours cost closer to $15,500.

                    1 vote
                    1. [2]
                      Greg
                      Link Parent

                      The difference really is mind blowing sometimes! As soon as you edge into “there’s some level of budget justified for this” it’s like the suppliers hear “money no object, add a nice big multiplier”.

                      1 vote
                      1. teaearlgraycold
                        Link Parent

                        Actually we just figured out we have the old cards, the A6000 instead of the RTX 6000 Ada. We were on a call with an Nvidia rep and learned that they have 3 very similarly named cards:

                        • RTX 6000 (very old)
                        • RTX A6000 (old)
                        • RTX 6000 Ada (new)

                        I only knew of the first two. The newest one has the same appearance and amount of VRAM as the middle one. Super confusing, but the A6000 does seem to be a better bang for our buck right now.

                        1 vote
      2. [7]
        Gummy
        Link Parent

        I actually ran into this issue in my apartment. The breaker trips if I don't turn off the AC unit before doing anything demanding on the GPU. Not a huge deal since I live far enough north that the summers are around 80°F, but it's still weird that my GPU can draw such a monumental load.

        5 votes
        1. [6]
          JakeTheDog
          Link Parent

          Try undervolting your card with e.g. Afterburner. I was able to reduce power consumption by about 25-30% on my 4070 Super (from 1.1V to 0.970V) with negligible impact on frame rate. I even did the same on my Ryzen 7600 (-50mV and an 80W power limit) for another 25% cut in power consumption, and actually gained performance because it no longer thermally throttles and stays boosted indefinitely.

          The added benefit is a much cooler (15°C cooler CPU and 5-10°C cooler GPU under max load) and quieter rig.

          Undervolt is the new overclock.
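
          If you want to verify the savings, here's a minimal sketch for logging board power before and after the undervolt (assuming an Nvidia card and the nvidia-ml-py/pynvml bindings; purely illustrative):

          ```python
          # Average the reported board power over ~30 seconds while a game or benchmark runs.
          import time
          import pynvml

          pynvml.nvmlInit()
          handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU index is a placeholder

          samples = []
          for _ in range(30):
              watts = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000  # reported in milliwatts
              temp  = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
              samples.append(watts)
              print(f"{watts:6.1f} W  {temp} C")
              time.sleep(1)

          print(f"average draw: {sum(samples) / len(samples):.1f} W")
          pynvml.nvmlShutdown()
          ```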

          9 votes
          1. [5]
            JCPhoenix
            Link Parent

            Do you happen to have any good guides, particularly for beginners, for undervolting? I have a 3080 and an R7 5800X3D. I'm not so much concerned about performance as I am about power consumption/heat generation.

            2 votes
            1. JakeTheDog
              Link Parent

              Nothing directly*, and some nuances are chip-generation specific. I would read about overclocking first. But it's pretty straightforward and not as complicated as it sounds if you just stick to the basics. The first thing would be to familiarize yourself with your BIOS and make sure the firmware is updated. You don't want to use any "indirect" OS software for tweaking your memory or CPU. For the GPU I recommend MSI Afterburner (OS software is fine for GPU tuning). There's plenty of support online for every mainstream chip under the sun.

              The overall formula is the same: find a starting point that others found works for your chip, run a stress test for stability, and from there make minor adjustments. Tweak until failure and then back off a bit. Keep in mind every chip is a bit different in terms of stability, i.e. the "silicon lottery". Most people report a -30 mV undervolt for the Ryzen 7600, but I won the lottery with mine (I could push -60 mV, though it wasn't stable under extended loads).

              *EDIT: actually, overclock.net is an excellent forum for information and users' experiences.

              2 votes
            2. [2]
              hungariantoast
              (edited )
              Link Parent

              Ryzen 5000 Undervolting with PBO2 – Absolutely Worth Doing

              Not sure about undervolting for Nvidia GPUs though.

              Note that after undervolting, you'll want to do some stability testing. For a CPU, this would mean letting something like Cinebench run multi-threaded and single-threaded benchmarks on a loop for thirty minutes at least. For your GPU, an extended run of something like Furmark would be fine.

              You can't guarantee system stability just through benchmarking software though. If your computer doesn't crash after testing with benchmarks, you'll just have to use it like you normally do. If it ends up crashing while playing games or sitting idle, then try dialing back the undervolt and see if it happens again. Stability issues often occur in the weird valley between idle and 100% usage, and that's the most difficult area to test.

              People tend to recommend software like OCCT for stress and stability testing, but I have generally found it to be useless and prone to reporting errors when they don't actually exist. Your mileage may vary, but I certainly would not adjust an otherwise stable undervolt just because OCCT reports errors.

              1 vote
              1. JakeTheDog
                Link Parent

                I agree with everything you said, but I would caution against Furmark. It's generally claimed to put an exceptionally unrealistic, heavy load on the GPU and in some cases could cause damage. Something like 3DMark is often sufficient if you just keep an eye out for artifacts and then actually play a game for an extended period. Afterburner is convenient enough that you can back off the tweak and get back into playing pretty quickly after a graphics crash!

            3. Englerdy
              Link Parent

              For NVIDIA GPUs (I'm like 90% sure it works the same for AMD), MSI Afterburner (which works great even if your card isn't from MSI) is a good candidate for performance tweaking. You should pretty easily be able to apply voltage and clock adjustments within bounds that are pretty safe for your card. It's been a while since I went through the process, but when I last looked into it for my 1080 there were quite a few videos that walked through undervolting and then overclocking a GPU with Afterburner. It looked pretty card/manufacturer-independent, but I'm not in a spot where I can jump down a rabbit hole to get you links. You could watch a few videos and get a good sense of what to do.

              1 vote
    4. Greg
      Link Parent

      Given the 10x price gap for the A and L series, I’m more than happy for them to keep wedging the xx90 cards into the “consumer” lineup regardless of how much sense they might make there! I’d been hoping the Titan branding would come back with 36 or 48GB VRAM, but I guess they decided that’d too heavily cannibalise A6000 sales.

      7 votes