6 votes

EVGA confirms it's replacing all its RTX 3090s killed by Amazon's New World MMO

15 comments

  1. [14]
    Bullmaestro

    This is reminding me a whole lot of the Gurren Lagann MMO that Bandai Namco were working on years ago.

    Game got cancelled in closed beta because it was outright bricking hard drives. I mean at least HDDs are cheap to replace. $300 to $1,000 graphics cards being fried by a shoddily-built MMORPG in the midst of a semiconductor shortage is arguably more devastating than that.

    6 votes
    1. [11]
      stu2b50

      I wouldn't blame the game in this case. There should be no load you can run on a modern GPU that causes it to brick itself. That just shouldn't be possible.

      Of course, that's an ideal - there are edge cases and exploits in real life, but certainly no genuine, non-malicious loads should be anywhere near bricking a modern GPU.

      Doubly so for one that runs perfectly fine on other GPUs from other suppliers.

      Running inefficiently is a sin you can pin on Amazon in this case, but the blame for the dead 3090s is entirely on EVGA.

      22 votes
      1. AugustusFerdinand

        That this card has been out less than a year (if I remember correctly) shows that this isn't some act of charity, it's just standard warranty coverage.

        12 votes
      2. [9]
        babypuncher

        In this case the card was actively protecting itself. It would shut down automatically when encountering the problem. Where the bricking came from was users turning their system back on and repeatedly retrying the exact same behavior that caused the crash, eventually resulting in damage to the hardware.

        3 votes
        1. [8]
          Octofox

          It should still never be possible to damage the hardware via software and any damage is a hardware/firmware fault. Ideally not even the driver should be able to damage the hardware irrecoverably.

          5 votes
          1. [7]
            babypuncher

            It gets tricky when the device is being power cycled between failure events: it doesn't have a way of knowing it just suffered a failure and to "be more careful".

            The hardware is at fault here but I'm not going to say that the manufacturers fucked up so much as this is just an unfortunate usage scenario that has probably never been tested for in GPUs.

            2 votes
            1. [6]
              Greg

              Any thoughts on what kind of thing might cause cumulative damage over several failures? I'd been assuming it was somehow heat related, but that would probably just be an absolute cut-off regardless of prior state.

              1 vote
              1. [4]
                Octofox

                I doubt anyone can offer a real answer or anything closer than a vague guess. But there are two layers that should have kicked in. The firmware should not have allowed the chip to run faster than it could handle thermally, either by setting the max clock speed at a safe level or by dynamically adjusting the clock. And then the second safety is that if temps are too high somehow (fan failure or something), the card cuts off before there is damage.

                Under no circumstances would it ever be the game's fault. The game is a "user" of the driver interface and that interface should be safe. Yes, obviously mistakes happen, but it's a fault in the driver and not the game.

                The article tries to suggest the game is at fault, which is like having a user click the wrong button on a website, having it delete the database, and then going "Well, you are the only user who clicked that button, so it's your fault."
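
                To make the two layers concrete, here is a rough sketch of the idea in Python. This is not how any real GPU firmware is written; the thresholds, step size, and names (`GpuState`, `protect`) are made up purely for illustration. The point is only that both checks depend on the temperature reading and never on which program generated the load.

```python
from dataclasses import dataclass

# All thresholds and names are hypothetical, chosen only for illustration.
THROTTLE_TEMP_C = 83.0   # layer 1: start lowering the clock at or above this
SHUTDOWN_TEMP_C = 95.0   # layer 2: hard cut-off at or above this
MIN_CLOCK_MHZ = 300
MAX_CLOCK_MHZ = 1700
STEP_MHZ = 100


@dataclass
class GpuState:
    clock_mhz: int = MAX_CLOCK_MHZ
    powered: bool = True


def protect(state: GpuState, temp_c: float) -> GpuState:
    """Apply one iteration of the two-layer protection to a temperature reading."""
    if not state.powered:
        # Already shut down; stays off until something external powers it on.
        return state
    if temp_c >= SHUTDOWN_TEMP_C:
        # Layer 2: cut power regardless of what the workload is doing.
        return GpuState(clock_mhz=0, powered=False)
    if temp_c >= THROTTLE_TEMP_C:
        # Layer 1: step the clock down, but never below the floor.
        return GpuState(max(MIN_CLOCK_MHZ, state.clock_mhz - STEP_MHZ), True)
    # Cool enough: step back up toward the maximum clock.
    return GpuState(min(MAX_CLOCK_MHZ, state.clock_mhz + STEP_MHZ), True)
```

                Note that nothing in this sketch remembers a previous shutdown, so a fresh `GpuState()` after a power cycle starts at full clock again.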

                2 votes
                1. [3]
                  babypuncher

                  The firmware should not have allowed the chip to run faster than it could handle thermally, either by setting the max clock speed at a safe level or by dynamically adjusting the clock. And then the second safety is that if temps are too high somehow (fan failure or something), the card cuts off before there is damage.

                  But the cards did do both of these things, otherwise they wouldn't have shut down in the first place.

                  1 vote
                  1. [2]
                    Octofox

                    Shutting down means one safety failed already. The card eventually failing entirely is the second safety failing.

                    Under no normal circumstances should it ever be possible for a game to get past either of these. Obviously there are issues with the card's firmware where it allowed itself to be damaged.

                    1. babypuncher

                      I disagree on the first. I don't think it is possible for firmware to maintain perfect control over component temperature no matter how hard you try. Most hardware has safety failure states that can be triggered with improper usage.

              2. babypuncher

                Heat buildup in components that do not have their own temperature sensors could be one.

                1 vote
    2. MeckiSpaghetti

      I have the feeling that the outrage would be even greater if it were about hard drives. Many people still don't make backups and would moan about having lost important data. So I think replacing the hard drive and then having to recover the data would be more effort than just replacing the graphics card.

      3 votes
    3. babypuncher

      I can't imagine outright cancelling an entire MMO purely because you can't solve a bug that is causing hard drive problems.

      Does not speak well to their technical skills.

      2 votes
  2. Octofox

    Company agrees to replace defective product still in warranty. Of course they do; they legally have to.

    It would be more interesting news if they could explain why they failed.

    5 votes