12 votes

Fixing Mass Effect black blobs on modern AMD CPUs

4 comments

  1. [3]
    joplin
    Wow! That's a pretty in-depth analysis! I've hit artifacts like this when the result of a fragment calculation ended up being NaN. It would kill the entire fragment. If it only happened on a few pixels you'd end up with little 2x2 or 4x4 black blocks scattered throughout your rendered image. You could tell you hit this particular problem because the alpha channel would also be either NaN or some really huge number.
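The alpha-channel check described above can be sketched in a few lines of Python. This is only an illustration, assuming the rendered image has been read back as rows of `(r, g, b, a)` float tuples; the names `is_poisoned`, `poisoned_coords`, and the `ALPHA_LIMIT` threshold are hypothetical.

```python
import math

# Hypothetical threshold: a sane alpha value normally stays within [0, 1].
ALPHA_LIMIT = 1e6


def is_poisoned(pixel, alpha_limit=ALPHA_LIMIT):
    """Return True if an (r, g, b, a) pixel looks like the result of a NaN calculation."""
    if any(math.isnan(channel) for channel in pixel):
        return True
    # A huge (or infinite) alpha is the other symptom mentioned above.
    return abs(pixel[3]) > alpha_limit


def poisoned_coords(image):
    """Return (x, y) for every suspicious pixel; image is a list of rows of pixels."""
    return [
        (x, y)
        for y, row in enumerate(image)
        for x, pixel in enumerate(row)
        if is_poisoned(pixel)
    ]


nan = float("nan")
image = [
    [(0.2, 0.3, 0.4, 1.0), (nan, nan, nan, nan)],
    [(0.0, 0.0, 0.0, 3.4e38), (1.0, 1.0, 1.0, 0.5)],
]
suspects = poisoned_coords(image)
```

Here `suspects` picks out the NaN pixel and the one with the absurdly large alpha, while the two ordinary pixels pass.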

    4 votes
    1. [2]
      whbboyd
      As a final project in high school I wrote a very basic 2D physics simulation. It worked well enough to get me a good grade in the class, but if you played around with it long enough, it would periodically glitch out and pin every object to the top left of the window. After enough debugging, it turned out that if the centers of two objects got close enough to each other, the resulting repulsive force would overflow to infinity, and a cascade of failures would put a NaN in one object's position (I apologize for the lack of details, but this was going on fifteen years ago). On the following frame, computing a collision with the object at NaN would apply a force of NaN to every other object and put them at NaN as well. And then the renderer for some reason tried to draw them at the window origin. (I never figured that one out.)
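A rough Python reconstruction of that kind of cascade, not the original program: it assumes a 1-D world, an inverse-square repulsion, and a made-up `STIFFNESS` constant. All names here are hypothetical. Two near-coincident neighbours push the middle object with `+inf` and `-inf`, the sum is NaN, and one frame later every position is NaN.

```python
import math

STIFFNESS = 1e10  # hypothetical repulsion constant


def repulsion(a, b):
    """Inverse-square force on the object at a, pushing it away from the object at b."""
    d = a - b
    # For tiny d, d * d is so small that the quotient overflows to infinity.
    return math.copysign(STIFFNESS / (d * d), d)


def step(positions, dt=0.01):
    """Advance every object by one frame using the summed repulsion from all others."""
    new_positions = []
    for i, p in enumerate(positions):
        force = sum(repulsion(p, q) for j, q in enumerate(positions) if j != i)
        new_positions.append(p + force * dt)
    return new_positions


# Three objects almost on top of each other, plus one bystander far away.
positions = [-1e-150, 0.0, 1e-150, 5.0]
after_one = step(positions)   # middle object: +inf + -inf = NaN
after_two = step(after_one)   # the NaN now contaminates every other object
```

After the first step only the middle object is at NaN (its neighbours are at infinity and the bystander is still finite); after the second step the bystander has caught it too.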

      NaN is a mess in games and some simulations. It informs you of a bug you probably didn't care about in the most destructive way imaginable. I wish it were more universal to use signaling NaNs, because then at least you get pointed to the exact place the problem originated.
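To illustrate the quiet-versus-signaling difference: Python's ordinary `float` arithmetic doesn't trap on NaN, so this sketch uses the standard `decimal` module as a stand-in, since it implements both kinds. It is an analogy for hardware signaling NaNs, not a demonstration of them, and `first_use` is a hypothetical helper.

```python
from decimal import Decimal, InvalidOperation, localcontext


def first_use(value):
    """Do one arithmetic operation on value and report whether it trapped."""
    with localcontext() as ctx:
        # Make sure invalid operations raise instead of silently returning NaN.
        ctx.traps[InvalidOperation] = True
        try:
            result = value + 1
        except InvalidOperation:
            return "trapped at first use"
        return f"quiet: result is {result}"


quiet_outcome = first_use(Decimal("NaN"))       # propagates silently
signaling_outcome = first_use(Decimal("sNaN"))  # raises right where it is used
```

The quiet NaN sails through the addition and keeps spreading, while the signaling one raises at the exact operation that touched it, which is the "pointed to the exact place" behaviour described above.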

      7 votes
      1. joplin
        Ha! Oh wow, that's good. Yeah, I've had my share of NaN problems, so I know what you mean. It's worse with video cards because for a long time they weren't all IEEE 754-compliant, so you'd test something on an AMD card and it would work fine. But then you'd take it to an Nvidia machine, and it would suddenly start glitching out. It's better these days, thankfully.

        2 votes
  2. whbboyd
    A thread on Hacker News digs into the root cause of the error in some detail. In summary: D3DX is most likely using a particular instruction (RCPSS/RCPPS) that returns approximate results. Both Intel and AMD processors produce in-spec approximations, but the precise values differ.
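A toy Python model of how two approximations can both be "in spec" and still disagree. The bound used is the documented maximum relative error for RCPSS, 1.5 × 2^-12. The two functions are hypothetical stand-ins (one truncates the mantissa, one rounds it) and do not reproduce what any actual Intel or AMD core computes.

```python
import math

RCP_MAX_REL_ERROR = 1.5 * 2.0 ** -12  # documented bound for RCPSS/RCPPS
SCALE = 1 << 13                        # keep 13 bits of the frexp mantissa


def rcp_truncating(x):
    """Hypothetical approximation A: chop the reciprocal's mantissa."""
    m, e = math.frexp(1.0 / x)
    return math.ldexp(math.floor(m * SCALE) / SCALE, e)


def rcp_rounding(x):
    """Hypothetical approximation B: round the reciprocal's mantissa."""
    m, e = math.frexp(1.0 / x)
    return math.ldexp(round(m * SCALE) / SCALE, e)


def rel_error(approx, x):
    """Relative error of approx against the exact double-precision reciprocal."""
    exact = 1.0 / x
    return abs(approx - exact) / abs(exact)


x = 5.0
a = rcp_truncating(x)
b = rcp_rounding(x)
both_in_spec = (
    rel_error(a, x) <= RCP_MAX_REL_ERROR
    and rel_error(b, x) <= RCP_MAX_REL_ERROR
)
```

For `x = 5.0` both results sit inside the error bound, yet `a != b`, so any code that happens to depend on the exact bits of one approximation can behave differently on the other.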

    3 votes