11 votes

“C is not how the computer works” can lead to inefficient code

5 comments

  1. [5]
    skybrian

    There is a fundamental issue here, which is that, when it comes to "correctness" (what output do you get), we have specifications, proofs, abstraction, contracts, and portability. We might not actually prove things formally, but at least the results can be well specified.

    When it comes to performance, all bets are off. The only way to find out how software performs in production is to run it in production, or sometimes on exact replica hardware. (Sometimes you can do fuzzy reasoning by analogy or ignore constant factors to do algorithm analysis.)

    This divide is helpful in many ways. Being flexible about performance is what makes portability and optimization possible. No software would be portable and progress would come to a halt if libraries needed to guarantee that new versions of their functions never got any slower.

    But there's another sense in which our usual notion of "correctness" is a myth: any change has a nonzero chance of triggering a critical performance slowdown that breaks things for some user far away, and the only thing to do is to notice problems and fix them.

    1 vote
    1. [4]
      alexandria

      > When it comes to performance, all bets are off. The only way to find out how software performs in production is to run it in production, or sometimes on exact replica hardware. (Sometimes you can do fuzzy reasoning by analogy or ignore constant factors to do algorithm analysis.)

      That's not entirely the case. While that is obviously true for cache-based optimization, it is not true for branch-prediction-based optimization. That is to say, simply optimizing with the goal of reducing the number of branches in a program (and thus the number of times the branch predictor guesses wrong) has a consistent, measurable impact on performance.

      You can see that here: https://nothings.org/computer/lexing.html

      I also wrote a tiny wc clone using the strategy outlined above (warning: the code quality isn't great), before all the recent 'faster wc clones' hit the media. A friend benchmarked it, and it came out at about 20x the performance of the stock wc. The general shape of the approach is roughly the sketch below.
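
      To make the strategy concrete, here is a minimal sketch of a table-driven, branch-free word counter (my illustration of the technique, not the code from the link or my actual clone): each byte is classified, and the word count and state are updated through table lookups, so the inner loop has no data-dependent branch for the predictor to miss.

      ```c
      #include <stdio.h>

      int main(void) {
          /* Byte classes: 1 = whitespace, 0 = anything else. */
          static unsigned char cls[256];
          cls[' '] = cls['\t'] = cls['\n'] = cls['\r'] = cls['\v'] = cls['\f'] = 1;

          /* State 0 = outside a word, state 1 = inside a word.
           * next[state][class] gives the next state; bump[state][class]
           * is 1 exactly on the outside -> inside transition (a new word). */
          static const unsigned char next[2][2] = { { 1, 0 }, { 1, 0 } };
          static const unsigned char bump[2][2] = { { 1, 0 }, { 0, 0 } };

          static unsigned char buf[1 << 16];
          unsigned long words = 0;
          unsigned state = 0;
          size_t n;
          while ((n = fread(buf, 1, sizeof buf, stdin)) > 0) {
              for (size_t i = 0; i < n; i++) {
                  unsigned c = cls[buf[i]];  /* table lookup, no branch */
                  words += bump[state][c];   /* count word starts */
                  state = next[state][c];    /* advance the state machine */
              }
          }
          printf("%lu\n", words);
          return 0;
      }
      ```

      The only branches left are the loop conditions, which are almost perfectly predictable; on a machine with no branch predictor at all, the loop body executes exactly the same way.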

      3 votes
      1. [3]
        skybrian

        Yes, good point, I exaggerated a bit. There are lots of great tools for making your code faster.

        But these gains can be fragile in the sense that it's within spec for someone somewhere else in the stack (hardware or software) to make some other change that invalidates your assumptions and undoes your work. We don't have the sort of mathematical guarantees that are possible (in principle, anyway) with correctness. You just kind of hope that compilers don't change their code generation too much and processors only get better at branch prediction, or you have enough of a buffer that a slowdown doesn't matter too much. This hope often seems justified by experience, but on the other hand, when there's a performance regression it's not really all that surprising.

        (And command-line tools don't make any performance guarantees anyway. Hardly any software does.)

        1 vote
        1. [2]
          alexandria

          > You just kind of hope that compilers don't change their code generation too much and processors only get better at branch prediction, or you have enough of a buffer that a slowdown doesn't matter too much.

          Well, no. The code I linked to is independent of processors getting better at branch prediction, because it has no branches. It runs efficiently on machines without branch prediction as well as on machines with it; it just gains an extra advantage on the latter. Obviously it trades away some cache locality (it uses a lookup table), but most computers these days have quite large caches, at least relative to this specific application -- so in that sense you are right.
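
          To put the cache trade-off in rough numbers (my arithmetic, assuming a byte-indexed classification table like the one in a typical table-driven lexer, not measurements of the linked code): a 256-entry, one-byte-per-entry table occupies 256 bytes, which is four 64-byte cache lines; a typical 32 KiB L1 data cache holds 512 such lines, so the table costs well under 1% of L1.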

          2 votes
          1. skybrian

            What I mean is: if your goal is to write fast code, this is fine. If your goal is to meet a deadline, you don't know how much leeway you need to be sure you'll keep meeting it, or even whether it will be met on someone else's machine. Nothing is specified well enough to know for sure.

            I guess that's just another way of saying we're not doing hard real-time programming. Not news, but I find the contrast striking.

            1 vote