8 votes

Data Locality - Accelerate memory access by arranging data to take advantage of CPU caching.

2 comments

  1. Soptik
    Link

    The website is lightweight, with no 3rd-party scripts (apart from Google Analytics) and no ads. But if you really want, here is an outline.


    We’ve been lied to. They keep showing us charts where CPU speed goes up and up every year as if Moore’s Law isn’t just a historical observation but some kind of divine right. Without lifting a finger, we software folks watch our programs magically accelerate just by virtue of new hardware.

    Chips have been getting faster (though even that’s plateauing now), but the hardware heads failed to mention something. Sure, we can process data faster than ever, but we can’t get that data faster.

    With today’s hardware, it can take hundreds of cycles to fetch a byte of data from RAM. If most instructions need data, and it takes hundreds of cycles to get it, how is it that our CPUs aren’t sitting idle 99% of the time waiting for data?

    Actually, they are stuck waiting on memory an astonishingly large fraction of time these days, but it’s not as bad as it could be. To explain how, let’s take a trip to the Land of Overly Long Analogies…
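
    To get a feel for that gap, here is a rough sketch of my own (not code from the article): it sums the same array twice, once walking adjacent elements and once jumping far enough that each access lands on a new cache line, so any timing difference comes from the memory access pattern alone. The sizes and the 64-byte cache line are my assumptions.

    ```cpp
    // Rough sketch (not from the article): sum the same array with a
    // cache-friendly stride of 1, then with a stride that lands on a new
    // cache line every access. The amount of arithmetic is identical;
    // only the memory access pattern changes.
    #include <chrono>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Visit every element exactly once, `stride` apart.
    static std::uint64_t sum_with_stride(const std::vector<std::uint32_t>& data,
                                         std::size_t stride) {
        std::uint64_t total = 0;
        for (std::size_t start = 0; start < stride; ++start) {
            for (std::size_t i = start; i < data.size(); i += stride) {
                total += data[i];
            }
        }
        return total;
    }

    int main() {
        // Much larger than any cache level (the size is an arbitrary choice).
        std::vector<std::uint32_t> data(16 * 1024 * 1024, 1);

        // 16 elements * 4 bytes = 64 bytes, a typical cache line (assumption).
        for (std::size_t stride : {1, 16}) {
            auto begin = std::chrono::steady_clock::now();
            std::uint64_t total = sum_with_stride(data, stride);
            auto end = std::chrono::steady_clock::now();
            auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(end - begin).count();
            std::printf("stride %zu: sum=%llu in %lld ms\n",
                        stride, (unsigned long long)total, (long long)ms);
        }
        return 0;
    }
    ```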

    Afterwards, the article explains why and how. The author writes about his own experience and includes examples.

    When I started working on this chapter, I spent some time putting together little game-like programs that would trigger best case and worst case cache usage. I wanted benchmarks that would thrash the cache so I could see first-hand how much bloodshed it causes.

    When I got some stuff working, I was surprised. I knew it was a big deal, but there’s nothing quite like seeing it with your own eyes. I wrote two programs that did the exact same computation. The only difference was how many cache misses they caused. The slow one was fifty times slower than the other.
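
    The article's own benchmark code isn't reproduced here, but the sketch below (mine, not the author's) shows the kind of best-case/worst-case comparison he describes: the same per-object update run over objects stored in one contiguous array versus objects scattered across the heap and reached through pointers. All names and sizes are my own choices.

    ```cpp
    // Rough sketch (not the book's code): identical work, different layout.
    // Best case: a contiguous array walked in order.
    // Worst case: heap-allocated objects visited through shuffled pointers.
    #include <algorithm>
    #include <chrono>
    #include <cstdio>
    #include <memory>
    #include <random>
    #include <vector>

    struct Particle {
        float x = 0.0f, y = 0.0f;
        float vx = 1.0f, vy = 1.0f;
        void update() { x += vx; y += vy; }
    };

    int main() {
        constexpr std::size_t kCount = 1'000'000;  // arbitrary sizes for the sketch
        constexpr int kFrames = 100;

        // Best case: one contiguous block of particles.
        std::vector<Particle> dense(kCount);

        // Worst case: individually allocated particles, visited in shuffled
        // order so each access is likely a cache miss.
        std::vector<std::unique_ptr<Particle>> scattered;
        scattered.reserve(kCount);
        for (std::size_t i = 0; i < kCount; ++i) {
            scattered.push_back(std::make_unique<Particle>());
        }
        std::shuffle(scattered.begin(), scattered.end(), std::mt19937{42});

        auto time_ms = [](auto&& body) {
            auto begin = std::chrono::steady_clock::now();
            body();
            auto end = std::chrono::steady_clock::now();
            return std::chrono::duration_cast<std::chrono::milliseconds>(end - begin).count();
        };

        auto dense_ms = time_ms([&] {
            for (int f = 0; f < kFrames; ++f)
                for (Particle& p : dense) p.update();
        });
        auto scattered_ms = time_ms([&] {
            for (int f = 0; f < kFrames; ++f)
                for (auto& p : scattered) p->update();
        });

        std::printf("contiguous: %lld ms, pointer-chasing: %lld ms\n",
                    (long long)dense_ms, (long long)scattered_ms);
        return 0;
    }
    ```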

    I got interested in this after a discussion with @Emerald_Knight during a programming challenge (merging and ordering arrays).

    2 votes
  2. Emerald_Knight
    Link

    I haven't read through the entire thing--maybe 1/3 to 1/2 of it--but it seems like an excellent article with simple and digestible analogies and practical examples.

    Nice find!

    1 vote