29 votes

Reducing memory allocations from 7.5GB to 32KB

6 comments

  1. [5]
    Emerald_Knight
    Link

    This is why I get a bit bothered when people trash higher-level programming languages as being "inefficient" or "resource hogs" (I've seen enough circlejerks about this on reddit that I gave up on having any productive discussions about programming languages there). It's (usually) not a problem with the language, but with your understanding of its internal workings, your knowledge of which constructs are resource-intensive, your familiarity with the problem space you're working in, and your ability to use all of that to squeeze out the performance you want/need for the problem you're trying to solve.

    In this case, parsing a 300MB file is a lot different from parsing a file no more than a few KB in size, so naturally you're going to want to alter your approach, since your resource usage is going to increase substantially. A lot of programmers don't, however, and that's a problem with the programmer, not a limitation of the language itself. This kind of stark contrast pre- and post-optimization really highlights that.
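
    For example, a rough Python sketch of the kind of change I mean (the path and record format here are made up):

    ```python
    # Naive approach: pull the entire file into memory at once.
    # Fine for a few KB, painful for 300MB.
    def parse_all_at_once(path):
        with open(path) as f:
            lines = f.readlines()  # one string per line, all resident at once
        return [line.split(",") for line in lines]

    # Streaming approach: hold only the current line in memory.
    def parse_streaming(path):
        with open(path) as f:
            for line in f:  # file objects iterate lazily, line by line
                yield line.split(",")
    ```

    Same logic either way, but the second version's peak memory usage stays flat no matter how big the file gets.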

    I'm always a big advocate for viewing languages as different types of tools, each best suited to different problem spaces, and for the idea that you really need to know what you're doing with a tool in order to use it effectively. Never use a rubber mallet to try to break down a brick wall, and never use a sledgehammer to drive a small nail through some drywall. That being said, you can absolutely use a flat-head screwdriver on a cross-slotted screw if you pick the right size and you know the materials of both the screwdriver and the screw are strong enough that neither will strip.

    11 votes
    1. [2]
      tan
      Link Parent

      I get what you're saying, and logical inefficiency is certainly a major, understated cause of slowness (an O(n²) algorithm in C won't beat an O(log n) algorithm in Perl), but I think there are several other factors that affect whether the language will really be slower. For example, needing a runtime, or being garbage collected. And of course there's the constant overhead people are usually talking about with 'slow' languages, where everything is dynamic and needs to be looked up and/or interpreted.
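
      To make the algorithmic point concrete, a toy sketch in Python (duplicate detection is just a stand-in task):

      ```python
      # O(n²): compares every pair of elements.
      def has_duplicate_quadratic(items):
          return any(items[i] == items[j]
                     for i in range(len(items))
                     for j in range(i + 1, len(items)))

      # O(n): a hash set makes each membership check roughly O(1).
      def has_duplicate_linear(items):
          seen = set()
          for item in items:
              if item in seen:
                  return True
              seen.add(item)
          return False
      ```

      Rewrite the first version in C and it still loses to the second version in Python once the input gets large enough; no constant factor outruns a worse growth rate.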

      It also depends on the task - for pure number crunching, you're probably better off not using pure Python, but rather NumPy or FORTRAN or Rust or whatever.
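
      e.g. a toy comparison (assumes NumPy is installed):

      ```python
      import numpy as np

      # Pure Python: the interpreter executes bytecode for every element.
      total = sum(v * v for v in range(1_000_000))

      # NumPy: the same reduction runs as one vectorized loop in C.
      arr = np.arange(1_000_000, dtype=np.int64)
      total = int((arr * arr).sum())
      ```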

      That said, it does annoy me a lot when people laugh at the idea of improving a Python program's performance with 'lol, Python is just slow', because there's so much more to it than that.

      4 votes
      1. Emerald_Knight
        Link Parent

        Definitely. There's certainly more nuance to it, which is why I made the tool analogy. If you need to squeeze every last bit of performance out, then naturally you're going to want a lower-level language where you can fine-tune things like memory management and access patterns and try to make every CPU cycle count. Barring those kinds of performance-heavy problems, however, your bottleneck will end up being your own knowledge and ability, not the language.

    2. [2]
      Duchess
      Link Parent

      Do you know of anywhere someone can read further into this? I can definitely see myself not changing my approach when parsing huge files compared to reordering small amounts of data as they come in.

      1 vote
      1. Emerald_Knight
        Link Parent

        I'm not aware of any specific resources for this. My only recommendation is to study up on different data types and data structures in order to understand how these things are built at a lower level. Additionally, you could research optimization in general to start seeing that kind of knowledge put into practice. And it's important to keep in mind the golden rule of avoiding premature optimization, favoring a simpler working solution that you can rewrite into an optimized version afterward.

        But really, what it all boils down to is research and experience. Find a large data set to work with and compare your program's performance on it against its performance on a substantially smaller data set. Figure out what kinds of bottlenecks you start running into, and look into ways to work around them. Learn by doing.
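
        Python's built-in profiler is a reasonable starting point for that, e.g. (parse_file and the file name here are just placeholders):

        ```python
        import cProfile
        import pstats

        # Profile a hypothetical parsing function against a large input.
        # parse_file would need to be defined or imported in your script.
        cProfile.run("parse_file('big_dataset.csv')", "profile.out")

        # Show the ten most expensive call sites by cumulative time.
        pstats.Stats("profile.out").sort_stats("cumulative").print_stats(10)
        ```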

        Beyond that, I don't really have any advice. Perhaps someone else could provide additional resources, however.

        1 vote
  2. crius
    Link

    You should really add the tag "optimization porn" because, god, that was a pleasure to read from top to bottom :)

    8 votes