15 votes

Where would a beginner start with data compression? What are some good books for it?

The title says most of it. I have experience with Python, and I was thinking of learning more about data compression. How should I proceed? And what are some good books I could read, covering both the specifics and the abstract concepts of data compression, data management, and data in general?

10 comments

  1. teaearlgraycold
    First thing to learn would be Huffman coding. Don't know where to go after that though.

    8 votes
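
A minimal Python sketch of the Huffman coding suggestion follows. The names `huffman_code`, `encode` and `decode` are illustrative and do not come from any library. The code counts symbol frequencies, repeatedly merges the two lightest subtrees using a heap, and builds each symbol's code from the path of merges, so frequent symbols end up with shorter bit strings. To keep it simple, it works on a `str` and represents bits as a string of '0'/'1' characters instead of packing them into bytes.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(text: str) -> dict[str, str]:
    """Build a Huffman code table mapping each symbol to a bit string."""
    freqs = Counter(text)
    if not freqs:
        return {}
    if len(freqs) == 1:
        # Degenerate input with one distinct symbol: give it a 1-bit code.
        (only,) = freqs
        return {only: "0"}
    tiebreak = count()  # unique counter so the heap never has to compare the dicts
    # Heap entries: (total weight, tiebreak, {symbol: code within this subtree})
    heap = [(weight, next(tiebreak), {sym: ""}) for sym, weight in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; one side gets a leading 0, the other a leading 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


def encode(text: str, table: dict[str, str]) -> str:
    """Concatenate the code of every symbol."""
    return "".join(table[ch] for ch in text)


def decode(bits: str, table: dict[str, str]) -> str:
    """Read bits until the buffer matches a code; prefix-free codes make this unambiguous."""
    reverse = {code: sym for sym, code in table.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in reverse:
            out.append(reverse[buf])
            buf = ""
    return "".join(out)


if __name__ == "__main__":
    msg = "abracadabra"
    table = huffman_code(msg)
    bits = encode(msg, table)
    print(table)
    print(len(bits), "bits vs", 8 * len(msg), "bits as 8-bit ASCII")
```

Because no code is a prefix of another, the decoder can emit a symbol as soon as its buffered bits match a table entry. A real file format would also need to store the code table (or the frequencies) next to the packed bits.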
  2. tesseractcat
    The absolute most basic place to start with data compression would be run-length encoding (RLE), which is about the simplest type of compression imaginable. After that, look into how some of the more popular compression formats like .zip work.

    8 votes
    1. Wolf
      Sure, thanks! Do you know of any long-term roadmaps for this?

      2 votes
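
A minimal Python sketch of the run-length encoding suggestion follows. The names `rle_encode` and `rle_decode` are hypothetical. The encoder turns each run of repeated characters into a (character, count) pair, and it uses `itertools.groupby` to find the runs.

```python
from itertools import groupby


def rle_encode(data: str) -> list[tuple[str, int]]:
    """Collapse each run of identical characters into a (char, run_length) pair."""
    return [(ch, len(list(run))) for ch, run in groupby(data)]


def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Expand (char, run_length) pairs back into the original string."""
    return "".join(ch * n for ch, n in pairs)


if __name__ == "__main__":
    raw = "WWWWWWWWWWWWBWWWWWWWWWWWWBBB"
    packed = rle_encode(raw)
    print(packed)
    print(rle_decode(packed) == raw)
```

There is a trade-off here. On input with few repeated characters, the pairs take more space than the original. For this reason RLE works best on data with long uniform runs, such as simple bitmaps.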
  3. wirelyre
    My local library has Sayood's Introduction to Data Compression and Salomon/Motta's Handbook of Data Compression, both of which I would recommend. Sayood is a great medium-paced read that's usable as a self-taught course, although it's a little heavy on the mathematics. Salomon/Motta is an awfully dry reference that is unusable for learning the basics, but it's an incredible overview of compression strategies and has a section for basically every algorithm in common use.

    6 votes
  4. 666
    The second step after RLE (which @tesseractcat mentioned) is LZW. The algorithm is simple, so any tutorial should help; here's one, and this is the one I used when I was interested in data compression.

    Then you can branch out pretty much anywhere, though it's not simple after that (at least it wasn't for me). Somebody in this thread mentioned Huffman coding, and you can also try reading some arithmetic coding implementations. Matt Mahoney has a good book that is free online, called Data Compression Explained. His whole website about data compression is a gold mine: make sure you read it, and learn about the common pitfalls and myths around compression too.

    5 votes
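
A minimal Python sketch of textbook LZW follows. It assumes byte input and a dictionary that is never reset, and the function names are illustrative. The compressor emits the code for the longest string already in its dictionary, then adds that string plus the next byte as a new entry. The decompressor rebuilds the same dictionary from the codes alone, so the table never has to be transmitted. Real formats such as GIF also pack the codes into variable-width bit fields and cap the dictionary size. This sketch skips both and simply returns plain integers.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Emit dictionary codes for the longest already-known prefixes of the input."""
    dictionary = {bytes([i]): i for i in range(256)}  # start with every single byte
    next_code = 256
    w = b""
    out = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in dictionary:
            w = wc  # keep extending the current match
        else:
            out.append(dictionary[w])
            dictionary[wc] = next_code  # learn the new string
            next_code += 1
            w = bytes([byte])
    if w:
        out.append(dictionary[w])
    return out


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the compressor's dictionary on the fly while expanding codes."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    w = dictionary[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # Special case: the code refers to the entry that is being defined right now.
            entry = w + w[:1]
        else:
            raise ValueError(f"invalid LZW code {code}")
        out.append(entry)
        dictionary[next_code] = w + entry[:1]
        next_code += 1
        w = entry
    return b"".join(out)


if __name__ == "__main__":
    sample = b"TOBEORNOTTOBEORTOBEORNOT"
    codes = lzw_compress(sample)
    print(len(sample), "bytes ->", len(codes), "codes:", codes)
    print(lzw_decompress(codes) == sample)
```

On the classic test string above (24 bytes), the compressor produces 16 codes. The `code == next_code` branch handles the one situation where the decoder receives a code it has not finished defining yet, for example on input like `b"aaa"`.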
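
Before reading real arithmetic coding implementations, a toy version can make the core idea clearer. The Python sketch below uses the standard library's exact `fractions.Fraction`, so there is no rounding to worry about. It assumes that the decoder is told the message length and shares the same static symbol counts (`model`). Practical coders instead use fixed-precision integers with renormalization and usually adapt the model as they go. All function names here are hypothetical.

```python
from fractions import Fraction


def build_intervals(model: dict[str, int]) -> dict[str, tuple[Fraction, Fraction]]:
    """Give each symbol a slice of [0, 1) whose width is its probability under the model."""
    total = sum(model.values())
    intervals, low = {}, Fraction(0)
    for sym in sorted(model):  # fixed order so encoder and decoder build identical slices
        width = Fraction(model[sym], total)
        intervals[sym] = (low, low + width)
        low += width
    return intervals


def arith_encode(message: str, model: dict[str, int]) -> Fraction:
    """Narrow [low, high) once per symbol and return a number inside the final interval."""
    intervals = build_intervals(model)
    low, high = Fraction(0), Fraction(1)
    for sym in message:
        span = high - low
        s_low, s_high = intervals[sym]
        low, high = low + span * s_low, low + span * s_high
    # low lies in [low, high); a real coder would pick the shortest binary fraction there.
    return low


def arith_decode(code: Fraction, length: int, model: dict[str, int]) -> str:
    """Replay the narrowing: at each step, pick the symbol whose slice contains the code."""
    intervals = build_intervals(model)
    low, high = Fraction(0), Fraction(1)
    out = []
    for _ in range(length):
        span = high - low
        target = (code - low) / span  # where the code sits inside the current interval
        for sym, (s_low, s_high) in intervals.items():
            if s_low <= target < s_high:
                out.append(sym)
                low, high = low + span * s_low, low + span * s_high
                break
    return "".join(out)


if __name__ == "__main__":
    msg = "abracadabra"
    model = {"a": 5, "b": 2, "r": 2, "c": 1, "d": 1}  # static counts shared by both sides
    code = arith_encode(msg, model)
    print(code, arith_decode(code, len(msg), model))
```

The width of the final interval is the product of the probabilities of the encoded symbols. A number inside that interval can therefore be written in roughly -log2(width) bits. This is where arithmetic coding gains over Huffman coding, which must spend a whole number of bits on each symbol.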
  5. meghan
    Make a .zip parser! Then move on to .tar, .gz, .xz, etc.

    2 votes
    1. Wolf
      Do you know of any books that explain data compression in a more abstract way, covering the concepts that go into it?

      3 votes
      1. vegetablesupercargo
        If you want the theory behind compression, you're looking for information theory. I'm a fan of Pierce's book on Information Theory but there are a lot of excellent textbooks to choose from.

        A grounding in information theory will help you understand how compression works and what its limitations are. Depending on the book you choose, you will probably encounter a lot of different approaches to compression.

        1 vote
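
As a first step toward the .zip parser suggestion, here is a minimal Python sketch. It uses `struct` to walk the local file headers at the front of an archive, following the 30-byte local header layout from PKWARE's ZIP specification (APPNOTE). It handles only methods 0 (stored) and 8 (deflate). It also assumes that each local header contains the sizes. Archives that set the data-descriptor flag (bit 3) need the central directory instead, and a robust parser should read that directory from the end of the file in any case. `parse_local_entries` is an illustrative name. The demo uses the standard `zipfile` module to build its own archive, so there is something to parse.

```python
import io
import struct
import zipfile
import zlib

# ZIP local file header: signature, version needed, flags, method, mod time, mod date,
# CRC-32, compressed size, uncompressed size, file name length, extra field length.
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")
LOCAL_SIGNATURE = 0x04034B50  # the bytes b"PK\x03\x04" read as a little-endian uint32


def parse_local_entries(blob: bytes):
    """Yield (name, method, data) for each entry by walking local headers from offset 0."""
    offset = 0
    while offset + LOCAL_HEADER.size <= len(blob):
        (sig, _version, flags, method, _mtime, _mdate,
         crc, comp_size, uncomp_size, name_len, extra_len) = LOCAL_HEADER.unpack_from(blob, offset)
        if sig != LOCAL_SIGNATURE:
            break  # most likely the start of the central directory
        if flags & 0x08:
            raise NotImplementedError("sizes are in a trailing data descriptor; use the central directory")
        name_start = offset + LOCAL_HEADER.size
        raw_name = blob[name_start:name_start + name_len]
        name = raw_name.decode("utf-8" if flags & 0x800 else "cp437")  # bit 11 flags UTF-8 names
        data_start = name_start + name_len + extra_len
        payload = blob[data_start:data_start + comp_size]
        if method == 8:
            payload = zlib.decompress(payload, -15)  # raw deflate stream, no zlib wrapper
        elif method != 0:
            raise NotImplementedError(f"compression method {method}")
        if zlib.crc32(payload) != crc or len(payload) != uncomp_size:
            raise ValueError(f"{name}: CRC or size mismatch")
        yield name, method, payload
        offset = data_start + comp_size


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", b"hello, world\n")  # default ZIP_STORED
        zf.writestr("repeat.txt", b"ab" * 1000, compress_type=zipfile.ZIP_DEFLATED)
    for name, method, data in parse_local_entries(buf.getvalue()):
        print(name, method, len(data))
```

Once this works, a natural next step is to swap the sequential walk for a lookup of the End of Central Directory record. After that, .gz is a good follow-up, because it wraps the same deflate format in a much smaller header.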
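
Information theory centers on Shannon entropy, H = -sum over x of p(x) * log2 p(x). The minimal Python sketch below measures the empirical (order-0) entropy of a byte string. This value is the average number of bits per byte an ideal code would need if it coded each byte independently, using the observed byte frequencies as probabilities. Models that exploit context can go below this figure, and the demo hints at this by comparing the result against `zlib`. `shannon_entropy` is an illustrative name.

```python
import math
import os
import zlib
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Order-0 empirical entropy in bits per byte: -sum(p * log2(p)) over byte frequencies."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


if __name__ == "__main__":
    samples = {"repetitive": b"ab" * 2000, "random": os.urandom(4000)}
    for label, data in samples.items():
        bits_per_byte = 8 * len(zlib.compress(data)) / len(data)
        print(f"{label}: order-0 entropy {shannon_entropy(data):.3f}, zlib {bits_per_byte:.3f} bits/byte")
```

On the repetitive sample, zlib should land well under the 1 bit/byte order-0 figure, because it exploits repeated strings rather than individual byte frequencies. Random bytes sit near 8 bits/byte and do not compress.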
  6. rkcr
    Colt McAnlis did a series called Compressor Head that has great explanations of a bunch of basic compression algorithms. He also wrote a book, but honestly, start with the videos; they're great.

    1 vote