9 votes

How much space would it take to store every word ever said?

4 comments

  1. onyxleopard
    There are lots of bad assumptions here, but this is a neat exercise. Some of the bad assumptions include:

    1. People don’t speak ‘characters’. Writing is a recent development in human history, and digital encoding of written language is even newer.

    2. If we are going to consider transcribing all speech, we wouldn’t use ASCII characters. ASCII doesn’t cover most of the writing systems of natural languages that have existed. There’s also the problem of needing to invent writing systems for all the languages that don’t have one and aren’t covered by Unicode. What you’d want to do is measure the average number of bytes needed to encode an average word across all languages in a reasonable encoding like UTF-8. This is going to approximate something like the entropy of natural language, averaged across all writing systems. That is something computational linguists and information theorists study, which is super neat!
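As a rough illustration of the measurement suggested above, here is a minimal sketch that averages UTF-8 bytes per word. The function name `avg_utf8_bytes_per_word` and the sample phrases are hypothetical, and splitting on whitespace is a simplifying assumption that does not hold for writing systems without spaces between words.

```python
def avg_utf8_bytes_per_word(text: str) -> float:
    """Mean number of UTF-8 bytes per whitespace-separated word."""
    # Assumption: words are separated by whitespace. Scripts such as
    # Chinese or Japanese would need a real word segmenter instead.
    words = text.split()
    if not words:
        return 0.0
    return sum(len(word.encode("utf-8")) for word in words) / len(words)


# Hypothetical sample phrases, chosen only to show that the byte cost
# per word depends on the script being encoded.
samples = {
    "Latin (ASCII range)": "hello world",
    "Latin with accents": "café crème",
    "Cyrillic": "привет мир",
}

for label, phrase in samples.items():
    print(f"{label}: {avg_utf8_bytes_per_word(phrase):.1f} bytes/word")
```

A real estimate would weight each language by how much has actually been spoken in it, which this sketch does not attempt.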

    1 vote
  2. pseudolobster
    I took issue with your first assumption - the dimensions of an SD card. Your reference for this measurement is supposedly on page 82 of an issue of Science. I can't verify that without paying for access, but the ToC of that issue says the magazine is only 81 pages long. Hmmm.

    Anyway, 1mm seemed really thick for a microSD card to me, so I grabbed my calipers and found that only a small nub on the edge was 1mm thick. The rest of the card is 0.75mm thick. So, your figure of 163mm^3 is high, and the volume of a microSD card is actually 123.75mm^3. Now, that lip means they don't stack evenly, and must be tiled so that the nubs don't overlap. A stack of two cards is 16.5mm x 1.5mm x 11mm, giving a volume of 272.25mm^3, or 136.125mm^3 per card. There may be ways of stacking these things by shingling them in such a way that the nubs overlap, but I'm not going to get into that. Anyway, that brings the figure from 84m^3 down to 70.5m^3 for uncompressed text - a 16% reduction!
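The arithmetic in this comment can be re-run as a short sketch. The 15mm card length is an assumption (it is the standard microSD length and is what the 123.75mm^3 figure implies); the other dimensions and the 163mm^3 and 84m^3 figures are the ones quoted in the thread. All variable names are hypothetical.

```python
# Dimensions in millimetres.
CARD_LENGTH_MM = 15.0      # assumption: implied by 123.75 / (11 * 0.75)
CARD_WIDTH_MM = 11.0
BODY_THICKNESS_MM = 0.75   # caliper measurement of the card body

# Volume of a single card, ignoring the 1 mm nub on the edge.
single_card_mm3 = CARD_LENGTH_MM * CARD_WIDTH_MM * BODY_THICKNESS_MM

# Two cards tiled so the nubs do not overlap: 16.5 mm x 1.5 mm x 11 mm.
pair_mm3 = 16.5 * 1.5 * 11.0
per_card_stacked_mm3 = pair_mm3 / 2

# Figures quoted from the original estimate.
ORIGINAL_CARD_MM3 = 163.0
ORIGINAL_TOTAL_M3 = 84.0

# The total volume scales linearly with the volume taken up per card.
scale = per_card_stacked_mm3 / ORIGINAL_CARD_MM3
scaled_total_m3 = ORIGINAL_TOTAL_M3 * scale
reduction = 1 - scale

print(f"single card:      {single_card_mm3} mm^3")
print(f"per stacked card: {per_card_stacked_mm3} mm^3")
print(f"scaled total:     {scaled_total_m3:.1f} m^3")
print(f"reduction:        {reduction:.1%}")
```

Scaling the rounded 84m^3 figure this way lands within about half a cubic metre of the 70.5m^3 quoted above; the small gap is presumably rounding in the inputs.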

    1. gpl
      Not OP, but the reference for SD card dimensions is a linked Amazon page. I think the 'footnote' you are likely referencing is the "cubed" superscript on mm, indicating a volume. For what it's worth as well, the page you linked to indeed shows that the issue goes beyond page 81 - the only things that end there are the "special issue" articles. Keep scrolling on your linked page.

      3 votes
      1. pseudolobster
        D'oh! You're right on both counts.

        2 votes