24 votes

If humankind left Earth and came back after 100 years, how much of our digital files would still be readable?

That's something that concerns me a lot. A lot of what we know about our history came from analog media that was preserved throughout history. Will future generations (100, 200, 1000 years from now...) be able to access our digital documents to understand how we lived?

Edit: the scenario I proposed in the title was just a way to express my concerns more concisely. I don't think it will actually happen, but answering it is equivalent to addressing my concerns.

23 comments

  1. spctrvl
    • Exemplary

    There's archival digital storage media designed to last for that length of time (or longer), like the M-DISC, but you're looking at heavy losses for most of the rest. Likely all flash storage is gone: its archival use isn't particularly well understood yet, but nobody expects it to last more than a decade unplugged. Removable magnetic media is probably mostly gone, since tape degrades and is vulnerable to mold, though you may have some survivors. I would guess hard drives would fare better, due to their more resilient and isolated platters. It wouldn't be plug and play, but so long as there's no physical damage, someone familiar with the technology and formats used could probably recover a lot if it's not encrypted. There's probably also a decent survival rate among regular optical discs, as long as they're sheltered, factory pressed instead of home burned, and not exposed to damaging conditions. Again, not likely plug and play, but recoverable. Lastly, certain types of ROM chips should last practically forever, so if they can find and recap an old SNES, the games should all still work unless otherwise physically damaged, although they probably wouldn't save.

    While raw technical durability is interesting, what matters more for longevity is format decipherability. Finding a cache of old working DVDs isn't going to do you much good if you don't know how to build a DVD player, and reverse engineering one from the discs would be hard enough even without all the encryption. Thanks to DRM, even surviving optical media (movies at least; CDs should be fine) will be so many coasters to any future society that's lost the encryption keys. Eradicating DRM is necessary to prevent a digital dark age even in the absence of social collapse; maintaining format portability is difficult even in the best of times. Fifty years from now we'll all be thanking god for piracy, given what we would've lost without it.

    27 votes
    1. SunSpotter

      I think HDDs are more prone to long-term failure than people may realize. I've dealt with lots of old hard drives, with mixed results. Some of this may be fixable (or already fixed) with modern technology, but here are just a few things that can go wrong with hard drives only 30-40 years old:

      1. Bit-flip: People mostly talk about bit flip in the context of copying errors, or in environments with high radiation where passive bit flip is more likely, but it happens in all magnetic media given enough time.

      2. Head sticking: If a hard drive doesn't move for a long time, as in the span of decades, there is a possibility that one of the heads will become physically stuck to the platter. This can sometimes be fixed, but will always result in loss of data wherever the head became stuck to the platter. Most modern HDDs shouldn't be affected by this because they're supposed to 'park' the head away from the platter, but you never know. It's possible that some cheaper drives, or older drives still in use today, don't have such functionality.

      3. Material degradation: This is a big one that flies under the radar I think. Seals, O-rings, anything that could be used in the construction of a hard drive and which is made of rubber will degrade over time. If you've ever picked up an old piece of plastic or rubber that's become sticky or gummy, this is what I'm talking about. Not only are you no longer guaranteed to get a perfect seal when this happens, it can also seize up the motor, or other moving parts.

      4. Low-level format degradation: When you format a drive, you're performing what's known as a high-level format, but there exists an even more fundamental type of formatting that tells the disk controller how disk sectors are laid out, and it has to be calibrated very precisely. Modern drives have this low-level format performed at the factory, and the consensus seems to be that attempting an LLF on a modern drive would destroy it. That means that if the controller either forgot, or could no longer read, where those sector boundaries exist on the disk, the data would be irrecoverable.
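      Bit-flip (point 1) can't be prevented on an unpowered disk, but it can at least be detected. A minimal sketch in Python, assuming a SHA-256 checksum was recorded when the file was archived (the "fixity check" practice many digital archives use); `flip_bit` is a hypothetical helper that only simulates one flipped bit:

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Hypothetical helper: return a copy of data with one bit inverted (simulated bit rot)."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


original = b"Family photos, 1998-2004"
recorded = sha256_hex(original)  # checksum stored alongside the file at archive time

# Decades later, one bit on the medium has flipped.
decayed = flip_bit(original, 13)  # bit 5 of byte 1: 'a' (0x61) becomes 'A' (0x41)

print(decayed)                           # b'FAmily photos, 1998-2004'
print(sha256_hex(original) == recorded)  # True: intact copy passes
print(sha256_hex(decayed) == recorded)   # False: corruption detected
```

      A checksum only tells you that something changed, not where or how to undo it; repairing the damage needs a second copy or an error-correcting code.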

      Worth a mention that it's uncertain how suitable CDs are for long-term storage since, as a plastic, they can degrade as well. It's difficult to say how much of a problem that will be, though. I only know that it hasn't become a problem with vintage CD media yet. Still, that really leaves ROMs as the only surefire traditional media to survive, as far as I'm aware.

      1 vote
      1. Toric

        If we are talking archeology, however, 2-4 can be overcome by advanced data forensics techniques (2 not entirely, but it's only a small area that gets lost). Definitely not plug-and-play, but keep in mind we've spent decades deciphering a single clay tablet; an HDD platter will be a piece of cake by comparison.

        1 vote
  2. Greg

    One of my favourite examples of extreme long-term digital archival is the GitHub Arctic Code Vault, which includes data encoded as opaque/transparent bits on physical film reels - there's an enjoyably dramatic video about it at the bottom of that page.
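    For a rough idea of what "opaque/transparent bits" means, here's a toy Python sketch that writes each byte as a row of eight cells. It's an illustration only: the real film format used for the vault has its own layout and error correction, none of which is modeled here, and both function names are made up for the example.

```python
def bytes_to_film(data: bytes) -> list[str]:
    """Toy encoder: one row per byte, '#' = opaque (1), '.' = transparent (0), MSB first."""
    return ["".join("#" if (byte >> (7 - i)) & 1 else "." for i in range(8)) for byte in data]


def film_to_bytes(rows: list[str]) -> bytes:
    """Toy decoder: read each row of cells back into one byte."""
    out = bytearray()
    for row in rows:
        value = 0
        for cell in row:
            value = (value << 1) | (1 if cell == "#" else 0)
        out.append(value)
    return bytes(out)


frame = bytes_to_film(b"Hi")
for row in frame:
    print(row)
# .#..#...   'H' = 0x48
# .##.#..#   'i' = 0x69
```

    The appeal of this kind of medium is that reading it back needs nothing more than a light source, a lens and the knowledge of how the bits are laid out.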

    That's the outlier, though: the vast majority of digital data is only stored electronically. That majority really is vast - we generate more data within a year or so nowadays than in the entire sum of prior human history, and while I can imagine some interesting behavioural insights coming out of data archaeology in the future, I'm willing to say we can ignore the system logs and user analytics and all the rest for now, and focus on actual consciously-created works.

    So, we've got a bunch of text, video, images, audio - all stored on a few hard drives/SSDs and maybe some magnetic tape if you're lucky. Chances of leaving that totally unattended, and then coming back to access them with our current tech level are pretty slim. Bearings seize, circuits oxidise, boards and chips degrade, format documentation is lost, and things generally become inaccessible even if the data is still technically there.

    If we were 100 years more advanced, that poses an interesting question: there's a greater possibility of efficiently reading patterns from hard drive platters or flash chips in a way that's achievable at scale without requiring the original controller. No guarantee, of course, but a better possibility - just as we can x-ray or MRI old documents now that would have been thought lost by their creators.

    The more interesting, more hopeful, and in my opinion more likely option is what happens if humanity doesn't take a 100-year break. All the systems keep running and expanding, the technology keeps advancing, the internet remains online. In that world, it's more a question of what's saved and in what format: the way storage prices keep going, it's plausible to keep almost everything. This is good, because lots of copies keep stuff safe - while some things will inevitably be lost, that's been true throughout history.
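    The "lots of copies" argument can be put in numbers. A minimal sketch, assuming each copy survives independently with the same probability p, so the chance that at least one of n copies survives is 1 - (1 - p)^n. The p = 0.5 below is an arbitrary illustrative value, not a measured survival rate, and real copies are rarely fully independent (same building, same format, same company).

```python
def survival_probability(p_copy: float, copies: int) -> float:
    """Chance that at least one of `copies` copies survives, assuming each
    survives independently with probability p_copy."""
    return 1 - (1 - p_copy) ** copies


# p_copy = 0.5 is an arbitrary illustrative value, not a measured rate.
for n in (1, 2, 3, 5, 10):
    print(f"{n:>2} copies: {survival_probability(0.5, n):.4f}")
```

    Even coin-flip odds per copy get you past 99.9% with ten independent copies, which is why correlated failures (one fire, one bankrupt host) matter more than any single drive's reliability.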

    I think our chances of keeping a meaningful fragment of our cultural heritage alive are much better now than at any time in the past. We still have books, and vinyl records, and photographic prints - and then we also have redundant, distributed, searchable copies of effectively every major work and the vast majority of minor ones. We have redundant, distributed, searchable copies of a huge amount of day-to-day trivia as well.

    Search will probably become a limiting factor as much as storage, but that's a good problem to have: it means things are misplaced rather than lost, and they can always be rediscovered with time. If civilisation as we know it survives, I think a significant portion of our digital footprint will survive along with it.

    13 votes
  3. petrichor

    Not many. Data corruption happens extremely easily.

    If the servers were left turned on, modern file systems and hardware are pretty good at error correction. 3Blue1Brown made a very nice pair of videos on how simple error correction works, if you're interested.
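    For the curious, here's a minimal Python sketch of a Hamming(7,4) code, the kind of simple scheme those videos walk through: 4 data bits get 3 parity bits, and any single flipped bit in the 7-bit block can be located and corrected. The function names are illustrative; real storage systems use much stronger codes (e.g. Reed-Solomon or LDPC).

```python
def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits as a 7-bit Hamming codeword.

    Positions are numbered 1..7; parity bits sit at positions 1, 2 and 4,
    data bits at 3, 5, 6 and 7.
    """
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # even parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # even parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # even parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(c: list[int]) -> list[int]:
    """Correct at most one flipped bit, then return the 4 data bits."""
    c = list(c)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos - 1]:
            syndrome ^= pos  # XOR of the positions that hold a 1
    if syndrome:  # a valid codeword gives 0; one flip gives its position
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


data = [1, 0, 1, 1]
codeword = hamming74_encode(data)
damaged = codeword.copy()
damaged[4] ^= 1  # flip position 5
print(hamming74_decode(damaged) == data)  # True
```

    Two flips in the same block would be "corrected" to the wrong answer, which is why this only helps while someone keeps checking and rewriting the data before errors pile up.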

    But similar to @PhantomBand, I'm also not very concerned. Backups and redundancy have become a staple of the modern Internet, whether concentrated in one place like the Internet Archive, or distributed like Wikipedia on the InterPlanetary File System and Library Genesis. I also think it unlikely that societal collapse or a natural disaster would affect all of humanity so fast that archives couldn't be copied and stored somewhere they can be maintained.

    You might also find /r/DataHoarder reassuring.

    8 votes
  4. DataWraith

    Why would humankind leave and then come back?

    I think the much more likely scenario is that whatever is considered valuable will get saved and transferred into a new format over time. You can convert old data formats into newer ones or write emulators that can deal with the old file format. I like to imagine a future (kind of what Vernor Vinge described in one of his short stories) where "programmer archaeologists" dig through old data and kludge together a stack of emulators within emulators to make some old software work again, so it can be applied to contemporary problems...

    The real problem IMO is that the most valuable things to store from an archaeological point of view are often the trash, the useless things that are so commonplace as to be taken for granted. For example, Geocities is a treasure trove of information about ordinary life at the turn of the century, and Yahoo didn't think twice about deleting all of it because it seemed so ordinary. Incidentally, it was saved from ruthless deletion by a concerted archival effort, but we're probably not going to be so lucky with everything.

    We can't store everything forever though, so there has to be some sort of curation applied.

    5 votes
    1. Toric

      Could I get the name of that short story, or a link if you have one handy? I love sci-fi short stories that explore unique concepts.

      1. DataWraith

        I thought I remembered it being in The Collected Stories of Vernor Vinge, but that wasn't it.

        I think I found it again though: it is from his book A Deepness in the Sky. My memory is fuzzy now, but the main thing I remember about it was the fact that the programmer archaeologists kept time in seconds (megaseconds, gigaseconds) and thought that the UNIX epoch marked the time that mankind first set foot on the moon...
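        That misreading is off by a few months. Assuming the commonly cited time of Armstrong's first step, 02:56:15 UTC on 21 July 1969, here's a quick Python check of the gap in the archaeologists' own units:

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Assumption: commonly cited UTC time of Armstrong's first step on the Moon.
FIRST_STEP = datetime(1969, 7, 21, 2, 56, 15, tzinfo=timezone.utc)

gap_seconds = (UNIX_EPOCH - FIRST_STEP).total_seconds()
print(f"{gap_seconds:,.0f} s, about {gap_seconds / 1e6:.2f} megaseconds")
# 14,159,025 s, about 14.16 megaseconds

# One gigasecond of Unix time, for scale:
print(UNIX_EPOCH + timedelta(seconds=1e9))  # 2001-09-09 01:46:40+00:00
```

        So the epoch lands roughly 14 megaseconds after the first step, and the first full gigasecond of Unix time ran out in September 2001.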

        2 votes
      2. Tardigrade

        There's a really nice radio play in 9 parts that touches on similar concepts. It's called Forest 404, on BBC Sounds. It's best to go into it without reading the synopses of later episodes, as that spoils it a little. https://www.bbc.co.uk/sounds/brand/p06tqsg3

  5. PhantomBand

    That's something that concerns me a lot.

    Sorry for the irrelevant question, but why? It sounds more like a hypothetical to me.

    But anyways, I do think everything would be lost, simply because stuff breaks down at some point and won't get replaced. And then there's also natural disasters.

    4 votes
    1. mrbig

      Well I am concerned with the preservation of our culture. That concern points to the future, and concerns about the future are valid concerns, even in hypotheticals.

      Edit: the scenario I proposed in the title was just a way to express my concerns more concisely. I don't think it will actually happen but answering it is equivalent to addressing my concerns...

      5 votes
        1. BlindCarpenter

        As it gets easier to store media I think more and more will be preserved as time goes on. Already I think we store way too much useless information that erodes our privacy.

        4 votes
          1. mrbig

          I guess my question is: is digital storage inherently less permanent than words on paper, paintings, or sculptures? Can digital media last for 1,000 years?

          2 votes
          1. Amarok

            Yes, billions in fact. We're not quite at the age of photonic computing yet, but the puzzle pieces are all there right now in the lab, even optical logic gates and processors. The gold standard for long-term storage is Superman memory crystal: encoding your data (with extreme redundancy) into rather tough glass (fused quartz).

            2 votes
          2. BlindCarpenter

            Good question. I think if stored correctly, digital media won't degrade as quickly as paper media. As long as someone makes copies of it (which is quite easy to do), it will last forever.

            1 vote
          3. Octofox

            Most of those paintings and sculptures only lasted this long because someone has cared for them the whole time. If you care for data, it's pretty easy to store it forever. Drives may die, but you can perfectly replicate data and detect errors, so if someone is maintaining it, it will last forever.
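            "Replicate and detect errors" can be sketched in a few lines. This is a hypothetical Python helper, assuming at least three same-length copies: at each byte position it keeps the value most copies agree on, so damage to one copy gets outvoted by the others.

```python
from collections import Counter


def repair_from_replicas(replicas: list[bytes]) -> bytes:
    """Hypothetical helper: rebuild a file from same-length copies by taking,
    at each byte position, the value most copies agree on."""
    if not replicas or len({len(r) for r in replicas}) != 1:
        raise ValueError("need at least one replica, all of the same length")
    repaired = bytearray()
    for column in zip(*replicas):  # one tuple of byte values per position
        value, _votes = Counter(column).most_common(1)[0]
        repaired.append(value)
    return bytes(repaired)


good = b"archive me"
copy_a = b"archivE me"  # one byte damaged
copy_b = b"archive mx"  # a different byte damaged
copy_c = good           # intact
print(repair_from_replicas([copy_a, copy_b, copy_c]))  # b'archive me'
```

            With only two disagreeing copies a vote can't tell which one is right (`Counter.most_common` just returns whichever value it saw first), which is why real systems pair redundancy with stored checksums, as ZFS does when it repairs a mirror during a scrub.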

            1 vote
  6. joplin

    The question as posed seems odd to me. But the broader question of how well will things be preserved is a reasonable question. Judging by the past, the things that stick around will be the things we choose to preserve because we find them important enough to keep around.

    We are already taking measures to preserve media from the 20th and 21st century. The Internet Archive not only crawls the web and copies it for preservation but has pages for cataloging TV shows, the news, books, etc. It has troves of stuff from the mid to late 20th century that you wouldn't have thought would be worth preserving (such as terrible TV shows with the commercials still in them).

    So I'm not overly concerned that we'll lose too much due to popular formats like JPG and H.264 becoming unreadable in the future. (It might happen, but they're so well documented in so many places, it seems likely that the information to decode them will survive for a while.)
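    Part of what makes formats like JPG easy to recognize later is that most files start with a documented signature (a "magic number"). A small Python sketch covering a few common ones; the table is illustrative rather than exhaustive, and the MP4 check relies on the `ftyp` box usually sitting at byte offset 4.

```python
# A few well-documented file signatures ("magic numbers"); not exhaustive.
SIGNATURES = {
    b"\xff\xd8\xff": "JPEG image",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"GIF87a": "GIF image",
    b"GIF89a": "GIF image",
    b"%PDF-": "PDF document",
    b"PK\x03\x04": "ZIP container (also DOCX, ODT, EPUB)",
}


def identify(header: bytes) -> str:
    """Guess a file's format from its first bytes."""
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    # MP4/MOV files (the usual home of H.264 video) normally begin with an
    # 'ftyp' box whose type field sits at byte offset 4.
    if header[4:8] == b"ftyp":
        return "ISO base media file (MP4/MOV family)"
    return "unknown"


print(identify(b"%PDF-1.7\n"))  # PDF document
```

    To try it on a real file, read the first few bytes in binary mode, e.g. `identify(open(path, "rb").read(16))` with `path` pointing at the file.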

    One concern I have is whether minority thoughts and culture will be preserved in any way. I think that's more likely to happen now than at any other time in the past. Many cultures we've lost only passed things down through oral traditions, and once those stories stopped being told, they were lost forever. Since we record (whether through video, photography, audio, or writing) everything we do, I think that's less likely to happen. (I'm not saying things won't be lost - they certainly will. Just that we're able to document far more these days and with far more accuracy than ever before and get them to a very wide audience quickly and cheaply so the chances of preservation are higher.)

    Many things that don't need to be preserved will be. To understand the past, it can be useful to know what things were popular in a culture and why. But the most popular things are the most likely to be preserved just because there are so many more of those things. There were 32 million copies of Michael Jackson's Thriller sold in 1983 alone. It is much more likely to be preserved in multiple forms than, say, an album put out by a group that never hit the charts or a poem written by ... well, anyone, since poetry is not something most people read these days. I'm not complaining that Thriller will be well preserved, as it's certainly earned its place in our culture. But there are likely to be other things that were either more influential but less well documented, or better but less popular, that won't be preserved. C'est la vie!

    I really hope most of what's on the Internet won't be preserved. Most of what's on the internet is either junk posts that don't need to be passed down through the ages, or corporate junk extolling the virtues of using some system that they'll stop producing in 2-4 years. It's pointless and the energy it takes to preserve such stuff would be better spent elsewhere.

    4 votes
    1. mrbig

      I really hope most of what's on the Internet won't be preserved. Most of what's on the internet is either junk posts that don't need to be passed down through the ages, or corporate junk...

      I believe even the most egregious content might be of interest to future historians, just like current historians get all kinds of valuable insight from the most trivial and vicious records of the past...

      1 vote
        1. joplin

        That's a fair point. I just know that had social media existed when I was a kid, I would have posted a bunch of crap that wouldn't have been important (or even decipherable) a month or two after I posted it, let alone 100 to 1,000 years from now.

        Scott Adams, the creator of Dilbert, had a good example of this in one of his books. If I recall correctly, he had drawn a cartoon for his strip a few years earlier, and looking at it while putting together his book, he couldn't make heads or tails of it. It related to some event from the week it was printed in the newspapers, but he couldn't figure out what the event was, why he cared at the time, or even what the joke was. I feel like that's the vast majority of posts on social media. They won't make any sense to historians in the future because they likely won't make any sense to the people who wrote them in a few years.

        I've been to Pompeii and seen that they've gotten a great deal of information about the lives of the people who lived there by picking through their trash, for example. So believe me when I say I get it. But I feel like we're producing more trash faster and with even less value these days. I could be wrong, though. I just don't see why knowing that Jenny Q. Random enjoyed the Taylor Swift concert last night will be important to future historians. They likely already know the cultural significance of Taylor Swift. They likely won't know anything else about Jenny Q. Random.

        1 vote
          1. gpl

          That's an interesting idea. Something is better than nothing (e.g. Pompeii), but is everything better than something?

          3 votes
          1. mrbig

            The thing is, it is not up to us to decide what will be of interest for future humans. Something we believe is valuable may be of little interest in retrospect, and things we classify as trash may become essential for future societies. So we better preserve as much as we can. That is my view at least.

            2 votes
        2. mrbig

          I think it is safe to say that future historians will have access to enough specialized software and processing power to easily sort through our digital garbage in order to check their hypotheses.

          2 votes
  7. Deimos

    The overall talk moves on to a different topic eventually, but I'd highly recommend watching at least the first part of this video: Clay Shirky - Making Digital Durable: What Time Does to Categories

    The video is 15 years old, so some of it's outdated by now, but the first part talks about how difficult it could be to access digital files in the future and how difficult a problem preservation is. Clay Shirky is one of my favorite people to listen to about technology.

    3 votes