Permanent archival formats. Do they exist?
Recently, I've been thinking pretty hard about how to archive data. Optical media is out, due to my (possibly irrational?) fear of disc rot. HDDs just break with extended use, and SSDs have been known to die from either overuse or just existing for an extended period of time. What's left?
I have heard of tape (of some kind) being used for backup in some bigger operations, but going by my experiences with VHS and, to a lesser extent, cassettes, tapes seem to be very susceptible to mould.
Any suggestions?
All information eventually gives way to entropy.
The only real solution is constant refresh, validation, and multiple copies of data. The only reason we know about many historical things is that multiple copies of organisms existed, and we were able to find the lucky ones that were preserved in tree sap.
Likewise, we are lucky to find old stories because they were passed down through constant rehearsal in oral tradition and/or because some crazy guy brought some scrolls up to a mountain. There are probably many scrolls that we don't know about because they weathered away. We just find the lucky ones. It's chance.
Analog may last longer than digital data, but I wouldn't trust just one type of storage. If you care about something, you need to have multiple copies and validate that the data is still readable every year, or at the very least every 5-10 years. If players (VHS, say) are no longer manufactured, you need to consider that as well.
Even stone tablets have bitrot at a timescale of months if the conditions are right
I was thinking more on the scale of 20 years rather than a million, but you have a good point ;)
Hard drives don't lose contents if you refresh them every x years (the value was north of 10 I think, but let me say 5 for safety margin until you've done your research) and don't break from a handful of overwrites. Just don't keep them spinning continuously/cumulatively for more than 5-ish years and checksum your data.
Keep it simple, buy one big drive as primary copy, one for local redundancy, one for off-site redundancy, and don't have all three in the same place after writing the data. Multiply if you have more than what reasonably fits on one drive, like around 10TB or so. Refresh and verify every 5 years. Done.
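For the "checksum your data" and "refresh and verify" part, here is a minimal sketch of what that could look like in Python. It is only an illustration: the function names are made up for this example, and SHA-256 is an assumption (any strong hash would do). The idea is to record a digest for every file when you write the drive, then recompute and compare at each refresh.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose contents changed."""
    problems = []
    for relative, expected in manifest.items():
        target = root / relative
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(relative)
    return problems
```

Keep the manifest somewhere other than the drive it describes (or on all three copies), so a failing drive cannot take its own checksums down with it.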
No need to go for encoding QR codes into clay tablets if 20 years is all you need! The title and post you originally submitted also had me thinking you meant millions of years... at this timescale you barely have to worry about anything. Probably a normal DVD would do fine but those are slow and small
I don't think a DVD would do it. I still have my old CD-Rs from not even 15 years and they're all coasters :/
There are CDs with an expected 100 year life span but they're not the same ultra cheap stacks you buy at Walmart or wherever. I haven't used archival discs in a hot minute (lol about 20 years) but I do remember having to use the slowest burn speed so you didn't make a VERY expensive coaster due to buffer under-run.
Archival Grade CD example
M-disc BD-R projected for "hundreds" of years -- ironically with a 10 year warranty.
Even back when I burned CDs regularly there were some brands/price points I just avoided because of the "Let's guess who'll be a coaster by next month!" game.
Yes, optical media can indeed be reliable. There is a subset of "professional" products that go the next step by including a caddy for the disc to be permanently stored in (see Sony's Professional Disc format for an example).
We didn't take computer CD-ROMs out of caddies because they weren't reliable, we took them out because it was cheaper.
Oh right, home made optical media is less reliable. Pretty sure all the CDs my parents have since before I was born still work though, and I haven't been 15 years old in a while
My rollercoaster CDs that an uncle copied when I was little, I had assumed didn't work anymore due to the web of scratches, but age probably didn't help. The iso file I keep on a hard drive, though...
Yeah, the pre-made media is all good. My oldest CDs are from about 1993 and still work great.
Exactly. Even etched stone tablets will be gone, and we have far too much information to store in that manner.
But yeah, constant refresh with error-correcting codes is the answer, and even that can't handle an EMP.
AFAIK modern tape in a proper facility is as close as we have. Tape doesn't degrade like CDs, and the medium doesn't mechanically fail like hard drives or rely on a charge like solid state. You need to ensure it is cool, dry, and away from EMFs. There's a reason tape storage is still seeing innovation.
Enterprise tapes for things that matter are sent to proper data warehouses. But tape is purely for cold storage, which is to say it is brilliant for archiving.
I've never heard of someone using tape for personal archival work but I would be very curious to see a solution to that effect. I want to start thinking long term about the data I produce today.
Yeah, I messed up and thought it was just a discussion about the best format.
It's not impossible. I don't think the tapes are too expensive, but the drives were the last time I got bored and looked into it.
Do you know the name of the format they actually use? Assuming it isn't too expensive, I might look into it.
Most archival tape storage I've seen uses the Linear Tape-Open (LTO) technology.
Holy shit, this might work!? I looked on eBay, and a 400 GB tape only costs £4.50 and a drive is <£50! I might actually go for this after a little more research. Thanks!
I think LTT did a video on this a while ago. While they're not always the best source of information, they showed that for long-term cold storage, tape is both the best and the cheapest solution.
Just make sure you use LTFS format, that's an open standard!
Depending on how much you need, it might be worth investing more in a newer drive. It's significantly more user friendly, and if your LTO-3 or LTO-4 drive breaks, the only hope is buying used from eBay or such.
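LTO is generally backwards compatible 1 version behind for RW and 2 versions for RO. So an LTO-7 drive can write to a 6 tape and read a 5. As far as I know, that rule only holds up to LTO-7: LTO-8 and LTO-9 drives only go one generation back, so an LTO-8 drive can read and write a 7 tape but can't read a 6.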
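If you are shopping for a used drive, it may help to write the compatibility rule down explicitly. The sketch below encodes my understanding of it, which is an assumption worth checking against the drive's datasheet: drives up to LTO-7 write one generation back and read two back, while LTO-8 and LTO-9 drives read and write only one generation back. The function name is made up, and it deliberately refuses generations beyond 9 rather than guess.

```python
def lto_compatibility(drive_gen: int) -> dict[str, list[int]]:
    """Return the tape generations a drive of this generation can read and write.

    Assumed rule (verify against the vendor datasheet): drives up to LTO-7
    write one generation back and read two back; LTO-8 and LTO-9 read and
    write only one generation back.
    """
    if not 1 <= drive_gen <= 9:
        raise ValueError("this sketch only covers LTO-1 through LTO-9")
    read_back = 2 if drive_gen <= 7 else 1
    oldest_readable = max(1, drive_gen - read_back)
    oldest_writable = max(1, drive_gen - 1)
    return {
        "read": list(range(oldest_readable, drive_gen + 1)),
        "write": list(range(oldest_writable, drive_gen + 1)),
    }
```

For example, `lto_compatibility(5)` reports that an LTO-5 drive reads generations 3 to 5 and writes 4 and 5, which matters if you are buying cheap older cartridges.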
It’s a big up front cost, but as soon as you get above 100-200TB it tends to even out. By 300-ish TB it’s only getting cheaper and cheaper (£3-5 per terabyte)
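The "evens out" point is just a break-even calculation: tape has a big fixed cost (the drive) and a low cost per terabyte, while hard drives have no fixed cost but a higher cost per terabyte. A rough sketch, where every price is a hypothetical placeholder rather than a real quote:

```python
def break_even_tb(drive_cost: float, tape_per_tb: float, hdd_per_tb: float) -> float:
    """Capacity in TB at which tape (drive plus cartridges) costs the same as hard drives alone."""
    if hdd_per_tb <= tape_per_tb:
        raise ValueError("tape never breaks even unless it is cheaper per TB than disk")
    # Solve drive_cost + tape_per_tb * x == hdd_per_tb * x for x.
    return drive_cost / (hdd_per_tb - tape_per_tb)


# Hypothetical prices in pounds, for illustration only.
example = break_even_tb(drive_cost=2000.0, tape_per_tb=4.0, hdd_per_tb=15.0)
print(f"Break-even at roughly {example:.0f} TB")
```

With those made-up numbers the crossover lands inside the 100-200TB range mentioned above. Plug in current prices before deciding anything.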
Just be aware that read/write speeds are extremely slow. I managed to get an LTO5 drive for cheap recently, and that’s one of the biggest things that I’ve run into as I’m trying to integrate it into my backup strategy.
How slow? I don't really mind if it's over 100KB/s, if it's under that, it might be a concern.
LTO-5 is more or less like a middle-grade USB 3.0 drive.
Nice! Even USB 2 is fast enough for me, so no worries.
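To put numbers on "how slow", you can estimate how long it takes to stream a full tape. The sketch below assumes the commonly published LTO-5 native figures (1.5 TB per cartridge, 140 MB/s), which are worth double-checking for your drive; real throughput also depends on whether your disk can feed the drive fast enough.

```python
def hours_to_transfer(capacity_tb: float, speed_mb_per_s: float) -> float:
    """Hours needed to stream capacity_tb terabytes at a sustained speed (decimal units)."""
    if speed_mb_per_s <= 0:
        raise ValueError("speed must be positive")
    seconds = (capacity_tb * 1_000_000) / speed_mb_per_s  # 1 TB = 1,000,000 MB
    return seconds / 3600


# Assumed LTO-5 native figures: 1.5 TB cartridge at 140 MB/s.
print(f"Full tape at native speed: {hours_to_transfer(1.5, 140):.1f} hours")
# The 100 KB/s worry from the question above, for comparison (0.1 MB/s).
print(f"Full tape at 100 KB/s: {hours_to_transfer(1.5, 0.1) / 24:.0f} days")
```

So a full cartridge is an afternoon's job at native speed, and would take months at 100 KB/s.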
Tape has a shorter lifespan than decent quality CDs, doesn't it?
Isn't tape vulnerable to bit rot? I thought tape archives have processes to error check against redundant copies on some cadence (though it's a very long cadence, like decades).
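Checking against redundant copies can be illustrated with a toy example. This is not how real tape archives do it (they rely on checksums and proper error-correcting codes), just a simplified sketch of the idea: with three or more copies, a byte-by-byte majority vote can both detect and repair isolated corruption. The function name is made up for the illustration.

```python
from collections import Counter


def majority_repair(copies: list[bytes]) -> tuple[bytes, list[int]]:
    """Rebuild data byte by byte from an odd number of equal-length copies.

    Returns the repaired bytes and the offsets where the copies disagreed.
    """
    if len(copies) < 3 or len(copies) % 2 == 0:
        raise ValueError("need an odd number of copies, at least three")
    if len({len(c) for c in copies}) != 1:
        raise ValueError("copies must all be the same length")
    repaired = bytearray()
    disagreements = []
    for offset, column in enumerate(zip(*copies)):
        value, votes = Counter(column).most_common(1)[0]
        if votes <= len(copies) // 2:
            raise ValueError(f"no majority at offset {offset}")
        if votes != len(copies):
            disagreements.append(offset)
        repaired.append(value)
    return bytes(repaired), disagreements
```

If two of three copies rot at the same offset in different ways, there is no majority and the function raises, which is the argument for checking on a cadence rather than waiting decades.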
Long-term archiving is more about system robustness than single-media robustness.
If you are looking for a single medium that will outlive you by 100-1000 years, I hope you are a millionaire. Some alternatives exist.
Most solutions today though are constructed to work around the faults of single media and instead secure long term robustness from protocols and redundancy.
One disk fails in your NAS? Just replace it and rebuild it.
The whole data center burned up in a fire? Sucks. You’ll have to rebuild it and then populate the data from one of the replicas.
For the common man, if you want to ensure 100% readability in your lifetime, whatever media you choose your best bet is to define a protocol where you replace your media every 5 or N years with new devices, and always keep a backup.
For most of us though, a 3-2-1 backup system is more than enough.
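For anyone who hasn't met the term: 3-2-1 means at least three copies of the data, on at least two different kinds of media, with at least one copy off-site. A small sketch of checking a setup against that rule, where the class and field names are invented for the example:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    label: str      # hypothetical name for the copy, e.g. "desk HDD"
    medium: str     # e.g. "hdd", "tape", "cloud"
    offsite: bool   # True if stored away from the primary location


def satisfies_3_2_1(copies: list[Copy]) -> bool:
    """Check for at least 3 copies, on at least 2 media types, with at least 1 off-site."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.medium for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite


# Example setup: two local drives plus one cloud copy.
setup = [
    Copy("desk HDD", "hdd", offsite=False),
    Copy("shelf HDD", "hdd", offsite=False),
    Copy("cloud bucket", "cloud", offsite=True),
]
print(satisfies_3_2_1(setup))
```

Three hard drives in the same house would fail the check twice over: one medium, nothing off-site.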
There are some good answers in this thread.
As someone who works with archival collections, I find that the most permanent form of storage is one that is updated as technology evolves. Take a photograph from 1920 as an example. Over a century old. You can preserve that photograph in an acid-free folder, inside of a tightly sealed container with no UV exposure, in a dry and climate-controlled environment, and it will be preserved near indefinitely. But obviously most objects and documents in the real world aren't preserved to that level of conservation.
But if that photo had been scanned into a videodisc in the 1970s, then transferred to a compact disc in the 1980s, then placed on a floppy drive in the 1990s, then downloaded onto a hard drive in the 2000s, then copied onto a usb drive in the 2010s, and now uploaded to cloud server storage in the 2020s, then that photo would still exist in some form today. Even if the original paper photograph had been lost or destroyed decades ago, the content itself (an image in this case) would be preserved seemingly forever because effort had been made to update the content into whatever the modern technological medium was at the time.
One day in the future, when cloud storage starts becoming obsolete and people have information stored directly in their brains or whatever, you transfer that photo file from your server into your brain and it continues to live on even though multiple iterations of that image have been lost to time. It's all about redundancy in my field. If something is lost, hope you have tons of backups, in whatever forms they exist.
The Library of Congress has a nice list of recommendations for how to store any media. They even tell you what digital file types to use.
I remember hearing a talk from a history professor a few years ago where they were suggesting ways we can make materials more useful for future historians. They said that if you want to be sure your media can be read in the future, nothing beats high-quality paper. That's the one format which has stood the test of time, which we can be sure will be readable indefinitely.
Of course, that's not exactly practical for a video or a database or a song. If you want to store your digital data for a long time, an LTO Ultrium cartridge will cost about $2 per terabyte and is designed to safely sit on a shelf for 30 years. Of course, the cartridge reader itself will set you back about $1000, and you can only read your data back a few hundred times before the tape wears out. For most people it's more practical to use a cloud backup service and a regular spinning platter hard drive for local backups.
Remember, it's not a question of whether your backups will fail. All methods can fail. What matters is that failures are few and far between, that you notice when it happens, and you can recover gracefully from a failure.
Github did a pretty large long-term-storage thing a few years ago: https://youtu.be/fzI9FNjXQ0o. They did it by printing QR-encoded data onto silver-halide film, similar to tape. Not sure how viable this is for a consumer though.
I think the simplest solution for long-term storage is just replicated storage hooked up to a Raspberry Pi or something for monitoring.
Awesome solution, but it would need to be a proper apocalypse for me to be bothered scanning all those codes!
You're probably right on the second point though. I've been thinking of getting a pi for Pi-Hole anyway, so integrating it into my server wouldn't be a bad idea.
I've read about this optical disc format in the past. The company that makes them claims they're good for 1,000 years, which kinda smells like BS to me, but I guess NIST says they should be good for at least 100 years.
I've never tried using them, but I know my Blu-ray burner supports them. It's not a fancy one or anything — I think most drives work with them. The discs aren't cheap though...
Cheap enough. If they actually last as long as they claim, then it seems fair. Surprised I never heard of this before, thanks!
I really like the concept, and the Wikipedia page looks credible, but the official M-DISC website does not inspire confidence. It's chock full of generic stock photography and marketing copy, and it hasn't been updated in four years. Looks like their WordPress blog was hacked in 2016 and they either never noticed or didn't care to fix it. That doesn't seem like much of a professional operation to me. My guess is, even if the tech is sound, the company's kaput. You could archive all your data to these discs but without access to the proprietary hardware required to read them again, it's all moot.
No proprietary hardware is required, actually!
M-Discs are written in standard optical disc drives (DVD-R and Blu-Ray BD-R are both supported). And they are read in any optical disc drive, so long as the underlying format is supported (e.g. you can't read a BD-R in an old DVD-only drive).
(I'm not involved in M-Disc and don't own any. But they seem like a decent choice for medium-term archiving, meaning decades rather than centuries, and I would be comfortable using them to store old family photos and videos.)
To move the goalpost a little: I'm not exactly confident generic PC-based DVD & Blu-Ray readers will last anywhere near 100 years. The article about floppy disks on the front page is pretty much what I expect to happen to today's optical standards ~50 years from now on. People find solutions in scavenged hardware, new addons for that scavenged hardware (like USB<->FDD translators with new cases) and drive emulators, but the supply of working scavenged hardware is starting to run out and emulation is no good if you want to read existing media. 3.5" seems to be barely clinging on to life, working 5.25" drives seem to be getting rather pricey and 8" is practically extinct.
PC Blu-Ray readers in particular have had very little market adoption, so I think that once production of drives stops, it'll become difficult to find new hardware rather quickly. DVD stands a better shot at surviving the test of time, but even then I expect that the hardware will become incredibly tough to find in about 30 years and essentially extinct in 50.
I have similar concerns for a lot of other long-term archive media. A "living archive" where a human transfers it to a new thing every 10 years or so seems like the only real option to me.
Thanks for the clarification! I misunderstood the requirements. If any drive is capable of reading them, I feel a lot more confident about the viability of the format. Still concerned about the company based on the state of their site, and I wonder about the availability of the media itself.
I just checked and the company that designed M-Disc is basically out of business, but since the media itself is produced by other companies under license, it should be OK. Still, I don't know how useful optical media will be to me in 2 more decades, especially given that the M-Disc BDXL format tops out at 100 GB.
Today I use HDDs, and periodically copy everything to newer disks. It's fast and relatively low-risk for my purposes. (I used to use DVDs, but my storage needs outgrew that badly.)
For personal files, I buy a new computer every decade or so and copy anything I want to keep from the old one to the new one. I like to start with a fresh install, so I have archive folders for a few previous computers and accounts. Then I back all that up to an external drive using Time Machine (since I buy Macs).
The last migration I did was a bit tricky because I moved from an iMac with an old-fashioned hard drive and USB 2 / Firewire to a Mac Mini with USB 3. Also, Apple changed their preferred filesystem. The new computer has a smaller hard drive so I went with a new external SSD for archives, and a bigger new external SSD for backups, and then I found that I'm low on ports.
With adapters it's not a big deal, though. If it's every 10 years or so, things will have changed but the old stuff is still readable without trying too hard. Technology goes obsolete, but not that quickly. I expect USB 3 has a long life ahead of it.
Also, for this to be useful at all, it not only needs to be possible to read your archives, but convenient to do so. I have old pre-Gmail email archives that I never look at and a dump of my Google+ account (R.I.P.) but I can't be bothered to get it into a convenient format for browsing. I don't listen to my old music collection much now that there is YouTube and Spotify, though it's easy to do. I'm more likely to look at photos than anything else, and not having Google Photos backed up all that well is a vulnerability that I should see to.
This isn't what libraries or historians have to deal with. You don't have to worry about historians. Some historical perspective:
In ancient Rome they used papyrus and it was all lost, except for:
I expect future historians to have plenty of random junk left from our era. Making copies is much easier and they have many petabytes to choose from. I don't think anyone needs to worry about them. The only people who will care about your personal backups are maybe your descendants.
Assuming they know the password. Are your backups encrypted? Should they be?
The one feature of Macs I actually love! Easiest backups in the world.
The only reason I even own a Mac is just because Apple is so intent on never putting Logic X on Windows.
I don't have a good answer for this, but a story. From about 2005-2008, I worked for a public school district's IT department. They had a requirement to keep school records for 60 years. But the only accepted method to archive those records was... microfilm. It was the only technology that they had at the time that was already proven (beyond a theory) to last that long, especially without having to deal with very specific storage conditions. I'm curious as to whether they've moved on from that, or if they're still using it.
I can add to that. My national building registry company uses microfilm or paper format (depending on recency - they send paper for recycling after some time) to store building plans.
They are required to store plans forever.
Very insightful thread! I didn't know tape was still being actively developed.
AWS provides Glacier, a deep archival storage service. It's ridiculously cheap, and there's almost zero chance of losing your data since AWS keeps multiple copies of it in their data centers.
It's perfect for a cold storage and minimal retrieval, and unlike tapes, Glacier can retrieve data instantly if needed too, although that would incur some cost.
Over time I think Glacier would be an excellent choice compared to investing in hardware yourself, because you don't have to worry about losing your data, maintaining it, or finding somewhere in your garage to store the hardware. You have no overhead if you use Glacier, so I think it's a great permanent archiving format.
But then the answer comes down to trusting external companies with my data, which I certainly don't.
You absolutely don't have to, although AWS encrypts everything with AES-256 and only you as the account owner can access your data inside S3. You know they have to comply with a fuckton of legal and government regulations in order to maintain the trust they have with other huge billion-dollar companies who also wouldn't trust external companies with their data.
They also have many options to encrypt your data with your own keys (customer managed keys) and that way, AWS has absolutely no idea what your data is since they won't have the key to decrypt anything. Or you can encrypt whatever it is yourself and upload onto S3, that way you don't have to worry at all since the data is encrypted before it even reaches their service.
I know I might sound like I'm shilling for them, but I use Glacier Deep Archive myself. I store all my old photos, video tapes, and memories there, so I'm preserving them for a lifetime and don't have to worry about losing them even ten years down the line. If I have to download them again, I can do that from anywhere in the world. It's just one of the greatest benefits of the cloud. It's laughable how little it costs too.
Sounds alright. I might look into it :)
Personally I use Borg with an ISP to store my files.
I use the secret Borg plan on rsync.net, which offers a cheaper service but with reduced support.