46 votes

Make the Wayback Machine the real internet

12 comments

  1. [6]
    Atvelonis
    (edited )

    I have long operated on the principle that "if something on the internet is not archivable, it's not real."

    That is: any general-purpose article I write, any academic or scientific paper I publish, any technical answer I provide, any event I publicize, or really any idea I share is ephemeral and effectively nonexistent if it is not possible for web crawlers to document and preserve it.

    This applies to basically all fixed content. Obviously, people can read and benefit from ephemeral content like a group chat or private message board. However, as this article describes, most websites do a poor job of redirecting old URLs to new ones and a worse job of maintaining content that isn't immediately relevant to the business at the moment. Alarmingly, even government websites are affected. When reading an article here the other day I noticed that the entire US FRA website beyond the homepage is currently down and its legacy contents inaccessible without using the Wayback Machine. (While the homepage redirects to a new URL, nested pages don't.) It's important to be able to access primary sources!

    Even material that appears mundane, uninteresting, or culturally insignificant can become important to scholars and laypeople years in the future. For example, check out this 1994 interview with Bethesda Softworks on The Elder Scrolls: Arena (the oldest I know of for that franchise): this was accessible nowhere on the web until I dug up a link to it on Usenet. What was at the time a relatively unremarkable conversation is now a piece of cultural history, important to understanding internet use and culture, corporate engagement with consumers, and the development of The Elder Scrolls as a mainstay among video game players. Without the Wayback Machine, that link would have been an eternal dead end.

    While I'm not as active as I once was, I've spent a considerable number of years editing Wikipedia and other wikis, sometimes going to excruciating lengths to find the original sources of unsubstantiated claims. Most of the time, my findings took me to the Wayback Machine. It's no wonder that Wikipedia has several bots that automatically replace 404 links with their equivalent on the Wayback Machine.

    I donate to the Internet Archive, and I hope their work continues unimpeded. I even have a browser extension that automatically archives all pages I visit that have not been archived in the last year. (As a courtesy, I except websites like Tildes and Reddit.) But it worries me dearly that we rely so much on a single entity as effectively the preserver of all internet-based knowledge. I know there are other archival tools out there, but few are as robust or have a library quite as extensive as the Internet Archive.
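
    A minimal sketch of the "archive only if the last snapshot is over a year old" check described above. It is not the extension's actual code. It assumes the Wayback Machine availability endpoint (`https://archive.org/wayback/available`) and the `https://web.archive.org/save/` prefix behave as commonly documented; the function names and the `EXCLUDED_HOSTS` list are hypothetical. Network calls are left out so the decision logic can be read on its own.

```python
import json
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode, urlparse

# Assumed endpoints; check the Internet Archive's documentation before relying on them.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"
SAVE_PREFIX = "https://web.archive.org/save/"

# Hypothetical courtesy list of hosts that should never be submitted.
EXCLUDED_HOSTS = {"tildes.net", "www.reddit.com"}


def is_excluded(page_url: str) -> bool:
    """Return True if the page's host is on the courtesy exclusion list."""
    return urlparse(page_url).hostname in EXCLUDED_HOSTS


def availability_url(page_url: str, now: datetime) -> str:
    """Build the lookup URL asking for the snapshot closest to `now`."""
    query = urlencode({"url": page_url, "timestamp": now.strftime("%Y%m%d%H%M%S")})
    return f"{AVAILABILITY_ENDPOINT}?{query}"


def latest_snapshot_time(response_body: str):
    """Parse the JSON response; return the snapshot time in UTC, or None if there is none."""
    data = json.loads(response_body)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    parsed = datetime.strptime(closest["timestamp"], "%Y%m%d%H%M%S")
    return parsed.replace(tzinfo=timezone.utc)


def needs_archiving(response_body: str, now: datetime,
                    max_age: timedelta = timedelta(days=365)) -> bool:
    """True when there is no snapshot or the closest one is older than `max_age`."""
    snapshot = latest_snapshot_time(response_body)
    return snapshot is None or now - snapshot > max_age


def save_url(page_url: str) -> str:
    """URL that, when requested, asks the Wayback Machine to capture the page."""
    return SAVE_PREFIX + page_url
```

    In use, a client would skip any page where `is_excluded` is true, fetch `availability_url(...)`, pass the response body to `needs_archiving`, and request `save_url(...)` only when that returns true.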

    40 votes
    1. [3]
      granfdad

      As someone very big into the Elder Scrolls, I would very much like to thank you for archiving that interview! A very interesting read, given the series' history post Arena.

      2 votes
      1. [2]
        Atvelonis
        (edited )

        You're welcome! I have a half-complete list of interviews on the wiki. This list is unique because it's supposed to catch development and recollective interviews of any sort, not just the ones that the Imperial Library considers lore-relevant. I think I got most of the pre-Oblivion interviews, but not all. References to older ones are scattered across old fan sites as Bethesda has made no archival efforts themselves, even for official developer diaries. I'm also missing many old print interviews, which I haven't decided how to reference. (Have to locate them first!) From there it gets a bit trickier as some of the specialized news boards I was relying on for links were presumably replaced by new aggregate communities like Reddit. Very recent content is theoretically discoverable by search engine, but it tends to be drowned out by noise. It's a fun project though and I hope to have a relatively comprehensive list in the foreseeable future.

        1 vote
        1. granfdad
          (edited )

          Might I suggest that you also add those to the UESP? It is significantly better than the wikia both from a site perspective (fandom is an absolutely garbage site) and a content perspective (I've seen claims of people adding completely unsourced headcanon into wikia pages, though I haven't seen it myself).

          EDIT: I should add that the UESP isn't exclusively lore-related like TIL is, so there would be room for them; there is already an interviews page. If you find you don't have the time to add to the pages but do want them there, I can update it myself if you give me the OK.

  2. [5]
    cfabbro
    (edited )

    I get the message the author is trying to make about the problem of linkrot and "gaps" in the web, and support the idea (in theory) of having website archives+history available by default. And I also love the Internet Archive and Wayback Machine. I have even set up a recurring monthly donation to them, and have for many years now. But holy hell is IA's site ever slow, often timing out entirely whenever I try to use it. So actually relying on IA to view all my web content would be an absolute nightmare. :P

    20 votes
    1. bloup

      This is something I’ve also experienced. To be honest it wouldn’t be so bad waiting a while if it gave you some feedback while it was retrieving the page and actually was able to do so reliably.

      4 votes
    2. Protected

      This has probably been suggested fifty times before and I'm sure there are reasons why it hasn't happened yet, but I'd like to see them distribute the archive into a sort of highly redundant fediverse style system.

      1 vote
    3. [2]
      Comment removed by site admin
      1. Omnicrola

        Depending on the site and how it's built, some pages will self optimize and serve you different content if they see the requesting machine is a mobile device.

        A lot of modern websites suffer from bloat that isn't really necessary to display the actual relevant content. As a good counterexample, see how fast Tildes loads 😁

        11 votes
  3. Immortal

    The people behind Archive.org do great work when it comes to preserving the internet, but I feel like we shouldn't rely on them alone, or at least not fully. We should be doing the same work ourselves, and to some extent that's possible with the power of Wget. ArchiveBox is really easy to set up, it has a nice web UI, and it has several different outputs (e.g. PDF, HTML, PNG) and even sends your links to the Wayback Machine, but the cool thing about it is that you decentralize and archive websites locally. That way, if anything ever happened to Archive.org, you'd have your own offline internet of whatever you chose, which I feel is an even cooler thing about doing this. The developer behind ArchiveBox talks about all of this in this video.

    I would like to add that I don't think everything should be preserved and archived. There's a lot of crap on the internet. This is why choosing is nice, and in a way, it feels important for whoever is interested in archiving to have their own setup. What you choose to archive and what I choose to archive will probably be quite different, so everyone's ArchiveBox would be special and unique.

    6 votes