The English-language edition of Wikipedia is blacklisting Archive.today after the controversial archive site was used to direct a distributed denial of service (DDoS) attack against a blog.
In the course of discussing whether Archive.today should be deprecated because of the DDoS, Wikipedia editors discovered that the archive site altered snapshots of webpages to insert the name of the blogger who was targeted by the DDoS. The alterations were apparently fueled by a grudge against the blogger over a post that described how the Archive.today maintainer hid their identity behind several aliases.
“There is consensus to immediately deprecate archive.today, and, as soon as practicable, add it to the spam blacklist (or create an edit filter that blocks adding new links), and remove all links to it,” stated an update today on Wikipedia’s Archive.today discussion. “There is a strong consensus that Wikipedia should not direct its readers towards a website that hijacks users’ computers to run a DDoS attack (see WP:ELNO#3). Additionally, evidence has been presented that archive.today’s operators have altered the content of archived pages, rendering it unreliable.”
More than 695,000 links to Archive.today are distributed across 400,000 or so Wikipedia pages. The archive site, which is facing an investigation in which the FBI is trying to uncover the identity of its founder, is commonly used to bypass news paywalls.
[...]
Guidance published as a result of the decision asked editors to help remove and replace links to the following domain names used by the archive site: archive.today, archive.is, archive.ph, archive.fo, archive.li, archive.md, and archive.vn. The guidance says editors can remove Archive.today links when the original source is still online and has identical content; replace the archive link so it points to a different archive site, like the Internet Archive, Ghostarchive, or Megalodon; or “change the original source to something that doesn’t need an archive (e.g., a source that was printed on paper), or for which a link to an archive is only a matter of convenience.”
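As a rough illustration of the cleanup the guidance describes, here is a minimal sketch that flags links pointing at any of the listed domains. The function name `is_archive_today_link` and the constant `ARCHIVE_TODAY_DOMAINS` are hypothetical, not taken from any Wikipedia tool, and the `www.` handling is an assumption about how such links might appear.

```python
from urllib.parse import urlparse

# Domains listed in the guidance quoted above.
ARCHIVE_TODAY_DOMAINS = {
    "archive.today",
    "archive.is",
    "archive.ph",
    "archive.fo",
    "archive.li",
    "archive.md",
    "archive.vn",
}


def is_archive_today_link(url: str) -> bool:
    """Return True if the URL's host is one of the listed archive domains.

    Hypothetical helper: only URLs with an explicit scheme (http/https)
    are recognized, because urlparse cannot find a host without one.
    """
    host = (urlparse(url).hostname or "").lower()
    # Assumption: treat a leading "www." as equivalent to the bare domain.
    if host.startswith("www."):
        host = host[4:]
    return host in ARCHIVE_TODAY_DOMAINS


if __name__ == "__main__":
    for link in ("https://archive.ph/AbCdE", "https://web.archive.org/web/2020/"):
        print(link, is_archive_today_link(link))
```

A check like this only finds candidate links; deciding whether to remove or replace each one still follows the editorial guidance above.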
[...]
Evidence presented in the Wikipedia discussion showed that Archive.today replaced Nora’s name with Patokallio’s name in the aforementioned blog post. The Archive.today capture has since been reverted to what appears to be the original version. In other cases, Archive.today captures included a “Comment as: Jani Patokallio” string on captures that previously had a “Comment as: Nora [last name redacted]” string.
Even if the snapshot alterations hadn’t helped convince Wikipedia’s volunteer editors to deprecate Archive.today, the Wikimedia Foundation itself might have stepped in. In its comments on the DDoS, the nonprofit that operates Wikipedia said on February 10 that it had not ruled out intervening due to “the seriousness of the security concern for people who click the links that appear across many wikis.”
From the article:
[...]
[...]
Woof. I understand the decision but that’s potentially a LOT of valuable archived data that will end up being discarded. Hopefully most of it can be safely re-archived elsewhere but I doubt that will be possible for a significant portion.
Until now I hadn’t heard the accusations of archive.today tampering with saved page content. I want to give it the benefit of the doubt because it’s always seemed trustworthy to me. But without knowing the operators and their agendas who can say for sure (and this DDoS situation does not inspire much confidence). If a site is no longer online there’s no way to guarantee the authenticity of the version archive.today is serving. Honestly the same could be said for any of the alternatives though… we believe that the Wayback Machine is preserving content in good faith, but they could be selectively modifying specific URLs and no one would be the wiser.
I don’t think page archival is a solved problem yet. The strengths of archive.today revealed the shortcomings of Wayback. There are a number of improvements I’d like to see, personally. After reading this I think some kind of cryptographic signature scheme for pages might be helpful too. It could at least prove that archives haven’t been changed over time, but you’d still have to trust the accuracy of the original scrape. Though in the LLM era that might not be enough either. 🤔
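To make the signature idea a bit more concrete, here is a minimal sketch of tagging a snapshot at capture time and checking it later. It uses an HMAC from Python's standard library as a stand-in for a real public-key signature, so only someone holding the key could verify it; a real scheme would presumably use asymmetric signatures and publish the tags somewhere independent. The key, timestamp, and function names are all made up for illustration.

```python
import hashlib
import hmac


def sign_snapshot(content: bytes, captured_at: str, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag binding the content to its capture time."""
    digest = hashlib.sha256(content).hexdigest()
    message = f"{captured_at}:{digest}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_snapshot(content: bytes, captured_at: str, key: bytes, tag: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = sign_snapshot(content, captured_at, key)
    return hmac.compare_digest(expected, tag)


# Hypothetical values, for illustration only.
KEY = b"hypothetical-archive-key"
CAPTURED_AT = "2000-01-01T00:00:00Z"
original = b"<html>original capture</html>"
altered = b"<html>altered capture</html>"

tag = sign_snapshot(original, CAPTURED_AT, KEY)

if __name__ == "__main__":
    print(verify_snapshot(original, CAPTURED_AT, KEY, tag))
    print(verify_snapshot(altered, CAPTURED_AT, KEY, tag))
```

This only shows that the stored bytes match what was tagged at capture time. It says nothing about whether the capture itself was faithful, which is the part you'd still have to trust.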
Hopefully cleverer minds than mine are already thinking about this.
Well said.
I am concerned that we are going to see a massive cascade of websites blocking them now. I have never had any access issues with archive working on any website, whereas the alternatives can be spotty.
I don't think the operator should be given the benefit of the doubt here; they have admitted as much in a blog post.
Tildes users regularly provide "archives" using this site. Should we consider switching elsewhere?
Here's the previous discussion from the first thread about this, for continuity's sake. Wayback Machine and ghostarchive seem to be the main two mentioned there.
I've been using ghostarchive since the first post about this but there was some question about whether it was also affiliated with the archive.today person. Does anyone know more? I've liked ghostarchive so far (unfortunately not as much as archive.today but ah well)
I don't know a good solution. I'm glad that I regularly extract quotes, so at least there's some context. Maybe I'll start archiving pages privately?
Is there any connection between this and Wikipedia being unreachable a few hours ago?