71 votes

archive.today is directing a DDOS attack against my blog

37 comments

  1.
    cfabbro

    Related Ars article:
    Archive.today CAPTCHA page executes DDoS; Wikipedia considers banning site

    And here is the Wikipedia request for comment page:
    https://en.wikipedia.org/wiki/Wikipedia:Requests_for_comment/Archive.is_RFC_5

    Background

    Archive.today, also known as archive.is, is an archiving service similar to sites like the Internet Archive. Archive.today uses advanced scraping methods, and is generally considered more reliable than the Internet Archive. Due to concerns about botnets, linkspamming, and how the site is run, the community decided to blacklist it in 2013. In 2016, the decision was overturned, and archive.today was removed from the spam blacklist. Over 400,000 pages currently contain over 695,000 links to Archive.today.

    In January 2026, the maintainers of Archive.today injected malicious code in order to perform a distributed denial of service attack against a person they were in dispute with. Every time a user encounters the CAPTCHA page, their internet connection is used to attack a certain individual's blog. This obviously raises significant concerns for readers' safety, as well as the long-term stability and integrity of the service. The JavaScript code which causes this is still live on the website. However, a significant number of people also think that mass-removing links to Archive.today may harm verifiability, and that the service is harder to censor than certain other archiving sites. As of 12:31, 9 February 2026 (UTC), the malicious code remains active. Please do not visit the archive without blocking network requests to gyrovague.com to avoid being part of the attack!

    Edit: Here is a response from WMF's Eric Mill:

    I’m Eric Mill, I lead the Product Safety and Integrity team here at WMF. Given the scale and severity of this issue, I wanted to ring in here with a note from WMF to explain our approach, as the English Wikipedia community considers what to do with archive.today and its mirrors.

    To cover the facts first, the RFC summary and discussion do a good job of describing the problem: archive.today, a very useful and highly relied-upon archiving service that has helped Wikipedia content be more verifiable and understandable to readers, is using visitor browsers and network bandwidth to carry out a DDoS attack as part of a dispute with another website owner.

    Despite the publicity their actions have stirred up, Archive.today’s owner has not been deterred from continuing the ongoing DDoS. Their official blog (a redirect from blog.archive.today) has only dug in further, acknowledging the reporting but neither denying nor apologizing for it. As discussed on this RFC, the site’s owner has previously displayed questionable behavior and violated Wikipedia policies; their use of sockpuppets led to archive.today being blacklisted on English Wikipedia for a time, from 2013 to 2016.

    The RFC summary notes the impact of ~400K articles linking to archive.today, though that doesn’t include the mirrors. For example, archive.is is linked in another 86K articles, archive.ph in another 10K. And this is global and bigger than just English Wikipedia. For example, eswiki, dewiki, jawiki, ptwiki, frwiki are each in the 5 digits of article counts for archive.today (not even counting the other mirrors).

    Our view is that the value to verifiability that the site provides must be weighed against the security risks and violation of the trust of the people who click these links. We (WMF) encourage the English Wikipedia community to carefully weigh the situation before making a decision on this unusual case. For readers to remain relaxed and trusting while using Wikipedia, they should be able to reasonably expect that links on Wikipedia to potentially dangerous websites are rare, and that those that do exist are dealt with quickly once spotted.

    Further, the same actions that make archive.today unsafe may also reduce its usefulness for verifying content on Wikipedia. If the owners are willing to abuse their position to further their goals through malicious code, then it also raises questions about the integrity of the archive it hosts.

    To be clear, our view here isn’t based on who the site owner is, where they’re located, or that they operate pseudonymously. Wikipedia links to both big public institutions and private individuals all the time, routinely extending them that trust as a good-faith participant on the web. For the web to work in that way, it also means reconsidering whether it’s necessary to withdraw trust when it is violated. In our judgment, using unsuspecting site visitors to carry out a DDoS is a violation of that trust.

    We expect that when WMF comments on an RFC like this one expressing real concerns, community members will wonder whether we are saying we’re going to take our own actions. The candid answer to that is that we don’t know yet, and have not made that kind of a decision: given the scale of the issue across multiple wikis, we will learn from the result of this RFC and outreach to other communities that might be impacted. We know that WMF intervention is a big deal, but we also have not ruled it out, given the seriousness of the security concern for people who click the links that appear across many wikis.

    Right now, we just want to get our view – that the utility of these links for verifiability must be weighed against the violation of the trust of people who click these links – out here for the record, and encourage the community to see this issue as seriously as we do.

    https://en.wikipedia.org/wiki/Wikipedia:Requests_for_comment/Archive.is_RFC_5#WMF_note

    35 votes
    1.
      cfabbro

      Holy crap, the archive.today/.is site operator seems genuinely unhinged. Their blog:
      https://archive-is.tumblr.com/

      35 votes
      1. Grumble4681

        This guy definitely seems to have some level of delusion or something going on, and it seems to have been inflamed significantly by justified paranoia over the fact that they slipped up badly when they first started the site and left a trail of breadcrumbs to their identity, which the person who operates gyrovague.com neatly picked up through internet sleuthing and put together into a post. In the archive.today tumblr blog, there is even a post where they say they AI-grok'ed the gyrovague person, so it also sounds like AI helped a paranoid and delusional person come up with a crazy backstory.

        I actually feel for the archive.today owner in the sense that it almost feels like the gyrovague operator doxxed them, but at the same time, that could also be considered good journalism. I don't know where the distinction is. If gyrovague wrote an article about Grumble4681 on Tildes and then linked it to my identity, I'd be upset about that and I think it would be a moral and ethical violation, but I also don't operate a site on the scale that archive.today works at.

        Furthermore, it's not like this information was apparently that hard to find; gyrovague laid out how they came across all of it, so anyone else could also have found out archive.today's identity, especially law enforcement. So to put myself in those shoes again: if I had made a super simple slip-up when I first joined that could link people to my real identity, and then gyrovague wrote about it, anyone could have found it without much effort, and it would get no attention because I'm not noteworthy enough for anyone to care. So is there still something wrong with them doing that? I don't know.

        Of course, most of that is irrelevant in the face of the actions archive.today is taking by turning visitors to their site into a DDoS attack. That is an egregious abuse of whatever influence and trust they had, it's not at all effective at accomplishing what they hope, and it just isn't justified. But because this is so clear-cut to me, the more interesting part is examining why the archive.today owner is targeting them. They seem paranoid and panicked about their identity being exposed, and the blog post in question has a very neatly framed story that is easy for other media outlets to cite. But short of asking the person to respectfully take down the blog post and then hoping it goes away, nothing they could do was ever going to unring that bell, certainly not a DDoS attack. It's the old adage that what goes on the internet stays on the internet. Even though over the years we've found that not to always be true, it's still a good way to frame how you should view and interact with the internet when deciding what to share on it. And one would think the owner of an archival site that doesn't remove things itself would be more familiar with this concept than most.

        28 votes
      2.
        chocobean

        If a vast section of the internet relies on the free goodwill of one individual, I guess we have to expect some kind of weirdness about it. It's like borrowing the truck from that one singular neighbour so you can get to work.

        I would very much like a public (yes government) option that is free and accountable to a wider body of constituents. But I guess that would run counter to business interests within that region.

        20 votes
        1.
          stu2b50

          …I think I’d rather have the archive.is guy running it than Donald Trump.

          18 votes
          1.
            chocobean

            Ughhhh gross, I was assuming a sane government.... But yeah .... Public non government, so open source non profit?

            10 votes
            1.
              vord

              This is a mildly relevant tangent.

              My local city's cameras are all owned and operated by a 501(c) nonprofit NGO, creating a strict firewall between law enforcement and surveillance data. The footage is deleted within a month unless a judge-signed subpoena has been submitted.

              This is the kind of approach that needs to be taken for most things: A publicly accountable organization whose only incentive is to serve the public with their stated goal to their best ability.

              25 votes
              1. chocobean

                A publicly accountable organization whose only incentive is to serve the public with their stated goal to their best ability.

                and that's what a government should be: for the people, by the people, right? But yeah...

                17 votes
          2. vord

            At this point I'll take the archive.is guy, or perhaps a rabid squirrel managing anything, over Trump and his handlers/enablers.

            3 votes
  2.
    Protected

    As someone who has had to deal with denial of service attacks before:

    This is not a shades of grey kind of issue. Investigative journalism is not a crime. Using other people's computational resources to cause financial damage to a third party is, in the West, a crime: a real crime with real consequences, not whatever "copyright infringement" is. In other words, it is not a crime in any way deserving of civil disobedience. Your electricity and bandwidth (not to mention your trust) are being abused. This costs you money and can harm the reputation of your IP address and your ISP. The financial impact on the target is proportionally much higher, and such attacks can put people out of business entirely. People who are careless or uncaring about their devices being used as part of a botnet (because that's what this is) are complicit in causing millions of individuals and small business owners a lot of stress and grief. That's where all the spam you get comes from, for instance.

    So the person running archive.today is a criminal, and if they lived in a western country they would be condemned to spend time in prison for this. This has happened to other such criminals in the past. It's imperative that we all work toward reducing reliance on this service to zero as quickly as possible.

    EDIT: Typo

    26 votes
    1.
      gil

      Investigative journalism is not a crime.

      Just a small note about this point. In some countries, publishing people's personal information, even if it's publicly available, can also be considered a crime: https://en.wikipedia.org/wiki/Doxing#Legislation

      23 votes
      1.
        Protected

        That's interesting, thank you for letting me know.

        I don't have a lot more time right now, but glancing through the Wikipedia article, I see that in most cases either the information must have been private, or there must be some kind of demonstrable intent to harm or harass, right?

        Portugal is considered to have very strong privacy laws (AFAIK) but I don't think any penalties would have been enforced against the author here. Instead, the law would have protected the owner of archive.today by making any personal information inadmissible as evidence in court. (Disclaimer: Not A Lawyer!)

        8 votes
        1. sparksbet

          Doxxing laws vary a ton by jurisdiction, so it'll be hard to make sweeping statements about them in general.

          5 votes
    2. Trobador

      This is not a shades of grey kind of issue. Investigative journalism is not a crime.

      The criminality of it doesn't make it black or white from a standpoint of morality.

      Although regardless, there's little moral justification for attempting to DDoS a personal blog.

      11 votes
    3.
      sparksbet

      I don't condone the DDOS attack whatsoever, but I think it's uncomfortable how much you equate criminality with morality in this comment. It is perfectly possible for a person to do something that is a crime but is also not remotely immoral. It is also possible for a person to be acting completely within the law and still be doing vile, immoral shit. While the criminality is relevant where legal consequences are concerned, this should not be the standard we use for deciding what is moral or not, because it often does not coincide with criminality, whether it does in this case or not.

      And fwiw this isn't apologism-- I don't think the DDOS attack was an appropriate response even if full-on doxxing happened. My opinion of the blog post author would change depending on the extent of the info shared in the blog post (I haven't read it because I don't actually care all that much about what was in it), and I don't super like the language he's used to justify it because it could definitely be used to defend more targeted doxxing, but I don't think the DDOS attack was a reasonable response even in the worst case scenario.

      11 votes
      1.
        Protected

        Half of my comment was spent explaining exactly why the behavior causes real harm, contrasting it with the typical - morally defensible - copyright infringement which is enabled by archive.today.

        6 votes
        1. sparksbet

          I don't think your acknowledgement undoes the overarching emphasis on the criminality of the behavior as being the problem, rather than the problem being the actual morality (or lack thereof) of DDOS-ing someone under these circumstances. You do contrast it with something like copyright, but only insofar as you say "this is a REAL crime, unlike stuff like copyright". And yeah, DDOS-ing is a more serious crime than copyright infringement for sure, but what I'm saying is that it being a crime is irrelevant. It would be equally wrong to DDOS someone under these circumstances even if it were totally legal to do so. There are probably other (very different) contexts in which it wouldn't be immoral to DDOS someone, but it would still be equally illegal. Whether it's a crime does matter when it comes to how dumb it is to do something and the consequences one faces from society, of course, but it's something we should deliberately set aside as irrelevant when discussing morality.

          Equating an action being criminal with it being wrong is an incredibly common and very harmful cognitive shortcut that I try to point out whenever I notice it in thoughtful contexts like Tildes discussions. This is because I think it's an insidious assumption that many people do actually hold, and one that's very easy to unintentionally reinforce even if you don't consciously hold it. As a result, I think even wording that unintentionally plays into this line of thinking merits challenge here.

          2 votes
  3.
    gil

    I'm probably gonna be alone on this, but what did the author expect after writing a whole investigation trying to dox their identity? I'm of course annoyed that my bandwidth could be used for this purpose. But archive.today is a great service that we get basically for free, and it lives on the verge of legality. Its admin would have their life destroyed if their info leaks, so what's the point of the investigation?

    15 votes
    1.
      CannibalisticApple

      Having read the original blog post that started this whole mess, it doesn't really read like an attempt to dox the site's owner. It reads to me as a writeup of someone getting curious about the details of the background and workings of a widely-used online service, and writing down their exploration into the surrounding rabbithole. I've read countless similar write-ups over the years, people share them because it's just interesting to them and figure others will also find it interesting.

      In a vacuum, it actually reads pretty positively to me compared to a lot of such write-ups. It literally ends with "It’s a testament to their persistence that they’re managed to keep this up for over 10 years, and I for one will be buying Denis/Masha/whoever a well deserved cup of coffee." Without the DDOS attack, I would've found the whole thing pretty neat and gotten some more respect for the creator.

      Meanwhile, the webmaster's responses are seriously just... Crazy. Started with a GDPR complaint, followed by a polite enough email request to remove the article temporarily, and then moved on to DDOS attacks by the next day after not getting a response. And... Was possibly the person to initially bring the DDOS attacks to public attention on Hacker News?? Because the name "rabinovich" is one of the names linked to archive.today, so... It's really weird.

      And then they moved on to attack the character of his grandfather by accusing him of being an ex-Nazi who changed his name in 1944 to hide his Nazi history? And also claim Jani's family must have insisted he not use the family name for the blog's URL because they were ashamed of him and didn't want the blog associated with them?? Seriously, check out the webmaster's Tumblr. It's just spitting a bunch of ridiculous accusations and blatant character attacks.

      14 votes
      1. Grumble4681

        Having read the original blog post that started this whole mess, it doesn't really read like an attempt to dox the site's owner. It reads to me as a writeup of someone getting curious about the details of the background and workings of a widely-used online service, and writing down their exploration into the surrounding rabbithole. I've read countless similar write-ups over the years, people share them because it's just interesting to them and figure others will also find it interesting.

        In a vacuum, it actually reads pretty positively to me compared to a lot of such write-ups. It literally ends with "It’s a testament to their persistence that they’re managed to keep this up for over 10 years, and I for one will be buying Denis/Masha/whoever a well deserved cup of coffee." Without the DDOS attack, I would've found the whole thing pretty neat and gotten some more respect for the creator.

        I get that is how the author of the 2023 blog post framed it, as a curiosity and one of some kind of admiration, but it actually comes across as disingenuous, or else they're just incredibly ignorant and unaware, which I just don't believe. There's no way you can think that unmasking someone who is doing something obviously illegal ('archiving' paywalled sites and distributing that copyrighted information without authorization) is somehow not harming that person. I can't fathom the level of ignorance it would require for someone to knowingly dive into that thinking that it's just an innocent curiosity on their part that has no repercussions to the person they're attempting to unmask. It's not believable. They even acknowledged in the intro of that 2023 blog post that archive.today is helping people bypass those paywalls, and they acknowledged that the owner of archive.today seems 'mysterious', in effect meaning the owner had made no efforts to reveal their identity to anyone. So to then pretend like it's an innocent curiosity that leads them to reveal the details of who is behind it is disingenuous.

        It would have made more sense to me if they just said they were getting into investigative journalism or such, rather than feign admiration and curiosity for someone while in the same breath taking actions to harm that person. I do believe there is an argument that it's ethically or morally protected to publish the information they did publish if they were doing it in the role of an investigative journalist, rather than boiling their actions down to 'doxxing', but it can be hard to distinguish where doxxing is wrong and where journalism begins. Imagine if they had uncovered that archive.today was actually owned by Elon Musk or that it's run by the Chinese government or something; I think that would be valuable public information and a public good. If you're an ardent supporter of existing copyright legislation and protections, you might also perceive it as a public good to unmask whoever is running the site in violation of copyright laws. I don't necessarily have a strong opinion on the actual information they published, but by framing it the way they did, they also framed their publishing of that information more as doxxing than as journalism. You can't be on the side of someone that you're obviously harming.

        11 votes
    2.
      Grumble4681

      Yeah, I don't get what their motive was or what they were hoping to accomplish by that either. I personally don't think it was right for them to essentially dox the identity of the archive.today owner, but they also laid out how they came across the information and it apparently wasn't that difficult. Just about anyone could have found it if they were determined enough. In fact, the part about an F-Secure forum post where the person is talking about the site they own is a fairly common slip-up that can catch a lot of people. I'm pretty sure I remember that being behind the Silk Road guy and a stack exchange post or something like that. When you first start something you don't know it's going to be what it becomes, and you don't necessarily think about the privacy/security component of it at that time until after it's too late. So law enforcement definitely could have found that if they had tried.

      Of course that doesn't justify DDoSing them, and what's more, it's fairly obvious from an outside view that DDoSing would never accomplish what they want. Even if they could permanently take down gyrovague through DDoS, that information was already republished elsewhere. There's no going back.

      I feel for someone who was effectively doing a public service all on their own, and this gyrovague blogger made it easier for others to magnify the exposure of this person's identity, but they messed up when starting their site and left an easy trail of crumbs to their identity, and that's just the unfortunate reality of it. In the end, that's going to do them in no matter what other blog posts are out there or not.

      12 votes
      1. mild_takes

        I'm pretty sure I remember that being behind the Silk Road guy and a stack exchange post or something like that.

        Snippet from the Ross Ulbricht Wikipedia article under "arrest and trial".

        The connection was made by linking the username "altoid", used during Silk Road's early days to announce the website, and a forum post in which Ulbricht, posting under the nickname "altoid", asked for programming help and gave his email address, which contained his full name

        I find the whole story of his arrest pretty interesting. Also the fact that he got pardoned.

        8 votes
      2.
        riQQ

        This sent me down a rabbit hole. Here's the Stack Overflow question:
        https://stackoverflow.com/questions/15445285/how-can-i-connect-to-a-tor-hidden-service-using-curl-in-php

        Found via https://www.reddit.com/r/webdev/comments/1nln17/the_stackoverflow_question_that_busted_the_silk/. The Reddit post's title is exaggerated, the Stack Overflow post was just a piece of the puzzle.

        2 votes
        1. Grumble4681

          Yeah, it's not as though he gave away all the information in one go, as may have been more the case for archive.today, but reusing a username is another common mistake that gives people away. That's why, when I joined Tildes, I used a random username generator. In the past I didn't always do that and would use the same username across different services. I didn't think about or realize that even if I curated the information I provided on each service so it wouldn't identify me, once I was linked across services by the same username and all that information was combined, I would be way more identifiable.

          3 votes
    3. thecakeisalime

      The author outlines their motive in the linked post:

      My motives for publishing this have been questioned, sometimes in fanciful ways. The actual rationale is boringly straightforward: I found it curious that we know so little about this widely-used service, so I dug into it, in the same way that previous posts dug into a sketchy crypto coin offering, monetization dark patterns in a popular pay to win game, and the end of subway construction in Japan. That’s it, and it’s also the only post on my blog that references archive.today.

      Plus, the post didn't actually dox the owner, because he couldn't find anything.

      7 votes
    4.
      stu2b50

      Sounds like they want attention on the issue, so that public pressure may cause the archive person to stop, as well as further investigation by other people to make them stop.

      All in all, seems fair. If they didn’t want the heat, they shouldn’t have DDoS’d this guy.

      Its admin would have their life destroyed if their info leaks, so what's the point of the investigation?

      To find out why they’re being DDoSed and to get them to stop. Whether or not their life is destroyed at the end is not their problem.

      4 votes
      1. Grumble4681

        Sounds like they want attention on the issue, so that public pressure may cause the archive person to stop, as well as further investigation by other people to make them stop.

        The parent comment is not referring to the blog post linked in this Tildes post; it's referring to an old blog post from 2023 where the operator of gyrovague.com did some internet sleuthing about the identity of archive.today's operator (well before any DDoSing was taking place) and laid out who the person behind the site was.

        Some time after, this blog post with archive.today's identity in it started getting attention, and then the FBI subpoenaed archive.today in late 2025. This seems to be when the owner of archive.today started feeling the heat, noticing news articles (or something along those lines) about their identity that all cited the 2023 gyrovague.com blog post.

        So the DDoS was in reaction to the heat they were already feeling and they attributed it to gyrovague for putting the heat on them.

        Now the news of the DDoS reaction, as detailed in this blog post, is obviously generating even more heat on archive.today.

        12 votes
      2. gil

        To find out why they’re being DDoSed and to get them to stop.

        Sorry, I meant the original 2023 "doxing investigation". He argues it was just out of curiosity, which could be true. But now that the FBI is after the site owner, I'm sure they are pretty annoyed that this info is more easily available and requires no OSINT, just Google.

        6 votes
  4. kacey

    It looks like there's some drama around the archive.today/is/ph/etc. author, and some blogger who was looking into them a while back. Perhaps folks here might consider switching to ye olde Wayback Machine?

    9 votes
    1. cfabbro

      As much as I love Internet Archive (I've been a monthly donor for over a decade at this point), it's unfortunately not a very good alternative to archive.today. It's way too f'n slow, and it also isn't designed to get past paywalls, which archive.today excels at.

      Thankfully, this blog post mentioned another potential alternative, ghostarchive.org, which looks promising (although it might also have some connection to archive.today's founder?). So I will try using that site from now on, since based on this blog post I don't want to continue supporting archive.today if the allegations against them are true.

      27 votes
      1. balooga

        Is archive.today’s scraping code proprietary? I’ve always thought that’s the sort of thing that should be up on GitHub with thousands of instances running, so one guy doesn’t hold the keys to the whole thing. I’d never seen the operator’s blog or dug into who he is but it doesn’t surprise me that he’s sketchy… freedom of information advocates often are. This site has existed on the periphery of the respectable web since it came online. I agree that it’s still performing a valuable service, and is implemented better than Wayback (for certain use cases).

        What I’d like to see, personally, is a torrent-style decentralized FOSS ecosystem where anybody can easily scrape a URL and share hosting duties of that archive with the distributed network. Of course that would be slow and not scale very well. And probably wouldn’t be very reliable for long-term storage. I’m not sure if there are practical technical solutions to those concerns, but there’s gotta be a better alternative to a single monolithic site, with undisclosed funding, somehow offering giant data center resources to the whole world for free forever. That’s not exactly sustainable either.

        7 votes
        1. xk3

          giant data center resources

          Individual HTML files aren't that big, especially if you can compress and deduplicate at the filesystem level. If you skip large files, you can archive millions of pages per TB.

          So while any amount of this type of charity work is generous, I don't think you should feel it is impossible to get started. Start small with wget2 and archive the sites that are important to you!
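          A rough sketch of that arithmetic, plus a toy version of compress-and-deduplicate, in Python. The average page sizes used (100 KB and 500 KB) are assumptions for illustration, not measurements, and DedupStore is a hypothetical class that deduplicates byte-identical pages by SHA-256 hash in application code, standing in for the filesystem-level compression and dedup mentioned in the comment. It only covers storage; fetching would still be done by a tool like wget2.

```python
import gzip
import hashlib


def pages_per_tb(avg_page_kb: float) -> int:
    """Approximate number of pages that fit in 1 TB (10**12 bytes),
    given an assumed average stored size per page in KB (1 KB = 1000 bytes)."""
    return int(10**12 // (avg_page_kb * 1000))


class DedupStore:
    """Hypothetical toy content-addressed store: each unique page body is
    gzip-compressed and kept once, keyed by its SHA-256 hex digest."""

    def __init__(self) -> None:
        self.blobs = {}  # digest -> gzip-compressed bytes
        self.urls = {}   # url -> digest of the page saved for that url

    def add(self, url: str, html: bytes) -> str:
        digest = hashlib.sha256(html).hexdigest()
        if digest not in self.blobs:  # identical bytes are only stored once
            self.blobs[digest] = gzip.compress(html)
        self.urls[url] = digest
        return digest

    def get(self, url: str) -> bytes:
        # Look up the digest for this url, then decompress the shared blob.
        return gzip.decompress(self.blobs[self.urls[url]])

    def stored_bytes(self) -> int:
        """Total compressed bytes held by the store."""
        return sum(len(blob) for blob in self.blobs.values())


if __name__ == "__main__":
    print(pages_per_tb(100))  # 10000000 pages at an assumed 100 KB each
    print(pages_per_tb(500))  # 2000000 pages at an assumed 500 KB each

    store = DedupStore()
    page = b"<html>" + b"<p>hello</p>" * 1000 + b"</html>"
    store.add("https://example.com/a", page)
    store.add("https://example.com/b", page)  # same bytes, not stored again
    print(len(store.blobs), store.stored_bytes() < len(page))  # 1 True
```

          Hash-based dedup like this only collapses pages whose bytes are identical; two snapshots that differ by even a single timestamp are stored separately.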

          5 votes
        2. gil

          I tried to find it some years ago to host my own and couldn't. They probably also need to keep it private to prevent companies from finding ways to block it. IIRC, their Tumblr mentions that they keep rotating paid newspaper subscriptions to be able to scrape the full content, unlike the Internet Archive.

          4 votes
        3. kacey

          Is archive.today’s scraping code proprietary?

          I couldn't find the source code when I did a quick search the other day? Some Hacker News folks had done a bit of looking a few years ago, too, but hadn't turned anything up.

          2 votes
        4. vord

          a torrent-style decentralized FOSS ecosystem

          IPFS + Wayback

          I can't speak to its effectiveness, especially since IPFS has some serious flaws... but what doesn't?

          1 vote
      2. Grumble4681

        I'm curious whether you've tried ghostarchive.org yet and had any success with it, because I just tried using it with an LA Times article: the main page loads, but searching or trying to archive a page just returns 404 Not Found. It could just be an influx of traffic they weren't prepared for, but I guess it won't take long to find out whether it's ready to replace archive.today.

        5 votes
        1. cfabbro

          I submitted a few paywalled articles to ghostarchive yesterday just to test it, and it worked fine then. It failed to display a few images on a NYT article, but all the article text itself was mirrored properly. However, I just got a 404 when trying to submit something again now too... so yeah, you're probably right that it's being hammered by too many new requests which it can't handle. Hopefully they can change something to fix that, but only time will tell, I suppose. :/

          10 votes