41 votes

FYI: This site claims to have harvested 4B+ Discord chats, today all yours for a price

12 comments

  1. [8]
    ducc

    Internet-scraping outfit Spy.pet claims to have harvested more than four billion public messages made by nearly 620 million users on more than 14,000 Discord chat servers – and is selling access to this trove.

    The service (for a lack of a better word) has been active since November 2023, vacuuming up user and server activity. Yes, all the info is already public in a way – Discord is kinda like IRC on steroids – and it's a reminder that it's not impossible to gather up all this chatter using bots for various purposes (if not surveillance then training AI models.)

    The website presents the data it's collected in several ways. Each known user has a profile, which contains all known aliases, pronouns, connected accounts to other platforms such as Steam and GitHub, Discord servers joined, and public messages. If you wanted to quite literally spy on a Discord user or users, Spy.pet lets you do that, for a fee.

    Obviously, this sucks if you value your privacy. Looking around their website, their motives seem suspect too: on their page for submitting a server for scraping (you have to be logged in to see it), they say they want "interesting servers," including "LGBTBBQ" forums. This seems like it will definitely be used as a tool to doxx people. The not-so-subtle "LGBTBBQ" dog whistle is pretty concerning.

    If you're in any large public servers and value your privacy, it might be a good idea to be cautious - your information is most definitely in this database.

    44 votes
    1. [7]
      DeaconBlue

      > If you're in any large public servers and value your privacy, it might be a good idea to be cautious

      Not just large public ones, even most private ones have various bots running for one reason or another that can harvest data. Hell, a very valid "attack" vector could just as easily be non-maliciously taking over an account and just leaving it idle and listening.

      Edit: as a more general rule, your data is only as private as the weakest listener to a conversation. In a phone call, this means the most gossipy person. In verbal chat, it is the most gossipy person in earshot. Online, it is the most gossipy person/bot on the server or the server itself.

      27 votes
      1. [3]
        stu2b50

        I remember people being mad when Discord started to lock down bot perms, but this is exactly why. I know that for large servers user bots will never be eliminated, but unfettered access by random bots is how your 6-person group chat Discord server gets all its messages scraped, because one member decided to add a random meme bot that can read all messages.

        18 votes
        1. RheingoldRiver

          > I remember people being mad when Discord started to lock down bot perms but this is exactly why

          OK, as a bot developer, the reason I was mad wasn't that they locked down bot perms. It was that to apply for the intents (which our bot needed), you had to wait in a queue that took them close to a YEAR to get through. It was insane. We were locked at 100 servers because they hadn't approved our "please verify us" application, so we ended up making a 2nd identical copy of our bot. This was a nightmare in terms of code management, since we needed to deploy every single change twice. That 2nd bot was private & gated by a Patreon account because (a) we would've just hit 200 servers and had the same problem otherwise; and (b) we had to spin up another server, so we made people pay to access it, all because Discord wasn't progressing through their application process. And because it was private, we had to manually invite it to each server it got added to.

          The decision was fine; the execution was a nightmare.

          16 votes
        2. sparksbet

          Honestly, though, someone scraping servers can just as easily use a user bot. It's against Discord's ToS iirc, but there's little enforcement and I doubt a group like this cares. It also doesn't require any mod to add the bot; they can just join the server through a normal invite link.

          1 vote
      2. [3]
        ducc

        Right. I've wondered why we haven't really seen more cases of hacked API tokens for bots resulting in nuked / compromised servers - most bots request full admin permissions by default and you have to manually edit the invite link to change it.

        In this case specifically though, it's suspected they're using self bots (i.e. regular accounts piloted by a bot) which are joining servers based off of scraped / submitted invite links.

        9 votes
        1. RheingoldRiver

          > most bots request full admin permissions by default and you have to manually edit the invite link to change it.

          This is also Discord's goddamn fault for having shitty bot permission management. You have 2 choices:

          Choice 1: Get only the rights you need. Even if you guess correctly and never need any existing right beyond the ones you asked for (or if you ask for everything), Discord is CONSTANTLY creating new permissions (send in threads and send in forums, to name a couple), which always default to OFF for bots. So if you choose this option you will be inundated by user complaints ("omg help why doesnt the bot work anymore"), or you don't offer user support and then no one wants to use your bot. The SANE way to do this would be a built-in system for bots to request additional rights from every server they're in, but that's too hard; Discord can't build something useful, so instead they make polls that no one wants (and don't make a user permission for that, thanks Discord).

          Choice 2: Request admin. That's it, you're done!! Your bot will now have every permission it needs forever, because admin automatically grants every single permission.

          My bot chooses option 1 and it is a goddamn pain in the ass, and we aren't even that big, only like 550 servers, and we're not for profit (just a Patreon to cover server costs with a couple of small donor-only features, mostly cosmetic). I've been sorely tempted to change it to requesting admin, and I am 0% surprised that this is the route most bots go.
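
          A rough sketch of the two choices in terms of the permission bitfield carried by a bot invite link. The flag values are the ones I understand Discord's developer docs to list, and the URL shape is the usual OAuth2 bot invite; treat both as assumptions to check against the current docs. The helper names (`invite_url`, `can`) and the client ID are made up for illustration, and channel-level overwrites are ignored.

```python
from urllib.parse import urlencode

# Permission bit flags (assumed to match Discord's published values).
ADMINISTRATOR = 1 << 3
VIEW_CHANNEL = 1 << 10
SEND_MESSAGES = 1 << 11
READ_MESSAGE_HISTORY = 1 << 16
SEND_MESSAGES_IN_THREADS = 1 << 38  # one of the permissions added later


def invite_url(client_id: str, permissions: int) -> str:
    """Build a bot invite link (hypothetical helper; URL shape is an assumption)."""
    query = urlencode(
        {"client_id": client_id, "scope": "bot", "permissions": permissions}
    )
    return f"https://discord.com/oauth2/authorize?{query}"


def can(granted: int, needed: int) -> bool:
    """Simplified check: admin implies everything, else every needed bit must be set."""
    return bool(granted & ADMINISTRATOR) or (granted & needed) == needed


# Choice 1: ask only for what the bot needs today.
minimal = VIEW_CHANNEL | SEND_MESSAGES | READ_MESSAGE_HISTORY

# Choice 2: ask for admin and be done.
admin = ADMINISTRATOR

print(invite_url("123", minimal))
print(can(minimal, SEND_MESSAGES_IN_THREADS))  # bit was never requested
print(can(admin, SEND_MESSAGES_IN_THREADS))    # admin covers it
```

          The minimal grant has no bit for a permission introduced after the invite was made, which is the "why doesnt the bot work anymore" case; the admin grant never hits it, at the cost of handing over everything.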

          14 votes
        2. puhtahtoe

          > I've wondered why we haven't really seen more cases of hacked API tokens for bots resulting in nuked / compromised servers

          I follow the r/discordapp subreddit and this is actually a semi common thing. Someone posts about their server being wiped or spammed and the answer seems to always be a bot they added for something.

          2 votes
  2. [3]
    creesch

    Maybe a bit of a hot take: they are very likely not the first ones to do so, just the most public ones. Moving on to an actual hot take: for a lot of the forum-style Discords, I just wish the indexing wasn't done by such a shady company. When we are just talking chat, I can sort of reason it out that there is no need to have that show up in Google results or always be public. IRC wasn't, and the same has generally been true for most chat-type media.

    But now that Discord has moved into forum-style channels, I do believe keeping that content unindexed is harmful for an open internet where you can find information more easily.

    16 votes
    1. [2]
      stu2b50

      What about Discord is shady? As for its replacement of traditional forums and wikis, it is unfortunate, but it’s hardly Discord’s fault. It’s the users that are choosing to eschew those older mediums in favor of jury-rigging a poor equivalent in Discord. It’s a sign that forums and wikis need to evolve UI-wise, if the newer generation would rather finagle 20 Discord pins than use your website.

      7 votes
      1. creesch

        I wasn't referring to discord as shady. I was saying that I don't think discord being indexed is necessarily a bad thing, if it was done by a different company than the shady one mentioned in the article.

        The rest of what you said I think is a bit more complex than just saying it comes down to UI and such. A venture capital backed service can offer a lot for "free" to undercut what is already there.

        edit: I also want to make it clear that I believe this in the context of discord servers that are effectively publicly accessible anyway. Not ones that are actually intended to be private.

        15 votes