42 votes

Meta allegedly pirated terabytes of porn to trick the BitTorrent protocol into letting them pirate books faster

21 comments

  1. [18]
    balooga

    I'm so confused about what torrenting massive amounts of porn has to do with anything else being alleged...

    "The only reason to incur the server and bandwidth expense of remaining in a swarm for these long durations is to leverage the extended distribution as tit-for-tat currency in order to efficiently download millions of other files from BitTorrent," Strike 3 Holdings alleged.

    I think that quote is probably the bit that explains the scheme, but it's not super clear to me what it means. Feels like we have to read between the lines to figure it out. Is it because Meta was participating in private trackers, which use seed/leech ratios as a reputational signal? So they seeded all this porn to get the numbers up, then used the reputation boost for some sort of advantage in accessing other content from the network? Am I barking up the right tree here?

    What I really don't get is how any of these connections were traced back to Meta IPs. If they're really trying to be clandestine (and if they're spending as much money on AI catch-up as recent headlines claim) then why didn't they spring for a few Mullvad accounts to pipe this traffic through, or something? Maybe at scale that's too much of a bandwidth bottleneck? Still seems like a technical hurdle they could've easily anticipated and cleared.

    18 votes
    1. [6]
      diskroll

      Private trackers usually require you to maintain a certain seed/leech ratio; otherwise, you get your access revoked. Often, they will allow you to "freeleech" certain torrents, meaning you can download them without it counting against your ratio, and you can then seed them to improve your ratio. My guess is that those porn files were probably torrents tagged freeleech, and that's why they downloaded and seeded them, which doesn't directly allow you to download anything else faster, but rather gives you continued access to private trackers, which are generally faster.
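To make the freeleech idea above concrete, here is a minimal sketch of how a tracker might keep score. The `Account` class and its fields are hypothetical, and real private trackers each have their own rules; the only assumption carried over from the comment is that freeleech downloads are not counted against the user while uploads always are.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class Account:
    """Hypothetical per-user counters a private tracker might keep."""
    uploaded: int = 0    # bytes credited as uploaded
    downloaded: int = 0  # bytes counted against the user

    def record_download(self, size: int, freeleech: bool = False) -> None:
        # Assumption: freeleech downloads are simply not counted.
        if not freeleech:
            self.downloaded += size

    def record_upload(self, size: int) -> None:
        self.uploaded += size

    def ratio(self) -> float:
        # An account that has downloaded nothing counted has an unbounded ratio.
        if self.downloaded == 0:
            return float("inf")
        return self.uploaded / self.downloaded


acct = Account()
acct.record_download(4 * GIB, freeleech=True)  # freeleech video: costs nothing
acct.record_upload(12 * GIB)                   # seeded that video three times over
acct.record_download(6 * GIB)                  # ordinary, counted download
print(acct.ratio())  # 2.0
```

Under these assumed rules the freeleech torrent only ever adds to the upload side, which is why it is attractive for anyone trying to stay above a minimum ratio.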

      29 votes
      1. [5]
        DefinitelyNotAFae

        They're also much larger files than books meaning they would get a lot of... Well I was going to say "bang" for their ratio but it feels crass /j

        But seriously, that's tons of books for one video, I don't even have the math for it handy.
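For a rough version of that math, here is a back-of-the-envelope sketch. The file sizes are assumptions picked for illustration (about 1 MiB per ebook, about 2 GiB per video), not figures from the article or the filing, and `min_ratio` stands in for whatever ratio a given tracker would require.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

ASSUMED_BOOK = 1 * MIB    # assumption: a typical ebook file
ASSUMED_VIDEO = 2 * GIB   # assumption: a typical HD video file


def books_per_video(video_bytes: int, book_bytes: int) -> int:
    """How many books fit in the size of one video."""
    return video_bytes // book_bytes


def downloadable_books(seeded_bytes: int, book_bytes: int, min_ratio: float = 1.0) -> int:
    """Books you could download while keeping upload/download >= min_ratio."""
    return int(seeded_bytes / min_ratio) // book_bytes


print(books_per_video(ASSUMED_VIDEO, ASSUMED_BOOK))          # 2048
print(downloadable_books(10 * ASSUMED_VIDEO, ASSUMED_BOOK))  # 20480
```

So with these assumed sizes, uploading one video once covers roughly two thousand books at a 1.0 ratio.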

        The private tracker might have offered more bandwidth to people with exceptional ratios too (or throttled the low ratio folks), it seemed that was an implication in the article.

        ETA: It would be hilarious if the single Meta employee that was found to have torrented just happened to get busted for his personal downloading

        25 votes
        1. [2]
          diskroll

          Less tit-for-tat and more tit-for-text, if you will.

          40 votes
        2. [2]
          norb

          I think assuming they were downloading books only is why it seems weird. But if they also wanted to ingest video or audio, then seeding these types of things makes a lot more sense

          2 votes
          1. DefinitelyNotAFae
            (edited )

            Maybe, the lawsuit is obviously highlighting the books (due to authors suing) and porn (because the porn producers are suing), so who knows. But the ratio would be exquisite. You could download all the books on the internet and not dip out of those elite speed tiers if you seeded the porn the whole time.

            5 votes
    2. [3]
      LunamareInsanity
      (edited )

      I was curious about these questions too, so I did some digging. The filing document holds what are probably the answers.


      re: downloading of porn being said to accelerate downloading of e-books.

      BitTorrent operates on a “tit for tat” basis where Meta’s seeding (uploading), enables Meta to obtain better download speeds so that it can consume more content, faster.

      The tit-for-tat mechanism within the BitTorrent Protocol rewards users who distribute the most desired content. This ensures that users distribute content on BitTorrent as opposed to solely downloading. If all BitTorrent users were to avoid distribution, then there would be no content available on the network for users desiring to download it.

      Defendant was specifically aware of this issue and, discovery will likely show, is the reason why Defendant elected to continuously distribute Plaintiffs’ content as opposed to just purchasing a subscription or modifying its BitTorrent clients to download only.

      It seems to me like they just... don't understand that there isn't a universal point to ratios.

      Higher ratios of seeding leading to faster download speeds as incentive for seeding sounds plausible. Especially if your thought process is very capitalistic and you're baffled about how torrenting as a culture can survive.

      Edit: Apparently it is me that doesn't understand! There actually is a choking algorithm that does pretty much exactly what is described above. It was apparently very useful back in the olden days when bandwidth was low and has remained in the protocol. I've never encountered it in practice, but maybe it truly is relevant when you're downloading at the scale of ~100TB a pop.
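For anyone else who had not run into the choking algorithm, here is a minimal sketch of the idea as the BitTorrent protocol description outlines it: a downloading client periodically unchokes (agrees to upload to) the interested peers it is receiving data from fastest, plus one "optimistic" unchoke chosen regardless of rate. The function name, the slot count, and the sample rates are my own assumptions, and real clients differ in the details.

```python
import random


def choose_unchoked(download_rates, interested, regular_slots=4, rng=random):
    """Return (regular, optimistic) peers to unchoke.

    download_rates: peer id -> bytes/s recently received from that peer.
    interested: set of peer ids that want data from us.
    """
    candidates = [p for p in download_rates if p in interested]
    # Reciprocate: reward the peers that have been sending to us fastest.
    ranked = sorted(candidates, key=lambda p: download_rates[p], reverse=True)
    regular = ranked[:regular_slots]
    # Optimistic unchoke: give one other interested peer a chance,
    # regardless of how much it has sent us so far.
    rest = sorted(p for p in interested if p not in regular)
    optimistic = rng.choice(rest) if rest else None
    return regular, optimistic


rates = {"a": 900, "b": 50, "c": 400, "d": 0, "e": 700, "f": 10}
regular, optimistic = choose_unchoked(
    rates, interested={"a", "b", "c", "d", "e"}, regular_slots=3
)
print(regular)     # ['a', 'e', 'c']
print(optimistic)  # 'b' or 'd', chosen at random
```

Note that the ranking here is over the peers connected for a single torrent, so this sketch says nothing about whether credit earned in one swarm carries over to another.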

      Also, I doubt Meta was participating in private trackers from just this info alone -- if nothing else, I don't know a private tracker that has both prodigious amounts of books and porn at the same time. Even the generalist trackers are light on porn and books (and heavy on tv shows and movies). Though I am curious why, if the point was books to train models on, they didn't just download that clearnet rip of Bibliotik from a few years back...


      re: traced back to Meta IPs, this one is pretty cut and dry. There were Meta employees torrenting from corporate clearnet IPs or their home residential IPs that got caught. From there, they extrapolated their download patterns (assuming there was a torrenting script handed down by corporate) and found potential datacenter VPN'd IP blocks with the same behaviors.

      Strike 3 conducted an analysis attempting to find Meta’s hidden IP addresses by looking for certain correlations to data patterns that matched infringement patterns seen on Meta’s corporate IP Addresses. These include, but are not limited to, such instances as:

      a. Similar patterns involving mass infringement beyond what a human could consume;

      b. Similar methodical downloads of disparate content based on the patterns shown by Meta’s corporate IP addresses;

      c. Similar content being downloaded on the same day or at or near the same time as on Meta’s corporate IP addresses;

      d. Similar targeting of certain types of content featuring specific languages at or around the same date and time that followed a shifting pattern (i.e. IP addresses that targeted French language versions of TV shows or films on the same day as Meta’s corporate IP addresses and then shifted in apparent connection with Meta’s corporate IP addresses to target Russian language versions of TV shows); and

      e. Correlations to Meta’s corporate IP addresses where the same content is being torrented in different resolutions at or around the same time.
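The filing does not say how these correlations were computed, so the following is only a guess at the general shape of such an analysis: score each unknown IP by how much its set of (torrent, day) observations overlaps with that of a known corporate IP. The Jaccard measure, the threshold, and the sample data (documentation-range IP addresses and made-up hashes) are all assumptions for illustration.

```python
from collections import defaultdict


def jaccard(a: set, b: set) -> float:
    """Overlap of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_ips(observations, known_ip, threshold=0.5):
    """observations: iterable of (ip, infohash, day) tuples.

    Returns {ip: score} for IPs whose activity overlaps the known IP's
    activity by at least `threshold`.
    """
    by_ip = defaultdict(set)
    for ip, infohash, day in observations:
        by_ip[ip].add((infohash, day))
    reference = by_ip.get(known_ip, set())
    return {
        ip: score
        for ip, seen in by_ip.items()
        if ip != known_ip and (score := jaccard(reference, seen)) >= threshold
    }


obs = [
    ("192.0.2.10", "h1", "d1"), ("192.0.2.10", "h2", "d1"), ("192.0.2.10", "h3", "d2"),
    ("198.51.100.7", "h1", "d1"), ("198.51.100.7", "h2", "d1"),
    ("198.51.100.7", "h3", "d2"), ("198.51.100.7", "h4", "d2"),
    ("203.0.113.5", "h9", "d1"),
]
print(similar_ips(obs, known_ip="192.0.2.10"))  # {'198.51.100.7': 0.75}
```

A real analysis would presumably also weigh volume, timing within the day, and language or resolution variants, as items (a) through (e) describe.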

      13 votes
      1. [2]
        JCPhoenix

        Similar patterns involving mass infringement beyond what a human could consume;

        These people have never met datahoarders. Or gooners. Or -- shudders -- datahoarding gooners.

        Somewhat more seriously, I had a friend/acquaintance who bought a NAS and some HDDs so he could store all his porn. He told us how he meticulously categorized and organized his collection, through tagging, filename formats, folders structures, etc. And he was quite proud of that while the rest of us were like, "What in the actual fuck, man?" Even his small collection of maybe a couple TB seemed like an insane amount of porn to consume.

        12 votes
        1. CptBluebear

          I think that's taking a porn habit a bit too far, but in their defense: porn or not, 4k videos tend to stack up rapidly. TBs is not shocking to me unless I know it's double or triple digit TBs.

          It's always the gooners that use that categorization superpower. Rather than use it for the betterment of humanity they tend to meticulously maintain a porn file structure while I'm stuck downloading desert_raven_144.mp3 from Soulseek and having to manually id3 tag my gigabytes of music files.

          3 votes
    3. [4]
      Greg

      Yeah, that’s roughly how I understood things, although there’s been a lot of talk and accusations flying about on the whole Meta/AI/piracy topic in general so I could well have missed or misinterpreted something.

      On the IP addresses side, I think it’s just standard big tech arrogance; move fast and break laws, right? There are some pretty wild quotes from records that have come out in discovery on the book piracy case:

      "Torrenting from a corporate laptop doesn’t feel right," Nikolay Bashlykov, a Meta research engineer, wrote in an April 2023 message, adding a smiley emoji. In the same message, he expressed "concern about using Meta IP addresses 'to load through torrents pirate content.'"

      6 votes
      1. [3]
        balooga

        If the private tracker theory is right, that means Strike 3 would have also needed to be a participant, in order to have eyes on what Meta was doing. Which would implicate them as pirates also. That might explain the vague, evasive language being used here.

        3 votes
        1. Greg

          I did wonder what the reference to Strike 3’s “proprietary BitTorrent tracking tools” means in practical terms, too. How much data do they have, what are they monitoring, how are they accessing trackers, etc etc.

          Presumably they’re confident enough in their methods to lay them down as evidence against the big guys, at least. Will be interesting to see if that confidence is justified.

          3 votes
        2. norb

          I wonder if it's possible to write a custom torrent client that misidentifies itself (to avoid being banned from the site) and then scrape torrents to monitor who is advertising it?

          You wouldn't even necessarily need to download the data to monitor it that way as far as I know.

          I'm also not an expert on the BitTorrent protocol, so maybe that isn't possible for some reason.
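As a sketch of why downloading may not be needed: in the standard HTTP tracker protocol, a client announces itself for a torrent and the tracker replies with a list of other peers, which in the compact format is 6 bytes per peer (4 for the IPv4 address, 2 for the port, big-endian). The code below only builds an announce URL and parses such a list; it makes no network calls. The tracker URL, peer ID, and sample bytes are placeholders, a private tracker would typically also require a per-user key in the URL, and nothing here is known to reflect how Strike 3's tools actually work.

```python
import ipaddress
import struct
from urllib.parse import urlencode


def build_announce_url(tracker: str, info_hash: bytes, peer_id: bytes,
                       port: int = 6881, left: int = 0) -> str:
    """Build a tracker announce URL requesting a compact peer list."""
    params = {
        "info_hash": info_hash,  # 20-byte SHA-1 of the torrent's info dict
        "peer_id": peer_id,      # 20 bytes; the client picks this itself
        "port": port,
        "uploaded": 0,
        "downloaded": 0,
        "left": left,
        "compact": 1,
    }
    return tracker + "?" + urlencode(params)


def parse_compact_peers(blob: bytes) -> list:
    """Decode a compact peer list into (ip, port) tuples."""
    if len(blob) % 6:
        raise ValueError("compact peer list must be a multiple of 6 bytes")
    peers = []
    for offset in range(0, len(blob), 6):
        ip_raw, port = struct.unpack_from("!4sH", blob, offset)
        peers.append((str(ipaddress.IPv4Address(ip_raw)), port))
    return peers


url = build_announce_url("http://tracker.example/announce",
                         info_hash=b"\x12" * 20,
                         peer_id=b"-XX0001-123456789012")
sample = (bytes([192, 0, 2, 1]) + (6881).to_bytes(2, "big")
          + bytes([198, 51, 100, 2]) + (51413).to_bytes(2, "big"))
print(parse_compact_peers(sample))
# [('192.0.2.1', 6881), ('198.51.100.2', 51413)]
```

Since the peer ID is just a string the client chooses, a client that misidentifies itself is at least plausible at this level; whether a particular site would detect it is a separate question.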

          1 vote
    4. [2]
      ButteredToast

      Are there private trackers that have both massive amounts of porn and massive volumes of books or other media, though? Usually those are focused on a specific niche, e.g. music, TV, etc, partially because the only way to keep one afloat and healthy over long periods is to attract collector-adjacent enthusiasts who have high standards.

      Generalist trackers tend to have loose rules and quality that’s all over the place, which makes them difficult to keep alive since enthusiasts go elsewhere and more casual users grab what they want and bounce, which is why they’re usually public and are sustained by sheer numbers, where throughput somewhat offsets turnover.

      Nothing about this story seems to fit together. Data is far more visible for public trackers which would make more sense for what Strike 3 was seeing, but there’s no impetus for Meta to seed on public trackers. If it’s true that they were using private trackers, the main benefit they’d get from seeding porn is, uhh, the ability to download more porn.

      The only explanation that makes sense to me is that porn was actually Meta’s primary objective and they’re making use of it in model training somehow. The first thing that springs to mind is NSFW detector models but maybe they have a stealth mode adult entertainment project going? I dunno.

      5 votes
      1. diskroll

        It makes sense if they were downloading other (non-porn) videos to train a video generation model rather than downloading text for an LLM.
        Edit: there are also general private trackers that would carry both porn and ebooks/other text files.

        4 votes
    5. [2]
      CannibalisticApple

      I was a bit confused too. Here's the summary from DuckDuckGo explaining the tit-for-tat thing:

      Tit-for-tat is a strategy used in BitTorrent that encourages users to share files by rewarding them with faster downloads in return for their uploads. Essentially, the more you upload to others, the quicker you can download from them, promoting cooperation among users.

      So by my understanding: Meta seeded porn the day it was released to capitalize on interest and have a spike in leeching, so they could have faster download speeds. And that would let them download more books in a short time. Maybe. I don't torrent so not sure how this works, honestly.

      I think they might be using this mainly to drum up more publicity and add some weight to the legal argument. "They seeded our content in order to boost their pirating efforts" does sound more malicious and greedy, and may sway a judge or court more than just "stealing our content". Similar logic behind them emphasizing that Meta could have distributed the porn to minors: point out Meta engaged in behavior that explicitly ignores laws relating to that content's distribution.

      2 votes
      1. balooga

        I appreciate that, I’ve never heard of this tit-for-tat mechanism and just assumed the talk of exploiting the BT protocol was misinformed. Strange that Meta apparently dove deep enough into torrentmaxing to game that, yet still managed to leak their IP addresses like a n00b.

        4 votes
  2. Greg

    Uh… yeah. The title is the most accurate summary I could give, at least based on what I’ve seen of this story so far. We live in the dumbest timeline.

    10 votes
  3. [2]
    gco

    The company also wants Meta to delete any stolen videos from its AI training data and existing AI models. The company alleged that Meta could use its high-quality copyrighted works—which provide rare long cuts of "natural, human-centric imagery" showing "parts of the body not found in regular videos" and "unique" forms "of human interactions and facial expressions"—to create a rival adult video generator that could "eventually create identical content for little to no cost."

    Is this realistic at all to expect? Even if Meta is found guilty (not sure if that word exactly applies here), how could they demonstrate they have deleted the videos and retrained their models? Wouldn't it be better to request royalties or some sort of ongoing compensation? I would also think direct economic impact would be a better deterrent for others trying to get away with something similar.

    4 votes
    1. psi
      Link Parent
      In this case, the plaintiff's demands should be interpreted as a threat, not as a likely (or even desired) outcome. I'd guess the parties will almost certainly settle long before the case goes to...

      In this case, the plaintiff's demands should be interpreted as a threat, not as a likely (or even desired) outcome. I'd guess the parties will almost certainly settle long before the case goes to trial (assuming there isn't a summary judgement against the plaintiffs). If Facebook is capable of spending hundreds of millions of dollars to attract talent, they can also afford to pay some ungodly amount of money to make a lawsuit disappear.

      4 votes