75 votes

Tumblr to begin selling user content to AI generative service companies, opt-out will be per blog

38 comments

  1. [12]
    drannex
    (edited)
    Link
    • Exemplary

    I have a lot of feelings that I am going to briefly summarize here: this is fucked.

    Tumblr, in the way it works, means that every reblog is essentially its own post. I can barely believe that "opting out", which is absolute bullshit, is going to restrict any data from being sent. A reblog is a copy, which means even if you opt out, the reblogged content likely isn't, and considering the absolute farce that is the way tumblr backend works (mainly from 15+ years of hotfixes) I have absolutely zero belief that opting out will work and that they can remove reblogged content from posts in a way that is substantially accurate.

    The next issue is simple: If you opt out later, your data has already been packaged and sold off to the highest bidder and there will be nothing you can do about it. The way I've seen the phrasing on this has been that they have already packaged quite a few posts and sent them off. Once they are submitted, there is absolutely no way to remove them from the blackbox of data aggregation once the weights have been created from them (that's a very minimal explanation of how companies like OpenAI and Midjourney have their technology working. Once something is added, it can't or won't be removed).

    The other issue: deactivated, terminated, or long ago blogs from users past can not, will not, and would not be able to opt out of this. That is the vast majority of the ecosystem.

    This toggle is worthless. They know that. Automattic knew this was going to be a horrific decision from the user base and pressed forward anyway. They are going for a cash grab that has a one time payoff and horrific longterm dividends. This pays them once a large sum, then nothing for practically infinite data usage.

    They've had a horrific past few months between the mass layoffs, this obvious intellectual property sell off, the CEO of Tumblr going absolutely insane over trans issues and targeting users, making up lies, and releasing private information from inside a user's account that wasn't public.

    I adored tumblr for what it was, when it was in the best interest of the users, hell, there's a chance that Yahoo (who wanted to turn it into the "next PDF") understood Tumblr better than Automattic ever did.

    I know a lot of previous and current staff members, several of whom are absolutely pissed about these changes. The enshittification of this poor place continues.

    For the first time in a very long time, I will start reconsidering my usage of [tumblr] and may migrate elsewhere (or nowhere). I have been on tumblr since 2009, and have amassed well over a million followers in that time on the platform.

    75 votes
    1. paris
      Link Parent

      I’ve been on tumblr for almost as long as you, and today I’m downloading the many gig file of my main blog’s backup.
      I’m done. That this is opt-out and not opt-in is already sleazy, and the fact that tumblr themselves don’t seem to know if opting out will mean OpenAI will or won’t have access to your data anyway is just icing on the cake. Additionally, there’s no real reason to even believe these opt-out requests: they’re to “discourage” but there’s no guarantee whatsoever that that will do anything at all! And what about if you opt-out after the cutoff? “Oh, we talked to OpenAI and we’re pretty sure they’ll delete the data from the scrape,” ok. Ok. That’s definitely how that works.

      And yes, after photomatt’s breakdown (he called the FBI?????) I want nothing to do with yet another site being enshittified into the ground by a cranky tech bro who doesn’t even understand the platform he owns.

      32 votes
    2. [5]
      eve
      Link Parent

      For the first time in a very long time, I will start reconsidering my usage of [tumblr] and may migrate elsewhere (or nowhere).

      This right here. I am just... so, so tired of the internet right now. Tumblr was a last bastion of sorts for me. I dropped it when they banned porn, and came back because everywhere else was shit. I've found myself using social media platforms less and less over the years. There's so many accounts to keep track of, so many fucking emails, and now, with this bullshit, I may just pull the plug on posting things. It makes me sad for a lot of reasons that tumblr is doing this. It just hurts.

      24 votes
      1. [3]
        Promonk
        Link Parent

        The porn thing was really your first warning that the place was dying, not because the Internet needs another place for porn, nor because it meant fewer unique visitors, but because it's a sure sign that the service is getting prettied up to be sold off.

        We all know by now how that goes. I can't think of a single community-driven site that's ever been IPO'd or sold wholesale that's ever survived the procedure intact. Inevitably, the enshittification takes it, or else it's just stripped for assets and taken behind the shed purposely.

        21 votes
        1. [2]
          raze2012
          Link Parent

          I can't think of a single community-driven site that's ever been IPO'd or sold wholesale that's ever survived the procedure intact.

          Reddit might be the first. It was a husk of itself long before the IPO news, after all. So technically it would survive intact. I can't really imagine any other features making the site worse between the botting problem, hostile moderation/admin relations, paid incentive to blogspam, rampant bigotry and astroturfing, and now selling off its data to AI (with no attempt to pretend users can opt out).

          ...Then again, it still does technically have porn (they've done a lot to try and make sure it's nigh invisible to most users though), so I guess there's one final straw to play there.

          6 votes
          1. Promonk
            Link Parent

            I actually had Reddit in mind from the jump. I think a lot of what went wrong on the platform was due to attempts to get Reddit into a place where it would turn a profit. Mind you, I didn't say just IPOs, I mentioned selling the shop wholesale. I should've said "inviting in venture capitalists or corporate investors," such as Conde Nast.

            The people who've run Reddit through the years have never really understood the power of the platform, and so their attempts to make it profitable have always made usability on the site much worse.

            7 votes
      2. [2]
        Comment deleted by author
        Link Parent
        1. eve
          Link Parent

          Mostly the people and the ridiculous sense of humor. I've been on and off tumblr since 2011 give or take so I've seen a lot of changes. But the absolutely weird brand of humor has been there the whole time and that with some of the niche community aspects and breadth of things you can find, I really enjoy it. For me, it's less about it being a blogging service and more the userbase it has accumulated.

          15 votes
    3. public
      Link Parent

      Even if it were opt-in, what’s to stop some jokester from reblogging a silly amount of nonsense, then opting themselves in?

      6 votes
    4. [4]
      ibuprofen
      Link Parent

      I think I'm missing something here.

      As far as a free platform monetizing their userbase goes, AI training data seems one of the most benign options out there. It's completely depersonalized. There's no tracking or profile-building. How is the user experience being enshittified?

      5 votes
      1. [3]
        drannex
        Link Parent

        I am personally not a fan of a platform relying on mining user content from the past fifteen years to aid other companies that 1) could have a data breach, 2) will be harboring personally identifying information en masse, and 3) will be using said data to create a facsimile of the user base to automate said user base's content in the future, especially when that sort of technology was never a worry from the start of the content development.

        It's just not a good trust exercise.

        11 votes
        1. [2]
          ibuprofen
          Link Parent

          I am personally not a fan of a platform relying on mining user content from the past fifteen years to aid other companies

          Well of course one isn't a fan of it. I totally get that, and of course we always prefer opt-in and upfront choices. But this doesn't seem to be anything that materially changes the platform in any way.

          will be harboring personally identifying information en masse

          Is Tumblr full of personally identifying information? I always thought of it as a largely anonymous platform.

          using said data to create a facsimile of the user base to automate said user base content in the future especially when that sort of technology was never a worry from the start of the content development.

          Who is looking to automate Tumblr posts? What does a Tumblr AI do to change your ability to express yourself or connect with your friends?

          1. talklittle
            Link Parent

            I think there are two camps (aside from the third, indifferent camp) that feel angry/betrayed about this.

            One camp may have believed that the company was a steward of their content. They may have given a pass to the monetization methods that existed at the time the content was published (e.g. banner ads), while believing the company would act in the interests of the creators, such as pursuing copyright claims against other parties using the content in an unauthorized way.

            These creators may not have expected the company would claim ownership and resell the content to other parties, who can then use the content in pretty much an untraceable way, any way they want.

            Another camp may have published content in almost the opposite spirit as the first camp, and yet still feel betrayed by the company. They may have posted their content publicly, wanting it to be available to all for almost any purpose, including AI training. However, they may see this sale as a step toward the company gatekeeping who gets to see and use the content, without any input from the content creator. Beginning the process of converting a public park to a walled garden.

            3 votes
  2. [2]
    ZeroGee
    Link

    Web platforms should never have been given the right to exercise ownership over another person's work.

    They are content holders, not owners.

    38 votes
    1. raze2012
      Link Parent

      Yeah, if this is our future I think it's only a matter of time before Section 230 collapses.

      No provider or user of an interactive computer service shall be treated as the publisher or speaker of any information provided by another information content provider.

      I don't see how you can claim to be immune to accountability about user generated content and then proceed to sell that content for profit (without compensating the user, to boot). Ads were borderline as is, but are ultimately a tangential business from the content. How can someone sell hosted content to 3rd parties and not be considered a "publisher"?

      29 votes
  3. [6]
    feanne
    (edited)
    Link

    Those looking for a tumblr alternative may be interested in Pillowfort. Here's my recent comment on Tildes about it where I discuss pros and cons.

    TLDR: Purely user-funded, no data harvesting, chronological feed, anti-gen-AI, cozy creative highly engaged inclusive community.

    Edit: It's a bit slow at the moment because there's a large influx of tumblr users signing up on Pillowfort.

    24 votes
    1. [3]
      drannex
      Link Parent

      There is also cohost.org, which is likely the most similar in general vibes and the most technologically stable alternative.

      12 votes
      1. [2]
        Akir
        Link Parent

        I've been pondering getting into cohost since Cathode Ray Dude is there and he tends to write thoughtfully, but I'm generally unenthused about social media sites in general. Is there anyone else there that would make it worth signing up for?

        3 votes
        1. drannex
          Link Parent

          I'm not so keen on participating on social media at all these days, and as Tumblr was really my last choice, may be a chapter that finally closes.

          I don't really know many people on cohost, just a few friends.

          4 votes
    2. [2]
      danke
      Link Parent

      §6 of the ToS contains the same boilerplate legal language granting Pillowfort the legal right to relicense and redistribute all user-submitted content, including commercialization. Is it possible / are there any examples of sites that have ToS which shield them from liability without granting them the ability to unilaterally decide to sell user data later on, or is faith in the platform owner all we can rely upon?

      9 votes
      1. feanne
        Link Parent

        Tildes' TOS is a good example I think! From the privacy policy:

        Your information is used exclusively to operate Tildes. This includes providing the functionality of the site, analyzing usage, troubleshooting site issues, and investigating abuse.

        We never sell your information and Tildes does not have advertising.

        Your information is not willingly shared with third parties, but we may disclose your information if we believe it is necessary to comply with a valid legal process or to prevent imminent harm (such as suicide).

        7 votes
  4. [2]
    redshift
    Link

    Same problem with WordPress. If you have a wordpress.com blog, make sure to opt out of third-party data collection.

    Their instructions: https://wordpress.com/blog/2024/02/27/more-control-over-the-content-you-share/

    Short version: Settings → General → Privacy → enable "Prevent third-party data sharing"

    18 votes
    1. winther
      Link Parent

      And even if you are on a paid plan, you are still opted in by default. Not cool!

      10 votes
  5. [13]
    balooga
    Link

    I don’t understand who’s paying sites for content when it can be acquired for free via scraping. Anybody can crawl Tumblr right now, and no one would be the wiser.

    11 votes
    1. [2]
      stu2b50
      Link Parent

      It's safer. Companies like to dot their Is and cross their Ts. It's like how MST3k always contacted the copyright holders for permission, even though what they were doing arguably fell under fair use.

      19 votes
      1. drannex
        Link Parent

        Not only that but scraping is hard, especially on such a mass scale, and extremely expensive and full of duplicate content that has to be carefully sifted through.

        This is more akin to getting the data from a direct water pipe, instead of the waste water in a sewer.

        21 votes
    2. [3]
      OBLIVIATER
      Link Parent

      Scraping is expensive, direct access to the API is much more scalable and cheaper.

      13 votes
      1. [2]
        Ephemere
        Link Parent

        I think it's safe to assume that the body of tumblr posts are already a part of AI training data, just now tumblr gets paid and the openAIs, googles and microsofts of the world get to put an 'ethically sourced' sticker on the next version of their services.

        2 votes
        1. winther
          Link Parent

          Tumblr might have additional metadata not available from public scraping. Like geolocation, age or gender on users. All valuable when using the data.

          2 votes
    3. [6]
      ChingShih
      Link Parent

      I'm curious about this too. Are companies securing their rights to future posts by laying these claims? Do they think that websites are soon going to put walls up, either legally or some kind of robust robots.txt that will prevent scraping? I'd hate to see people having their work scraped by for-profit companies' training models, but I'd also hate to see the internet become compartmentalized in some dystopian way.

      8 votes
      1. [4]
        balooga
        Link Parent

        That was also my thought process. Sites may be doing this even if they have no prospective buyers, as a means of staking a claim on the content. Sort of like how copyright holders have to aggressively assert their IP otherwise they weaken their own legal claims against infringers. Sites like Tumblr and Reddit might be doing this more as a preventative measure than as an attempt to actually sell anything (though I guess, if they do sell any that’s just gravy for them).

        What you’re suggesting as the longer-term outcome of this trend is frightening. The free and open web has been threatened before but this anti-scraping hysteria stands a chance of delivering the finishing blow. And frankly I don’t think it’s going to slow down the maturation of LLMs anyway. It will just further trash the everyday user experience of the internet with no real upside.

        I still stand by the argument that training AI on publicly available content, without anyone’s permission, is not plagiarism or copyright infringement or anything other than a commendable culmination of one of the great promises of the early web: the unfettered, anarchic, emergent organization of the world’s collected information. I’m all for it.

        9 votes
        1. [3]
          ThrowdoBaggins
          Link Parent

          I still stand by the argument that training AI on publicly available content, without anyone’s permission, is not plagiarism or copyright infringement or anything other than a commendable culmination of one of the great promises of the early web

          I think deep down I probably agree with you, but I still feel icky that these companies are taking information that’s freely available, packaging it into a bundle, and then selling it for their own profit and putting up guardrails against others being able to use it in their own ways.

          If LLMs are made from publicly available data without seeking permission or paying copyright holders, they should be made available to people for free too. Or at least the data packages they use to train should be made available, so that I can make and tinker with my own LLM (if I have access to the hardware to train them).

          4 votes
          1. feanne
            Link Parent

            If LLMs are made from publicly available data without seeking permission or paying copyright holders, they should be made available to people for free too.

            This. Corporations taking from the commons should give back to the commons. The commons should primarily benefit the public, not shareholders.

            Information law scholar Ben Sobel has warned that fair use in the digital economy "no longer redistributes wealth from incumbents to the public; it shifts wealth in the other direction, from the public to powerful companies."

            6 votes
          2. DefinitelyNotAFae
            Link Parent

            but I still feel icky that these companies are taking information that’s freely available, packaging it into a bundle, and then selling it for their own profit and putting up guardrails against others being able to use it in their own ways.

            It's almost like someone taking someone else's fanfic and binding it as a physical book and selling it on Etsy.

            5 votes
      2. raze2012
        Link Parent

        (tagging @balooga as well for interest)

        it's a few factors:

        1. it's a contingency. AI is under legal fire for doing exactly that and it's not a surefire win this time. These proceedings take years, so if they wait until the lawsuit concludes, it can be too late
        2. these kinds of deals always come with support. You can make your own scraper, but processing that much information is expensive. It's much better to have the company itself make tools to let the AI companies grab, filter, sort, and more with the data instead of making a 3rd party tool prone to throttling or yet more legal issues (web scraping is still a grey area). And any issues with said features get leveraged to the company instead of your own engineers
        3. It can in fact be cheaper at the end. These sites aren't exactly profitable these days, and the engineers at AI companies are probably top dollar. If reddit is getting $60m a year, I can't see Tumblr going for more than $10m. $10m/year can be a team of 20-50 engineers, so offloading that to just procuring the data saves time and money over maintaining a robust scraper.

        I'd also hate to see the internet become compartmentalized in some dystopian way.

        with current social media you can argue that we already are compartmentalized. I wouldn't be surprised if 90% of all organic (micro)blogging happens mostly on Tiktok/Facebook/Twitter and a few other top websites. Some of these already aren't particularly open to begin with, so this may simply be accelerating an inevitable future.

        9 votes
    4. sparksbet
      Link Parent

      I assume Tumblr is offering some way to filter or curate the content -- in part because I can't imagine it being useful as training data otherwise, because Tumblr's format and culture would make it absurdly messy bad data without that. Scraping tumblr directly would produce so much absolute garbage that it would be very difficult to get anything particularly useful out of it imo.

      5 votes
  6. [3]
    teaearlgraycold
    Link

    I won’t be surprised to see a future opt out from having everything you type on your Android keyboard sucked up for training AIs.

    8 votes
    1. [2]
      ThrowdoBaggins
      Link Parent

      Just to be extra cynical… why would Google even give you an option to opt out of that? They can just slap another clause in their privacy policy and say “yeah we’re collecting the way you type, and no you can’t tell us not to…”

      2 votes
      1. teaearlgraycold
        Link Parent

        Opt out is step one! Step two is no choice. Gotta boil the frog slowly.

        4 votes