46 votes

Reddit CEO says Microsoft needs to pay to search the site

28 comments

  1. [5]
    mat

    Gosh I mean I can understand why spez would be mad at a company profiting from the hard work of other people creating content for free and then not paying them for the use of that data. Imagine that. Poor guy.

    149 votes
    1. GunnarRunnar

      It's kinda interesting that had he realized this role of a guardian earlier, it probably would've just built trust towards the platform whereas now it just seems like he's stealing from the users.

      37 votes
    2. [3]
      OBLIVIATER
      (edited )

      It's hilarious how it's possible for a company like reddit to be unprofitable without selling personal user data to AI companies, considering they literally don't pay for anything. It's the only major platform where 99% of trust and safety actions are done by volunteers. Insane how they somehow manage to have 2,000 employees while the site is functionally worse than it was in 2010.

      24 votes
      1. [2]
        Adys

        There are no incentives for it to be profitable. I know it sounds counterintuitive but it’s how the investment has been set up.

        The investors were seeking a significant return (IPO for example), not a lifestyle business. So every financial incentive the investors set up for the CEO is going to follow that same route.

        It’s not about profitability. It’s about potential for profitability. That is what drives the exit. Companies don’t usually get bought and sold on current value but rather because the buyer sees a potential way to extract more value out of it than the sticker price. It’s the same once it’s public: most stocks are traded at numbers that price in future value.

        7 votes
        1. OBLIVIATER
          (edited )

          Yeah but Reddit has never been profitable, like... ever (until very very recently with this IPO and AI data sale), but not for lack of trying.

          This was the case all the way back to 2005, long before an IPO was the goal, and since then they have been trying (and failing) to achieve profitability via countless ill-thought-out schemes. Anyone else remember the weird "subreddit merch" initiatives, or before that the transition from reddit gold to a tacky and obnoxious "award" system that was abused to high heaven and was eventually removed (and subsequently brought back a year later?), or before that the weird reddit cryptocurrency push, or before that when they tried to turn reddit into a twitter/facebook alternative and create "influencer profiles" that you could post content to and have people "subscribe" to your content for a fee, or before that when they beta tested a system to disguise ads as real reddit posts in real communities with little distinction between genuine posts and sponsored posts, or before that when they implemented a system to automatically inject affiliate marketing links into links on the site in a desperate bid to wring any amount of cash they could out of the community, or before that when they started accepting huge fees from movie studios to promote their upcoming films by doing sponsored video AMAs with celebrities that wouldn't actually answer people's questions but just shill the products they were trying to sell, or before that when.... well you get the point.

          I kinda got off topic with this comment but I had forgotten just how many harebrained schemes ole spezzyboi and his friends have dreamed up to try and earn them the slightest bit of cash over the years. I'm sure there's even more that I'm not even remembering; I vaguely recall something to do with them trying to break into the livestream market, the local used goods market, and their bizarre and completely unsuccessful attempt to make "reddit NFTs" work.

          All this is to say, Reddit's goal of profitability heavily predates the current investment situation they are in.

          20 votes
  2. [7]
    infpossibilityspace

    This isn't just a reddit problem and he actually has a point (although expressed terribly). If AI models are being trained on data scraped from other sites, and if those models take away traffic from those sites, what's the endgoal?

    Websites have bills to pay and will shut down without sufficient ad revenue (lots of views) or subscriptions. If they shut down, how do you train the next set of models? Revenue sharing does alleviate this issue.

    That said, it's not clear to me that these models/companies are even vaguely profitable, so how can you share revenue from something that doesn't make money? How can you justify "investing" in bigger models if they aren't making money? (Not to mention the environmental/infrastructure impact of massive electricity usage)

    24 votes
    1. [4]
      bl4kers

      Here's my attempt to answer your questions just for fun as a thought experiment (not reflective of my own ethics)

      ...what's the endgoal?

      The endgoal is to see how powerful or interesting they can get one to be, then try to productize it and profit. Until "inbreeding" impacts the product or profit, it's a side effect. Outstanding question: Isn't user-generated content already dwindling? If so, regardless of AI, companies seem incentivized to squeeze all that they can out of the existing stuff while it's still relevant.

      If they shut down, how do you train the next set of models?

      Save a compressed copy while doing the crawling. Websites are public after all. When in a pinch, rely on other existing backup services like the Wayback Machine.

      ...so how can you share revenue from something that doesn't make money?

      Revenue share is a talking point and finger pointing to appease ethical and/or governmental concerns. If actually pursued, it would be (purposefully) minuscule, less than even Spotify.

      How can you justify "investing" in bigger models if they aren't making money?

      Investors are looking to invest. If these have the potential to "be the new internet" or a platform other products or businesses directly rely on, then the investment money will keep flowing in. The risk is worth the potential reward for these folks. There's also the option of convincing investors that only long-term profit matters and underpricing competitors out of business (e.g. the Amazon way)

      11 votes
      1. [3]
        infpossibilityspace

        Continuing this thought experiment:

        Save a compressed copy while doing the crawling. Websites are public after all. When in a pinch, rely on other existing backup services like the Wayback Machine.

        This is still a limited dataset. What I meant is: how do you train a model 5 years from now as new data becomes increasingly scarce? Surely it's in the best interest of these companies to keep these sources alive?

        If these have the potential to "be the new internet" or a platform other products or businesses directly rely on, then the investment money will keep flowing in.

        This is a good point, and I understand the concept you're getting at, but in concrete terms what does "the new internet" actually mean?
        I've seen this a few times and it feels like a hollow reason to keep building stuff without an actual vision/goal.

        For example, automating entry-level jobs seems like a great idea, but that's going to be a problem in 5-10 years when new graduates can't find jobs because the entry-level jobs in their field don't exist.

        1. [2]
          vord

          how do you train a model 5 years from now as new data becomes increasingly scarce?

          It's basically the billion dollar question. It takes a lot longer for people to make content than for AI models to consume it. They've basically already subsumed the entirety of the written word in order to get GPT4. Pretty sure they'll be completely tapped out by GPT 6 or 7.

          A fascinating podcast on the matter.

          5 votes
          1. bl4kers

            Sorry, I haven't listened to the podcast. So maybe it's brought up, but one weird way to approach this would be to make people contribute content via CAPTCHA, similar to what machine learning did with image identification & labelling. No idea how that would work but I guarantee it will be tried lol

            3 votes
    2. [2]
      RobotOverlord525

      Yeah, this came up on the NYT Hard Fork podcast (YouTube mirror here) a while back and I can't help but be concerned about it.

      In particular, the hosts (Kevin Roose and Casey Newton) were enormously concerned about Google's then-forthcoming AI Overviews. AI Overviews are a feature that will generate AI summaries at the top of search results pages. (Like an AI-powered version of Featured Snippets, but without drawing from a specific site or directly quoting anything.) This change could significantly impact web traffic, as users might get the information they need directly from these summaries and not click on individual links as much. This reduction in traffic could hurt digital publications that rely heavily on Google searches for ad revenue and audience growth.

      They noted that while Google claims these AI Overviews lead to "more valuable traffic" (Perplexity's CEO used the same stupid term), publishers remain skeptical. The potential drop in traffic, which analysts predict could be as much as 20% to 40%, could be devastating for many digital media businesses.

      Unlike other AI tools where publishers can opt out, they have no such option with Google's AI Overviews. This means they can't exclude their content from being used in these summaries, leaving them with little control over how their information is used.

      Roose and Newton also discussed the broader implications for the internet. If many websites close due to decreased traffic, the quality and quantity of information available online could decline. This could ultimately affect Google's AI, which relies on high-quality information to generate summaries. So, in other words, the doomsday scenario for this is that it creates a feedback loop that annihilates the profitability/sustainability of tons of websites and then also, ironically, kills off the very source of training data that the AI needs.

      Of course, as journalists, they are especially sensitive to this, but the feedback loop they are describing certainly sounds plausible. Featured Snippets already obviated clicking on search results to some extent — this just makes it worse.

      1. infpossibilityspace

        It seems to me that there isn't any technical control a website can put in place to prevent it, or you risk blocking legitimate views too. The only solution I can imagine is a legal policy/contract that explicitly allows or denies scraping data for AI training.

        Maybe that would take the form of making robots.txt a hardline requirement, but these companies have proven they're more than happy to spoof the user agent. Do we need a new digital law that protects websites and makes AI scraping opt-in?
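To make the robots.txt point concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the example.com URL are hypothetical, not any real site's rules. It shows that the check keys entirely on the user agent string the crawler chooses to report, so the file only works if the crawler is honest and cooperative.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only; not copied from any real site.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the hypothetical robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/thread/1"
    # A crawler that honestly identifies itself is told to stay out.
    print("GPTBot:", is_allowed("GPTBot", url))
    # The same crawler reporting a browser-like user agent passes the check.
    print("Mozilla/5.0:", is_allowed("Mozilla/5.0", url))
```

Nothing here is enforced by the server: a scraper can skip the check or report a different user agent, which is why the file alone cannot act as a hard control.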

  3. balooga

    “Without these agreements, we don’t have any say or knowledge of how our data is displayed and what it’s used for, which has put us in a position now of blocking folks who haven’t been willing to come to terms with how we’d like our data to be used or not used,” Huffman said in an interview this week. He specifically named Microsoft, Anthropic, and Perplexity for refusing to negotiate, saying it has been “a real pain in the ass to block these companies.”

    Ticket resolved as Working As Intended. It should be a pain in the ass to subvert the free and open web.

    20 votes
  4. [5]
    skoocda

    Does Tildes allow free access to Microsoft and other web scrapers? If not, shouldn't we, the working class bleeding our fingers dry for this content farm, be able to partake in the profits?

    Spoiler: yes, Tildes offers "No limits to logged-out browsing", and a "Fully featured API", with some specific, non-enforceable ideas in the robots.txt

    In general, scrapers are welcome if they are collecting data for informational uses
    (such as search engines) and maintain a reasonable rate of scraping.

    Scrapers from SEO/marketing-type services will be blocked. Tildes data is not a
    resource to be mined and sold.

    OpenAI's GPT bot is specifically blocked, but that's just a drop in the pond, isn't it?

    Also, ironically, this post has the Link information disclaimer "This data is scraped automatically and may be incorrect." Does The Verge allow free access to Tildes' scraping needs? Seems it's turtles all the way down.

    14 votes
    1. [3]
      cfabbro
      (edited )

      I get your point, but FYI, your "Spoilers" are mistaken. Unlike Reddit, Tildes does limit logged out browsers/crawlers/scrapers to viewing only the first page of user profiles, and it has no API. Also, what "profits"? Tildes is a registered non-profit, is entirely donation driven, and doesn't even get enough of those yet to allow Deimos to take a wage so he can work on the site full-time (although basic site operating costs are fully covered, AFAIK).

      20 votes
      1. [2]
        skoocda

        You're right, and I appreciate the correction. I somehow misread the future plans for a "full featured API" as being already in place.

        The profits point was meant in jest, but of course there are real profits to be had by 3rd parties who scrape sites. As a non-profit, Tildes could choose to meet operating costs by charging for LLM-scale data access... but that has second order effects. By changing the nature of the income stream, it applies pressure on future dev priorities.

        My comment was likely overly glib and inflammatory, but I've seen how revenue streams affect web businesses over long timelines, and it's certainly worth considering, as a community, how we feel about it.

        7 votes
        1. cfabbro
          (edited )

          No worries, and no problem. Preaching to the choir, BTW. :P From an old comment of mine way back in 2018 about relying on advertising money, but which applies equally to taking VC and AI company money too:

          Advertising still degrades user experience no matter how you handle it, display it or incorporate it into your site. There is no getting around that and one of Tildes founding principles is catering to its users, not outside influences like advertisers.

          Once you start accepting advertising you become reliant on them which gives them power to negotiate and force you to decide between complying with their wishes and undermining your founding principles or potentially going out of business/needing to lay people off.

          16 votes
  5. [5]
    stu2b50

    Seems fair to me. A site can close off its data if it wishes, and it can sell its data if it wishes. Microsoft is more than capable of paying Reddit if it wants Bing to have Reddit searches.

    Honestly, Huffman has done a surprisingly good job as CEO these last few years.

    5 votes
    1. [4]
      Tmbreen

      I agree on the first part: Microsoft should have to pay if they want to train AI on Reddit content, though I believe that payment should go to the creators of that content, or at the least pay for web hosting and server maintenance to run Reddit.

      Couldn't disagree more on the second part. I was an avid user of Reddit is Fun, and after that app shut down, I have used Reddit less and less as it has gotten worse and worse.

      17 votes
      1. hungariantoast

        Yes but $46 to $59 = good job as CEO

        Even before IPO company revenue go up = good job as CEO


        Of course I actually agree with you, but there are people out there who think reddit's (the company's) "number go up" moment is more important than the massive damage it has done to reddit (the community).

        To them, the stock price is more important than the mental health and well-being of the millions of people who inflict on themselves every day what reddit became.

        I look forward to the future looking back on this period of history as the developed world self-inflicting a new type of mass psychosis upon itself.

        8 votes
      2. [2]
        dhcrazy333

        Spez has been a great CEO in that he's making strides to bring the site to profitability. The site itself has gone severely downhill, but that's not really relevant. The purpose of the CEO isn't to make the end user happy, it's to make the shareholders happy and to lead the company towards profitability. He's doing that with these deals.

        IMO reddit itself has gone to shit outside of some maybe smaller niche subs, you really need to tailor your subreddits to make it somewhat palatable. The site sucks now for us, but we aren't Spez's target audience. He wants those who will put up with ads and a shittier experience so long as they can get their rage bait and memes. So in that regard he's been a great CEO.

        1 vote
        1. infpossibilityspace

          Reddit has an inherent problem, which is that it wants to have its cake and eat it. The value of reddit comes purely from its users, who already receive nothing for their contributions, so trying to squeeze them more might feel like an injustice to many users. And if you drive away the users who actually provide value, what's the point of reddit? If I wanted rage-bait and memes I'd go to 4chan.

  6. ChingShih

    I only just noticed the timing, but the anti-free scraping policy was announced at the same time that New Reddit was sunset. What was formerly called sh.reddit is now the New New Reddit and users cannot and will not be able to go back to New Reddit as of August 1st (the day it was announced). Moderators will have access to New Reddit until all the mod tools are moved over and they've given their assurance that Old Reddit will continue to exist (but the latest mod features aren't being backported).

    In case you missed it, we introduced a new web platform earlier this year, which is now available to all users. Historically, users have been able to force new.reddit.com on their browsers as a workaround to access the previous web platform, but we will be removing support for this routing going forward. From now on, URLs containing new.reddit.com will route you to those same pages on our new platform.

    This change will allow us to focus on developing new features and making improvements to reddit.com, rather than maintaining multiple versions of Reddit that are no longer being developed. Please note that you may still have access to a few pages on new.reddit.com, but expect them to migrate onto the new web platform soon. ...

    [Source: Reddit]

    Anyway, just thought it was interesting that internally these changes to web scraping seem linked to changes brought with the new UI (but not necessarily because of it).

    3 votes
  7. tanglisha

    Of course he does. This was the entire reason for that API fiasco.

    It's what the site used as a promise for more revenue when they went public.

    2 votes
  8. [3]
    Dangerous_Dan_McGrew

    It seems lately reddit just can't go to shit fast enough so now they are actively ruining it.

    26 votes
    1. [2]
      maximum_bake

      Can you elaborate on that? While I agree with the thought in general, I’m not sure how what Huffman is doing here specifically contributes to Reddit becoming shittier.

      5 votes
      1. Dangerous_Dan_McGrew

        He essentially keeps selling off bits of the platform to third parties and it reminds me all too much of the whole digg exodus.

        It was meant to be a cheeky comment.

        4 votes