23 votes

Google cut a deal with Reddit for AI training data

5 comments

  1. [4]
    skybrian

    The collaboration will give Google access to Reddit’s data API, which delivers real-time content from Reddit’s platform. This will provide “Google with an efficient and structured way to access the vast corpus of existing content on Reddit,” while also allowing the company to display content from Reddit in new ways across its products.

    So, not just AI. Google is of course crawling Reddit already, but perhaps search could be improved too?

    Some background: years ago, Google and Twitter made deals giving Google access to Twitter's "firehose," but that was an off-and-on thing.

    I guess it's good that they worked out their differences, for now.

    9 votes
    1. [3]
      Wes

      It actually makes a lot of sense. People are often lamenting that they need to append "reddit" to search queries, and I do the same. If they have access to the Reddit firehose, they have a real opportunity to improve their regular results.

      6 votes
      1. [2]
        skybrian

        I think that's different; when you add "reddit" you're just changing the ranking. It shows that they already have lots of Reddit comments indexed.

        Google isn't necessarily going to rank Reddit comments any higher when you're not searching for them. They might have new comments sooner, though, or keep a larger history.

        8 votes
        1. Wes

          They might have new comments sooner, though, or keep a larger history.

          That's more my thinking: they'll have access to more recent comments, and a better understanding of the overall structure and relationship of those comments. For example, Reddit has an "other discussions" tab at the top that Google could more easily use to find related content. They could have written special handling for something like that before, but I get the impression that Google doesn't like to build special handling for individual websites (preferring to rely on OpenGraph, Schema.org and such). So an API for exploring relationships makes that more feasible.

          It may also play into E-E-A-T, where a community or even an individual could be assigned an authority grading to determine if they're a good source to show. There's a lot of baloney on Reddit, but also a lot of insight, and having a wide span of data gives Google a better chance at determining which is which.

          One other consideration is that Reddit isn't very good about showing old content. Their codebase tends to drop content after 1,000 entries in all views/filters. Having direct API access likely gives Google more access to historical data that would otherwise be difficult to scrape, even if its age may downrank it.

          I'm sure AI does play a large role in this deal, but I can definitely see benefits on the Search side as well.

          7 votes