‘Reddit can survive without search’: company reportedly threatens to block Google
Link information
- Title: "Nothing is changing": Reddit flatly denies report that it might wall off its content
- Author: Jay Peters
- Published: Oct 20 2023
- Word count: 358 words
All blocking webcrawlers would do is hide reddit results from search engines. What makes them think people scraping data to train AI models without permission will give a crap about robots.txt?
In the EU, or for anything released to the European market: huge, huge fines for breaking the laws that require you to respect robot exclusions like robots.txt, outside the defined exceptions.
But yes, for AI training data sets made of thousands and thousands of literally torrented books and stuff, you're right in practice. That'll probably change when the lawsuits hit. "Go fast and break things" leaves broken things you eventually have to pay for if you've been a bull in a china shop.
And even then, who knows if we might not get AI-based services that just don't release in areas with exclusionary protections?
I wasn't aware that robots.txt is backed up by law in any part of the world. Would you mind elaborating on which EU law does this?
According to Wikipedia, there is no such law, and robots.txt relies solely on the compliance of the robot.
In fact, some robots ignore robots.txt even without malicious intent, e.g. ArchiveTeam:
https://wiki.archiveteam.org/index.php/Robots.txt
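To make the "relies solely on the compliance of the robot" point concrete: checking robots.txt is something the crawler's own code chooses to do before each request. The minimal sketch below uses Python's standard `urllib.robotparser` on a made-up robots.txt (parsed locally, no network) and a hypothetical crawler name `ExampleBot`; a crawler that simply never calls this check faces no technical barrier.

```python
from urllib import robotparser

# Hypothetical robots.txt content, parsed locally so no network access is needed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def polite_fetch_allowed(url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if the robots.txt above permits user_agent to fetch url.

    A well-behaved crawler calls this before every request. Nothing in the
    protocol forces it to: skipping the call is all it takes to ignore it.
    """
    parser = robotparser.RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/public/page", "https://example.com/private/page"):
        print(url, "->", "fetch" if polite_fetch_allowed(url) else "skip")
```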
What do you mean by malice? ArchiveTeam ignores robots.txt, scraping websites in ways their owners have clearly expressed that they do not want. I am not casting moral judgment, but I don't see what makes ArchiveTeam different from anybody else.
That's not Wikipedia, that's their own wiki. Not that I think it makes a big difference to the accuracy of the contents in this case.
Yes, I know. I was specifically referring to Wikipedia in my first sentence.
https://en.m.wikipedia.org/wiki/Robots.txt#Security
It's likely that AI-based services will start avoiding overly regulated jurisdictions. Look at Google's and Meta's response to a law forcing them to pay when local news links were shared and shown in Canada.
The ultimatum was "pay up or shut down", and Google and Meta decided to shut down services. And then the politicians that voted for this got mad.
The question the EU has to ask is how long one of its biggest technological exports can be regulation before tech companies decide not to serve the EU.
And I don't even know how open source models and weights get treated in regulated jurisdictions. Can you just not download the weights and run the model locally?
Canadian legislators thought they could pull off what the EU can.
I'd rather my representatives have a spine and make moves, than offer their tits to the States and their mega conglomerates that continue to rack and salt our country dry. Your feedback is noise.
I'm not saying they shouldn't try. I'm saying the "Do this or else we'll block you" works the opposite way when the country doesn't have enough power. For the EU, when they told Apple "Use the industry standard port or else ..." it meant Apple was going to use USB-C. When Canada tells FAANG "Comply with our news laws or else ..." it just means those companies are going to pull various services from Canada. That's what the legislators need to understand when they write those laws.
I don't know how true the story of "And then the politicians that voted for this got mad" is. Maybe it's an exaggeration. If it is true then it seems they don't understand what their ultimatum actually does, strategically.
This oversimplifies the problem. Sure, it is possible that if the country doesn't have as much market power (consumers buying things) then its legislation may not have the influence to achieve outcomes, but it's equally important that what they're pushing companies to do is something reasonable or rational for them.
What EU told Apple to do is use a port that Apple was already using on many of their other devices, and a port that a lot of hardware is already using. It's a relatively minor cost to Apple compared to what Canada was asking companies to do to comply with their news law. IMO Canada's news law is just designed poorly and is misguided and that's the reason why it lacked impact.
If EU had demanded that Apple put a USB-A port on their iPhone, Apple probably would have told them to fuck off. It's not that the EU is all-powerful, it's that their demand was relatively reasonable unlike what Canada was seeking.
IMO, "getting mad" here is performative and part of negotiation. Politicians who support the bill want users to side with them and contribute to pressuring the companies into making a deal.
LMAO
As far as I know Google respects robots.txt. Do you know differently?
You've misread. They're saying Google does respect robots.txt but AI companies are unlikely to.
Google is an AI company, so there is one AI company that does.
Google's search engine respects robots.txt.
Bard probably doesn't.
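The Googlebot-vs-Bard distinction matters because robots.txt rules are grouped by user-agent token, so a site can in principle let a search crawler in while shutting out an AI-training crawler, but only if that crawler announces itself under a distinct name and honors the rules. A minimal sketch with Python's `urllib.robotparser`; the policy is made up, and `HypotheticalAIBot` and `SomeOtherBot` are invented names used purely for illustration.

```python
from urllib import robotparser

# Hypothetical policy: search crawler allowed, a made-up AI crawler blocked,
# everyone else kept out of /api/ only.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: HypotheticalAIBot
Disallow: /

User-agent: *
Disallow: /api/
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, url: str) -> bool:
    """Ask the parsed policy whether a crawler identifying as `agent` may fetch `url`."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "https://www.reddit.com/r/example"
    for agent in ("Googlebot", "HypotheticalAIBot", "SomeOtherBot"):
        print(agent, allowed(agent, page))
```

Note that the AI crawler is only blocked if it actually uses that token; one that sends a generic or spoofed user agent falls through to the `*` group.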
Wasn't one of the key "issues" during this past summer's mod blackout that Google search results wouldn't open for people because the subs were closed... now Reddit doesn't want anything showing up on Google?
Nosedive the plane harder folks.
If anything, it highlighted how useless Google was and how concentrated information is on certain sites. This likely emboldens Reddit to take a move like this, because if your searches are worthless without adding "reddit" to the end, then it pushes people to come to Reddit to search. So now, instead of using Google to search, you come to Reddit and use its built-in search (as maligned as it has been over the years, my understanding is that it has supposedly improved somewhat, and in this case you'd be comparing it to search engines that can't search Reddit anymore).
This might also be why the reports included the aspect of having to "log in" to use reddit, because I've found a lot of sites require you to log in to use search features. Surprisingly it does not seem as though reddit does now, but I wouldn't be surprised if they've considered it as part of this move if they go through with it.
While Reddit doesn't flatly block people without accounts, it can be pretty dang obstructive depending on the subreddit and device. When browsing on the mobile site while not logged in (mainly when searching stuff), I pretty regularly encounter a pop-up saying to open in the app due to a subreddit or post being marked 18+. I also find some subreddits just... won't work properly. It doesn't block me, but I can't scroll down and have to switch to desktop view. I've also previously encountered the "download app" prompt for small, niche subreddits, but haven't seen it in a while.
So basically, they've already made it pretty dang tedious to use the mobile site without an account.
The blackout definitely showed off Google's weaknesses more than Reddit's though. Made me realize how useless the top results of Google tend to be.
While it's still around, I suggest using old.reddit to circumvent most of these measures.
I honestly use Bing more. I switched from Chrome to Edge (yes, I know it's Chromium under the hood) and never bothered to change the search engine. For most of my needs it's been a fantastic Google search replacement (especially if you start to master using Bing Chat / ChatGPT).
Google Maps, however, is still king of its castle, and YouTube is as much a search engine as it is a video platform, in my opinion.
Honorable mention to https://yandex.com/ as another search engine I've had good experience with.
I primarily use DuckDuckGo myself, with Google as a backup since they bring up different sites. May need to try to give Bing a shot too, though the earlier years of it definitely left a bad impression on me.
I find DDG is fantastic for text-based searches, e.g. "I'm looking for a website that talks about... whatever."
But the image search is kinda meh. So I chuck the OLD "!G" in there for it to throw the results at Google.
Do you know about !gi for google images?
Idk what either !g or !gi do. What are they? (Googling it just brings up info about gastro intestinal stuff).
Duckduckgo uses modifiers called bangs (which are an exclamation mark followed by some combination of letters) to change its search results.
So if you want to search Google instead of using DuckDuckGo, you would include !g in your search string. !gi searches Google Images.
It's not filtering DDG's own results; it takes you directly to Google's search page(s) (or whatever the site is). It also strips out some of the tracking you would incur if you searched directly on the site, like your originating IP address, but it's marginal at best, since once you're on the site you're bound by their own data collection.
There's even a !bangs search
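As described above, a bang is just part of the query string DuckDuckGo receives, so you can build a bang search as an ordinary URL. A minimal sketch, assuming the standard `https://duckduckgo.com/?q=...` search URL; the helper name `ddg_bang_url` is hypothetical.

```python
from urllib.parse import urlencode


def ddg_bang_url(bang: str, terms: str) -> str:
    """Build a DuckDuckGo search URL whose query starts with a !bang.

    DuckDuckGo sees the bang and forwards the remaining terms to the target
    site (e.g. !g for Google, !gi for Google Images).
    """
    bang = bang.lstrip("!")  # accept both "gi" and "!gi"
    query = f"!{bang} {terms}"
    # urlencode percent-encodes "!" as %21 and spaces as "+".
    return "https://duckduckgo.com/?" + urlencode({"q": query})


if __name__ == "__main__":
    print(ddg_bang_url("gi", "red panda"))
```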
While I don’t use Reddit anymore, what used to work pretty well for me was a custom redirect (via a browser extension that can do this) from
reddit.com/r/*
to old.reddit.com/r/*
(you can’t do it sitewide because it’ll break some images displayed with their internal hosting, but all actual posts and feeds are displayed under a /r/* path). As a result, you’re dealing with the old.reddit.com layout on mobile, but to be frank, that was the lesser of two evils for me back then, and I can imagine it’s only gotten worse since.
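The redirect rule above boils down to "swap the host, but only for /r/ paths", which is why image and media URLs keep working. A minimal sketch of that logic in Python; the function name and the set of hosts treated as "new Reddit" are assumptions, and a real browser extension would express the same rule in its own pattern syntax.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed list of hosts serving the new layout; adjust as needed.
NEW_REDDIT_HOSTS = {"reddit.com", "www.reddit.com"}


def to_old_reddit(url: str) -> str:
    """Rewrite reddit.com/r/* URLs to old.reddit.com/r/*; leave everything else alone."""
    parts = urlsplit(url)
    if parts.hostname in NEW_REDDIT_HOSTS and parts.path.startswith("/r/"):
        # Only the host changes; scheme, path, query and fragment are kept.
        return urlunsplit(parts._replace(netloc="old.reddit.com"))
    # Non-/r/ paths (e.g. internally hosted media) are left untouched.
    return url


if __name__ == "__main__":
    print(to_old_reddit("https://www.reddit.com/r/python/comments/abc/title/?sort=top"))
```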
I did the same thing except with the Internet Archive's URL at the start in order to deny the fuckers any traffic. Bonus: if the page hasn't been crawled, you can add it to the archive for when Reddit inevitably shuts down Old, adds a paywall, or dies.
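The Internet Archive variant is a plain prefix: the Wayback Machine accepts the full original URL appended after `/web/` to show the latest snapshot, and after `/save/` to request a new capture. A minimal sketch, assuming those two public URL patterns; `via_wayback` is a hypothetical helper name.

```python
# Assumed Wayback Machine URL patterns: view latest snapshot vs. request a capture.
WAYBACK_VIEW = "https://web.archive.org/web/"
WAYBACK_SAVE = "https://web.archive.org/save/"


def via_wayback(url: str, save: bool = False) -> str:
    """Prefix url so it is opened through the Internet Archive instead of the live site."""
    prefix = WAYBACK_SAVE if save else WAYBACK_VIEW
    return prefix + url


if __name__ == "__main__":
    print(via_wayback("https://old.reddit.com/r/example"))
    print(via_wayback("https://old.reddit.com/r/example", save=True))
```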
This only works if Reddit actually improves its own search, because god damn, you can’t find shit with it.
💯
Reddit has been telling people since the inception of search that it's improving and fixed its problems and it's now "outstanding™". It's still incredibly poor in comparison, arguably worse since adopting sponsored and "relevant" results.
Last time I checked, it couldn't find a ~70% substring of the exact post title in a specific sub, whereas the post was the first result in Google & DuckDuckGo.
Not necessarily, as I ended with my last remark, you would be comparing it to search engines that would no longer be able to search reddit anymore. So if Google or Bing or whatever can't surface an answer for you, because the answer is on reddit, then you go search reddit and deal with the shit search. If it can get you an answer even 50% of the time, that's still better than 0% of the time that Google or other search engines will get you.
I disagree. I have long since given up on Reddit’s own search, which is why I use Google to do it. If they block that, then I am truly completely done with Reddit.
Reddit continues to make the same major mistakes as its predecessors like Digg did before it. The mistake they are making this time is:
"Reddit is not the product"
People don't come to Reddit to see Reddit; they come to see articles/posts and interactions from real people. In Reddit's quest to deny AI free training data, they are going to kill the discoverability of their platform. Slow death or quick death, we'll see what happens.
Not being able to find Reddit posts in web search results will make Reddit far less relevant.
I keep reading ex-redditors writing things like "I'm DONE socializing there, but it is a great place to find information about some things".
Nobody is going to log into Reddit and use their horrible native search engine to do research.
Spez seems determined to keep making stupid decisions.
It makes me wonder if he was at a party with other social media founders, found out how rich they are compared to him, got really angry, and started making spite motivated decisions.
A lot of users have deleted their old comments since the last big controversy, so even without Google being able to index them, a lot of obscure information that could only be found there has disappeared.
Thus endeth the ball game.*
*quote from Peanuts, by Charles Schulz, scraped by Don Quixote's still human brain.
Thus endeth search as we have come to know it. Ironic that the term webcrawler, invented by Stan Lee, was itself scraped from Marvel Comics in the early days of the World Wide Web.
Is anyone else starting to get more uncomfortable with the whole ChatGPT/AI training off data left, right and center?
Do it, please do it, give me one less reason to ever need to visit that site.
Yeah. I miss old Reddit so much, and I'm still able to find good answers to various questions on it, but... if the company/CEO thinks they're so valuable that they can fuck up the userbase as much as they want, then it's time to kill it.
Ah... The spaz saga continues...
I’m guessing you meant spez? Maybe it autocorrects. Please don’t use spaz as a slur tho, either way
Not the person you replied to, but is 'spaz' considered ableist or something? I'd never heard of it being offensive before.
Maybe it’s an Australian thing, but I often heard “spaz” or “spazzo” as an insult at school in the 90s/00s as a shortening of “spastic” which is I believe the medical term for the way muscles develop and respond differently in people who have cerebral palsy.
Hey thanks for reminding me -- I haven't looked at Reddit since August....
/# wait 120
Oof. Remind me again next year sometime.
Joke's on us, the worse reddit search is the more useless pages you have to click on before finding what you want == you see more ads in the process
Well that's because the users stopped being the customers. It's the advertisers that are (and have been for a while now) the customers. They pay Reddit's (and most other online platforms') bills, not the users! So the companies are still merely asking "how can we make this better for the customer?", and that normally means making things worse for the users.
It's never been that bad for me, but reddit's search does struggle at understanding deeper context within reddit articles.
Reddit search has always been pretty abysmal for me. Every time I've tried it, I've had to dig for the post I'm looking for where "<term> site:reddit.com" on Google or Kagi will turn it up in the first 1-5 results.
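The `<term> site:reddit.com` trick works because the `site:` operator is part of the query text itself, so the same restricted search can be sent to Google or Kagi unchanged. A minimal sketch, assuming both engines take the query in a `q` parameter on the endpoints listed; the helper `site_search_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed search endpoints that take the query in a "q" parameter.
ENGINES = {
    "google": "https://www.google.com/search",
    "kagi": "https://kagi.com/search",
}


def site_search_url(term: str, engine: str = "google", site: str = "reddit.com") -> str:
    """Build a search URL restricted to one site using the site: operator."""
    query = f"{term} site:{site}"
    # urlencode encodes ":" as %3A and spaces as "+".
    return ENGINES[engine] + "?" + urlencode({"q": query})


if __name__ == "__main__":
    print(site_search_url("kagi trial", engine="kagi"))
```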
Sorry to derail the topic but this is the first time I'm hearing about Kagi. It looks like a pretty interesting alternative, how was your experience with it?
edit: Looks like they have a free trial too.
Not who you asked, but I've been a (happy) paying customer since April 2022.
I only come back to Google to find images, Kagi still isn't very good at it.
Definitely sign up for the trial. After getting through it I signed up for a paid account immediately. The easy method of blocking sites or lowering or raising their priority in results is very helpful, I haven’t even gotten into the other features yet.
It's been solid, generally as good as Google or better in most cases.
My only gripe isn't with the engine itself, but with how Safari only lets you choose one of the predefined search engines for its search and so the Safari Kagi extension has to work by redirecting URL bar search requests from Google to Kagi, but that has more to do with my own stubbornness. Kagi's maker also makes a browser (Orion) that I should probably switch to as well since it uses the same engine as Safari (important for resource efficiency — Chrome and Firefox are battery hungry) but is more configurable.
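The redirect workaround described above amounts to: catch a Google search URL, pull out the query, and reissue it against Kagi. The sketch below illustrates that idea in Python; it is not the Safari Kagi extension's actual code, and the function name, the host list, and the assumption that Google puts the query in `q` on `/search` are all stated here for illustration only.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed Google hosts to intercept; real traffic may use other domains.
GOOGLE_HOSTS = {"www.google.com", "google.com"}


def google_to_kagi(url: str) -> str:
    """If url is a Google search, return the equivalent Kagi search URL; else return url unchanged."""
    parts = urlsplit(url)
    if parts.hostname in GOOGLE_HOSTS and parts.path == "/search":
        query = parse_qs(parts.query).get("q")
        if query:
            # Keep only the search terms; Google-specific parameters are dropped.
            return "https://kagi.com/search?" + urlencode({"q": query[0]})
    return url


if __name__ == "__main__":
    print(google_to_kagi("https://www.google.com/search?q=orion+browser&client=safari"))
```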
I never picked up Bitcoin or porn spammers by accident. I could always find topics with titles that I'm looking for. But comments, or any wider context? No.
So, yes, not as bad as the parent put it. Not by a long shot.