Thoughts on the current state of discoverability and search
I guess I'll post another thoughtful analysis rant on tech trends. It has been mentioned here in a few threads already but I simply wanted to try to start a focused discussion.
Personally I first noticed significant degradation of search functionality around 2018 or so, though complaints about Google specifically go back at least as far as 2016. But it is not simply Google or even just general search engines. Any random site-specific search functionality or discoverability algorithm shares these trends too.
It really seems that the focus is simply on delivering as many results as possible, with actual quality or even relevance being somewhere on the tail end of priorities. It is not even just the lack of (useful, consistent) search operators, lack of transparency, lack of structured search possibilities, lack of sorting options, lack of granularity - it is the simple disregard for the basic intent of the query, with some implementations sometimes being actually more accurate with fewer keywords, and with no option to modify this behavior.
It is especially damaging for (at least my) ability to research a topic. A decade and a half ago I could go in with a topic I had no idea about and emerge two hours later with a very basic but likely mostly accurate and slightly in-depth overview by refining my searches. Now I'm lucky to get one single thoughtful blog post or discussion among dozens of tutorials, 10-bests and ads, with the query being almost completely disregarded and keywords being straight up ignored to deliver this deluge of both low quality and mostly completely irrelevant results.
Are there any projects, search engines or anything else that aim to deliver actually useful, steerable, user-directed results?
Unfortunately I think there are two issues at play here: The quality of the search engines and the quality of the results themselves.
To use an analogy, I feel as though the internet started as a library and is turning into a shopping mall. A search for a particular topic fifteen years ago would more likely return blog articles, journals, and topic-specific websites, in part because they made up a larger fraction of the content available. Now a search query is much more likely to return results aimed at capitalising on your curiosity: e.g. click-bait, sponsored articles, etc.
Sadly, this trend is in full swing and the ability to generate AI content will only make the problem worse. I'm conscious this is already a very pessimistic take, but I fear that the hobbyist internet is giving way to something much less beneficial to us all, and there just isn't enough of an incentive for search engines to prioritise the quality content that remains.
We're in a sort of AI arms race between the garbage content mills and the search engines. AI slop is definitely already a huge problem and only getting worse. But I do think generative AI tech also has great potential to help people find signal within the noise.
For search engines, on the user-facing side, LLMs are way better at sussing out the intent of a query than dumb keyword-matching and whatever other synonym-finding heuristics are tried. OP mentioned that it seems like intent is sometimes straight-up ignored, but a good model ought to be able to get to the meat of what you're looking for even if you're not sure how to phrase it.
Part of the reason things have deteriorated so badly is the culmination of years of escalating SEO tactics that aggressively undermine the efficacy of search algorithms. Maybe I'm naive but I hope that on the crawling side, smart AI will be able to read and catalogue page content in a more humanlike way than previously possible. It ought to be able to spot spam and bullshit in the same way a person would, and downrank those URLs accordingly. Right now we have a system where people can sling the lowest-effort nonsense and get top billing in the search results, but I'm imagining a future where the only way to secure those top positions is to provide actual value to real people.
Google recently announced they are migrating their flagship search over to something AI-focused. I don't know the details but if it's a step toward what I'm describing it could herald the kind of spring cleaning the web is long overdue for.
I would really like someone to develop a Search Engine De-optimization tool. It seems like it would be doable with some kind of AI filter, since most SEO content is low quality, irrelevant or trying to sell you something. And there is plenty of data out there to train on.
I know this is similar to what Google is proposing, but they were the ones that incentivized SEO in the first place and got us into this mess. There have also been some more recent accusations that Google has switched from being focused on search quality to focusing on user interaction and profit generation, which happened right around the time many people started noticing a decrease in search quality. Soooo, not sure how much I trust Google's implementation.
I want something that specifically goes to bat against SEO. Bonus points for easily being able to filter out entire websites (looking at you Pinterest). Overall, I imagine it working much like an ad-blocker but for search results.
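To make the "ad-blocker but for search results" idea a bit more concrete, here is a minimal Python sketch of just the site-blocking part. It assumes you already have a list of result URLs from somewhere; the blocklist contents and the function names are made up for illustration, and it does nothing about the harder problem of judging SEO quality.

```python
from urllib.parse import urlparse

# Hypothetical personal blocklist; the domain here is only an example.
BLOCKED_DOMAINS = {"pinterest.com"}


def is_blocked(url, blocked=BLOCKED_DOMAINS):
    """Return True if the URL's host is a blocked domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked)


def filter_results(urls, blocked=BLOCKED_DOMAINS):
    """Keep only results whose host is not on the blocklist, preserving order."""
    return [u for u in urls if not is_blocked(u, blocked)]


results = [
    "https://www.pinterest.com/pin/123/",
    "https://example.org/woodworking/joints",
    "https://uk.pinterest.com/ideas/",
    "https://notpinterest.com/post",
]
print(filter_results(results))
```

Matching on the host plus a leading dot catches regional subdomains without accidentally blocking unrelated sites whose names merely end in the same letters.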
I would pay money for this.
In regards to the quality of results, it's worth pointing out that a lot of content that would have appeared on small sites has migrated into walled gardens like TikTok, Discord, Facebook, etc. Things feel more fractured and when I'm searching for something now, it's become a matter of having to fan my efforts out across multiple networks.
That said, there's the issue of identifying the right groups or creators, but when I do, the answers can be much higher quality than what blogspam or AI offers.
I commiserate. I've recently noticed two changes that have severely impeded the way I work in ux / technical writing:
1. Google seems to have gotten rid of the approx. number of search hits. I'm quietly seething over this one. Inexact as it may be, this has long been a handy way to compare word/phrase variants and figure out which is more common when deciding on plain language or attempting to standardize language. Especially helpful for work when paired with site-specific search.
2. May be a bug, but Google doesn't always seem to respect terms in "quotes" quite so strictly as it used to. I even experienced this in Google Scholar recently, where exact, jargony terms are standard. Google search really seems to hate highly technical terms; with the number of times it's decided I've gotten my search terms wrong and has 'fixed' them for me, quotes have been a lifesaver.
For 1, I use Google Ngrams, which is nice because you can see changes over time, look at how British and American English differ, and do stuff like fuzzy finding (e.g. "I am greatly *" lists "I am greatly indebted", "I am greatly mistaken", ... and their various popularities). You can also do stuff like calculating ratios or differences with math operators.
For 2, I definitely also find this to be true. The same happens on YouTube, which will often give you maybe ten actual results and then start giving generic recommendations.
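For anyone curious what the wildcard lookup from point 1 is doing conceptually, here is a rough Python sketch that counts the words filling a trailing "*" in a phrase. To be clear, this is not the Ngram Viewer's own code or API; it runs over a tiny made-up sample text, and the function name is hypothetical.

```python
import re
from collections import Counter


def wildcard_counts(pattern, text):
    """Count the words that fill a single trailing '*' in pattern within text."""
    if not pattern.endswith(" *"):
        raise ValueError("this sketch only supports one trailing '*'")
    prefix_words = pattern[:-2].lower().split()
    # Crude tokenizer: lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    n = len(prefix_words)
    counts = Counter()
    for i in range(len(words) - n):
        if words[i:i + n] == prefix_words:
            counts[words[i + n]] += 1
    return counts


# Made-up sample text standing in for a real corpus.
sample = (
    "I am greatly indebted to you. I am greatly mistaken, he said. "
    "I am greatly indebted to them as well."
)
counts = wildcard_counts("I am greatly *", sample)
print(counts.most_common())
# Ratio of one completion to another, similar in spirit to the math operators.
print(counts["indebted"] / counts["mistaken"])
```

With a real corpus you would also want per-year counts to get the change-over-time view, which this sketch leaves out.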
Thanks for the tip! I've used ngrams a bit in the past, but it's not always useful for my purposes.
That said, I should have looked up an answer before complaining, because it is still possible to access the results numbers by selecting "Tools" after each search, at least on desktop; credit where due, it seems like a good balance between what the average user needs vs. a more niche user.
Personally I’ve switched to paying for Kagi for searches that are not straightforward.
I’ve also been more and more disappointed with Google (and others), and I find Kagi to be worth its price. I know most people won’t consider paying for search of course, but for those who do I advise trying it.
I got the smallest plan ($5/month) which gives me a limited number of searches (unlike the higher one). I make it work without much pain by configuring a free search engine as default, and prefixing the search with “k ” to use Kagi. This way it’s always on hand but I don’t use up searches for trivial ones.
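In practice this is just a browser keyword setting, but the routing logic behind the “k ” prefix can be sketched in a few lines of Python. The URL templates below are my assumptions (DuckDuckGo stands in for whichever free engine is the default), and route_query is a hypothetical name, so check the actual query URLs your engines use.

```python
from urllib.parse import quote_plus

# Assumed URL templates; verify them against the engines you actually use.
DEFAULT_ENGINE = "https://duckduckgo.com/?q={}"
PREFIXED_ENGINES = {"k": "https://kagi.com/search?q={}"}


def route_query(text):
    """Return a search URL, sending 'k <query>' to Kagi and the rest to the default."""
    text = text.strip()
    prefix, sep, rest = text.partition(" ")
    rest = rest.strip()
    if sep and prefix in PREFIXED_ENGINES and rest:
        return PREFIXED_ENGINES[prefix].format(quote_plus(rest))
    # No recognised prefix (or nothing after it): use the free default engine.
    return DEFAULT_ENGINE.format(quote_plus(text))


print(route_query("k rust borrow checker"))
print(route_query("weather tomorrow"))
```

A bare "k" with nothing after it falls through to the default engine, so a paid search is never spent on an empty query.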
Kagi is far from perfect but it’s better than what Google has become. The various knobs it gives you for tweaking your results and adjusting the site’s display/behavior to your liking are helpful.
I pay for Kagi, but I often find degraded results after the top five or ten returned.
In the earlier Google days, I'd usually get pages of worthwhile results on a well-tuned set of search terms. And by worthwhile, I mean not just the exact information I was searching for, but closely related material that would expand my understanding of the topic. Those results were akin to wandering in the stacks of a graduate library and pulling volumes with Library of Congress IDs a digit or so on either side of the specific subject you were seeking.
I don't know if LLM-guided search will produce anything similar, but I'd gladly pay for that.
A lot of stuff is locked inside Discord now, which is being used in inappropriate ways, and Discord search is really, really bad. That information is just not there to be returned. It hasn't quite been mentioned yet, but I find that search currently requires manually checking a handful of specific websites or following manual procedures that might lead you to the information you're looking for in certain previously known locations.
Currently, LLMs can help digest the information and match it to what you're looking for, but also introduce the problem you wouldn't otherwise have of hallucinations and weird misinterpretations. Recently I experienced this kind of LLM failure (documented on reddit) when performing a fairly straightforward search for the board game a certain card belonged to, with various LLMs ignoring the relevant sources, answering a question different than the one asked, or outright lying. Only Google Gemini got it right.
I have nothing to add except more ranting. I feel your pain. I used to be able to search medical articles or research papers but now I get Mayo Clinic, Cleveland Clinic or WebMD as top results. They are not exactly bad, but also not what I'm necessarily looking for.
The other thing I've noticed lately is youtube vids rather than well written articles for DIY stuff. This is usually not super helpful and I totally agree it usually ignores part of my search terms. I have a decent understanding of a good boolean search and yet more recent results do not seem as accurate or helpful.
Have you tried searching Medline or PubMed? If you specifically want medical journals I would look there. I would also try Google Scholar.
Not a bad idea. Those used to come up in search along with a few other trusted sources. I like to read a number of articles or papers on a subject to get a better understanding. I may have to go directly to the sites I know and search, which may lead to some decent results, but would leave out undiscovered sites.
I think current search is starting to really dumb down results, along with the obvious featuring of paid placement. Frustrating to say the least.
Academic paper search in Kagi is quite good, and one of the reasons I find that search engine worth paying for.
I might have to try this, as others also find it valuable. As much as we all love free, it's starting to come with too many strings.
At work (software dev), I've been test driving an LLM solution. Although it's terrible at times, it is currently often much better than what search engines are returning. It makes me sad, because search engines were great before and arguably returned better results than the LLM is doing now.
Check out https://search.marginalia.nu/ which is a search engine that does probably exactly what you want.
A short while back I was reading an internet argument with one person who was saying that some crazy social phenomenon was totally common and as proof he said that one could simply google it and see the stories of people who documented it.
And that is when I realized that if you search for just about anything you will find it, regardless of whether or not it is real. That is what the infinite content pipeline has given us, and with AI it’s able to keep up with any demand instantly.
Of course that is not entirely true, because when society has overwhelmingly decided something it creates ideological “black holes” that will suck away any ability to search for related info or counter-claims. For instance I was trying to search for what effects consuming cholesterol has on the human body, and it just poured a deluge of articles saying that it doesn’t affect your blood cholesterol. That’s nice to know but it wasn’t enough to sate my curiosity.
Internet search has probably been ruined, and there is probably no going back. I should probably check out DMOZ and see how it holds up today.
For the cholesterol question, I would search for research articles in the type of databases university libraries subscribe to. PubMed, Medline and Google Scholar will duplicate a lot of those articles, but if you have a local public university you could ask a librarian for help.
I actually got so fed up with search engines (I pay for Kagi but it isn't perfect) that I ask my custom GPT to do the search for me sometimes. I specify that it must source the links I desire, with a target number of options and a brief description of each, as appropriate. Surprisingly useful. I assume it'll be even more so when I recreate it with a GPT-4o backbone.