A look at search engines with their own indexes

  1. Atvelonis
    Oh, how neat! I had no idea Bing's results were used by so many other search engines. Thank you for posting.

    1. Seirdy
      That's just the tip of the iceberg; there are tons more that I didn't include because the list of Bing-based engines was just way too long.

  2. Seirdy
    Feedback and additions are welcome. The list currently contains 30 search engines with their own indexes.

  3. pseudolobster
    Great writeup! Seems very comprehensive for the English-language web.

    I think it'd be interesting to cross-reference your findings with an aggregate of webserver logs. I know from experience that Baidu's index is pretty damn comprehensive, even though its English-language results seem poor. There are also a couple dozen other spiders that routinely show up in server logs, at least among the ones that obey robots.txt and identify themselves in their user-agent. Most of them seem to be scraping the web for their own internal search engine or for ML training data, but some claim to be legitimate attempts to build an independent search engine.

    As an aside, I wonder whether you could call the archive.org Wayback Machine a search engine. It certainly has a very large index.

    1. Seirdy
      > I think it'd be interesting to cross-reference your findings with an aggregate of webserver logs.

      I discovered Petal, Gowiki, Crawlson, Yisou, Seznam, and Apple Search through my own access logs. I'm sure more have hit my site, but I only keep logs for three to five days (and only for certain HTTP responses), and I don't plan on changing that.

      In my experience, most spiders aren't for search engines, let alone publicly available general search engines; they're for SEO services, adtech, or benign scrapers that grab content for link previews or bookmarking services.
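
As a rough sketch of the kind of access-log scanning described above, here is a small Python script that tallies requests from a few self-identifying crawlers. It assumes logs in the Apache/nginx "combined" format, where the user agent is the last double-quoted field. The token-to-engine mapping (`CRAWLER_TOKENS`), the helper name `count_crawlers`, and the sample log lines are all hypothetical illustrations, not an authoritative list of crawler user agents.

```python
import re
from collections import Counter

# Hypothetical mapping from user-agent substring to engine name.
# These tokens are assumptions for this sketch; check each operator's crawler docs.
CRAWLER_TOKENS = {
    "PetalBot": "Petal Search (Huawei)",
    "SeznamBot": "Seznam",
    "Applebot": "Apple Search",
    "Baiduspider": "Baidu",
}

# In the "combined" log format, the user agent is the last double-quoted field.
UA_AT_END = re.compile(r'"([^"]*)"\s*$')

# Made-up sample lines in combined log format (documentation IP ranges).
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2021:00:00:00 +0000] "GET / HTTP/1.1" 200 512 '
    '"-" "Mozilla/5.0 (compatible; PetalBot)"',
    '198.51.100.7 - - [01/Jan/2021:00:00:01 +0000] "GET /feed HTTP/1.1" 200 128 '
    '"-" "Mozilla/5.0 (compatible; SeznamBot/4.0)"',
    '192.0.2.9 - - [01/Jan/2021:00:00:02 +0000] "GET / HTTP/1.1" 200 512 '
    '"-" "Mozilla/5.0 (compatible; PetalBot)"',
    '192.0.2.10 - - [01/Jan/2021:00:00:03 +0000] "GET / HTTP/1.1" 200 512 '
    '"-" "curl/7.68.0"',
]


def count_crawlers(lines):
    """Count requests per known crawler, keyed by engine name."""
    counts = Counter()
    for line in lines:
        match = UA_AT_END.search(line)
        if match is None:
            continue  # line doesn't end with a quoted user agent; skip it
        user_agent = match.group(1)
        for token, engine in CRAWLER_TOKENS.items():
            if token in user_agent:
                counts[engine] += 1
                break  # attribute each request to at most one engine
    return counts


if __name__ == "__main__":
    for engine, hits in count_crawlers(SAMPLE_LOG).most_common():
        print(f"{engine}: {hits}")
```

On the sample lines this reports two Petal Search hits and one Seznam hit; the curl request matches no token and is ignored. A match only shows what a client claims to be, since user agents are trivially spoofed, so confirming a real crawler usually means also checking the requesting IP against what the operator publishes.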

  4. Moonchild
    Findx used to have its own index; sadly, no longer.

  5. KapteinB
    Petal is intriguing. Huawei of course needs a search engine other than Google, and my first guess was that they found the Chinese search engines not international enough. Still, they probably would have saved themselves a lot of work and money by partnering with, say, Baidu and commissioning it to build a more internationally focused search engine.

    However, after watching a couple of YouTube videos about it, it looks like Petal's primary use case is finding Android apps that aren't in Huawei's App Gallery, with web search more of a secondary feature.

    Visiting the website on desktop shows that it isn't just designed mobile-first; it's plain awful on desktop. It's clearly meant for the mobile market.

    1. Seirdy
      It was originally designed for finding Android apps; however, it expanded to general search a few months ago. It continues to be mobile-first.

      I wouldn't directly use it for anything non-trivial given its obvious privacy issues and piles of JS, but I hope it can be incorporated into other privacy-respecting metasearch/proxy engines.
