52 votes

New search capabilities available: phrases, excluding terms, alternatives ("or")

On Sunday, I took the site down for a short downtime to upgrade the database from PostgreSQL version 10 to 12. One of the main reasons I wanted to do that upgrade was to get access to a new search function, and I've updated to using it now, so we have multiple nice new search capabilities available.

These should all be pretty familiar since a lot of other search systems and search engines have similar capabilities with the same syntax:

  • As before, by default, searching for multiple words will be treated as "all of these terms". So if you search ~games for steam play, you'll get all topics that have both "steam" and "play" in them.
  • Phrases can now be searched for by putting double quotes around them. Searching ~games for "steam play" in quotes will only find topics that specifically have "steam play".
  • Excluding terms can be done by putting a minus sign in front of it. For example, if you wanted to try to find ~games posts about Blizzard and exclude the recent China controversy, you could search for blizzard -china.
  • Alternatives can be searched for by using "or". This changes to "any of these terms" instead of "all of these terms". For example, searching for overwatch or diablo will find any topic with either of those terms, instead of both.
  • These capabilities can be combined, so you can exclude phrases, use "or" with phrases, and so on. For example: blizzard -"hong kong" or diablo.

This all works both through the main site topic search (at the top of the sidebar) as well as the new search for your own topics/comments.

I'm going to write a page for the Docs with info about these capabilities, but I think I want to try to find a full specification of what's supported first to make sure I cover it properly. The PostgreSQL docs are pretty vague about it, so I'll probably need to take a look in the actual code.

Please let me know if you notice any issues with it, or if anything's confusing that I should make sure to document.

And as usual, I've given everyone 10 invites, accessible on the invite page.

14 comments

  1. [5]
    Amarok
    Link
    Now this place has better search than reddit ever will. :P

    Now this place has better search than reddit ever will. :P

    14 votes
    1. [4]
      Deimos
      Link Parent
      Almost! (And being able to search your own comments is already better.) I think we actually might have superior search overall once we have the ability to search against specific "fields" by doing...

      Almost! (And being able to search your own comments is already better.)

      I think we actually might have superior search overall once we have the ability to search against specific "fields" by doing things like "domain:youtube.com". That's probably the main missing capability now.

      13 votes
      1. Amarok
        Link Parent
        Reddit can't even get past 1k results or find anything older than a year or two at this point. :/ I wonder if there's any value in creating a tag-only search mode. Right now it's kinda pointless....

        Reddit can't even get past 1k results or find anything older than a year or two at this point. :/

        I wonder if there's any value in creating a tag-only search mode. Right now it's kinda pointless. If Tildes ever ends up with a pile of data like reddit's history, though, it could come in very handy. A tag-only mechanism could work wonders from a much smaller dataset than comment/title searches require. It could take the load off of doing deep searches that cover a lot of groups or search over longer time periods.

        7 votes
      2. [2]
        emdash
        Link Parent
        Is tag-based search implemented right now? StackOverflow solves this with [tagname] syntax.

        Is tag-based search implemented right now? StackOverflow solves this with [tagname] syntax.

        3 votes
        1. Deimos
          Link Parent
          Tags are searchable, but they're basically just treated as another word in the topic. There's no way to distinguish yet (through search) for looking exclusively for tags. For example, if you...

          Tags are searchable, but they're basically just treated as another word in the topic. There's no way to distinguish yet (through search) for looking exclusively for tags. For example, if you search for "social media" you'll find any topic that has the tag "social media", or has "social media" in its title or text.

          You can use the specific tag lookup by either clicking the tag on an existing topic or going to an address like https://tildes.net/~tech?tag=social_media though.

          6 votes
  2. [3]
    teaearlgraycold
    Link
    What Postgres features specifically did you use for this?

    What Postgres features specifically did you use for this?

    5 votes
    1. [2]
      Deimos
      Link Parent
      I'm using its general full-text search, but these specific features were easy to add because of the websearch_to_tsquery() function that was added in PostgreSQL 11. I can't link directly to it,...

      I'm using its general full-text search, but these specific features were easy to add because of the websearch_to_tsquery() function that was added in PostgreSQL 11.

      I can't link directly to it, but the main description of it is at the bottom of this section of the docs: https://www.postgresql.org/docs/12/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES

      Here's the commit that added it, I've just started looking through this to try to get a full understanding of what it supports: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=1664ae1978bf0f5ee940dc2fc8313e6400a7e7da

      9 votes
      1. teaearlgraycold
        Link Parent
        Wow, that's basically everything you'll ever need for free.

        websearch_to_tsquery creates a tsquery value from querytext using an alternative syntax in which simple unformatted text is a valid query. Unlike plainto_tsquery and phraseto_tsquery, it also recognizes certain operators. Moreover, this function should never raise syntax errors, which makes it possible to use raw user-supplied input for search. The following syntax is supported:

        • unquoted text: text not inside quote marks will be converted to terms separated by & operators, as if processed by plainto_tsquery.

        • "quoted text": text inside quote marks will be converted to terms separated by <-> operators, as if processed by phraseto_tsquery.

        • OR: logical or will be converted to the | operator.

        • -: the logical not operator, converted to the the ! operator.

        Wow, that's basically everything you'll ever need for free.

        10 votes
  3. [6]
    aphoenix
    Link
    Great stuff, and now I feel like I should do the 10-to-12 for like 20 projects...

    Great stuff, and now I feel like I should do the 10-to-12 for like 20 projects...

    3 votes
    1. [5]
      Deimos
      (edited )
      Link Parent
      It was easy using the pg_upgrade tool that's included with PostgreSQL. I wrote the steps that were needed in the commit message here:...

      It was easy using the pg_upgrade tool that's included with PostgreSQL. I wrote the steps that were needed in the commit message here: https://gitlab.com/tildes/tildes/commit/ca509b220b7db1efc5929f91fc4ded3e52c5a9dd

      It'll be a little different if you're not on Ubuntu, and some of it (like the part related to salt/pillar) probably won't be applicable elsewhere, but it was quite straightforward.

      7 votes
      1. [3]
        Bauke
        Link Parent
        Will people have to do this in the dev environment too or is that automatically taken care of?

        Will people have to do this in the dev environment too or is that automatically taken care of?

        2 votes
        1. [2]
          Deimos
          Link Parent
          They'd have to do that if they want to keep an existing vagrant box and upgrade it from 10 to 12. It's probably simpler to just destroy it and create a new one that will start on 12.

          They'd have to do that if they want to keep an existing vagrant box and upgrade it from 10 to 12. It's probably simpler to just destroy it and create a new one that will start on 12.

          2 votes
          1. Bauke
            Link Parent
            Good to know, thanks!

            Good to know, thanks!

            2 votes
      2. aphoenix
        Link Parent
        Oh wow, that's awesome. Most of my projects are in docker containers, but some of them are on Ubuntu - this is actually super helpful. Thanks a ton!

        Oh wow, that's awesome.

        Most of my projects are in docker containers, but some of them are on Ubuntu - this is actually super helpful. Thanks a ton!

        2 votes