14 votes

What should be done about synonymous tags?

Tags: tags

It seems as though there are a lot of redundant tags. For instance, over on ~creative, there have been posts submitted with the tags poetry, poems, and poem, many of them with more than one of these tags. This seems to subvert the purpose of the tagging system; if I want to look at, say, all poetry-related posts (or, on the other hand, to exclude them), I have to imagine all the possible ways to mention poetry. In most cases there will be only a few (in this case, 3), but it is still redundant, adds friction, and seems kind of pointless. What, if anything, should be done about them? Can we make a user-based system to 'merge' tags, where we choose one of them to be canonical (for instance, 'poetry') and, when a user tries to add a tag that has been declared a synonym, it is replaced with its canonical counterpart?

29 comments

  1. Deimos

    We should pick one, and I can convert the others to the chosen one. So for example if we pick "poetry", I can convert all the ones with "poems" and "poem" to have "poetry" instead.

    Since we have autocomplete for tagging now, I think that will fix most cases, since in the future someone will start typing "poe" and probably choose "poetry" when it gets suggested instead of continuing to manually type in one of the others.

    There is also an extremely basic method of adding "tag synonyms" in the code right now that could be expanded if there's a common issue that comes up. Right now the only thing it does is convert "spoilers" to "spoiler".
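    A minimal sketch of what an expanded version of that synonym conversion might look like, in Python. The table contents, the lowercasing and whitespace normalization, and the names `TAG_SYNONYMS`, `canonicalize`, `canonicalize_tags`, and `merge_existing` are all hypothetical; this is not the actual Tildes code. The same mapping can serve both new input and a one-off retroactive merge like the poem/poems to poetry conversion described above.

```python
# Hypothetical synonym table (synonym -> canonical tag). Only the
# "spoilers" entry mirrors what the comment says exists today; the
# poetry entries illustrate the proposed merge.
TAG_SYNONYMS = {
    "spoilers": "spoiler",
    "poems": "poetry",
    "poem": "poetry",
}


def canonicalize(tag: str) -> str:
    """Lowercase, collapse whitespace, then map a known synonym to its canonical tag."""
    normalized = " ".join(tag.lower().split())
    return TAG_SYNONYMS.get(normalized, normalized)


def canonicalize_tags(tags: list[str]) -> list[str]:
    """Canonicalize a topic's tags, dropping empties and duplicates but keeping order."""
    result: list[str] = []
    for tag in tags:
        canonical = canonicalize(tag)
        if canonical and canonical not in result:
            result.append(canonical)
    return result


def merge_existing(topics: dict[int, list[str]]) -> int:
    """Retroactively rewrite every topic's tags; return how many topics changed."""
    changed = 0
    for topic_id, tags in topics.items():
        new_tags = canonicalize_tags(tags)
        if new_tags != tags:
            # Reassigning an existing key is safe while iterating over the dict.
            topics[topic_id] = new_tags
            changed += 1
    return changed
```

    For example, `canonicalize_tags(["poem", "Poems", "spoilers"])` gives `["poetry", "spoiler"]`. Whether poem and poetry should really be merged is debated further down the thread; the table is where that decision would live.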

    11 votes
    1. hungariantoast
      (edited)

      For a long time I've wanted to write a very detailed topic on just how much work curating tags can be.

      It's late though, so I'll spare you thousands of words in exchange for some quick bullet point suggestions about tags:

      • Synonymize certain tags (like regional spellings) so searching one tag shows all of its alternatives. Searching defence would also return results for defense, while searching "defence" (in quotes) would return only results for that exact string.

      • Run automated tag correction scripts on a set schedule for very specific tagging conventions.

      • To take the automation a step further, release weekly tag reports that list topics that might be missing tags or have misspelled or suspicious tags. Introduce a gamified approach to getting users to chew away at these lists, fixing them collaboratively.

      • Create a very easy way to query specific tags so that moderators can find and correct them. (The current search tool probably won't cut it. Moderators probably need to be able to individually list topics with those tags and change them from the same page. Or mark (checkbox?) several topics to be changed at once.)

      • Create a way for users to "protest" tags, which would simply let moderators know that X number of users have protested that tag. This way, users who don't have tag privileges can still be of some use. This could also be a great way to gain trust.

      • Make tag editing one of the very first privileges users earn with the trust system. Large scale tagging conventions and standards require large scale, collaborative participation. If I didn't think I'd be crucified for suggesting it, I'd say give everyone the ability to do so once their seven days are up.

      • Allow moderators to create "watchlists" that monitor specific topics, users, or tags, and get alerted when any additions or changes happen to these subjects. (Okay, I get it, tracking users like this might not be the best thing, I just wanted to bring it up.) If we have a moderator who is an aviation engineer, it's not really a bad idea for them to be able to monitor and curate tags for topics that align to that subject. So they might pay attention to topics tagged aviation and routinely add additional tags and monitor those topics. (Come to think of it, this isn't entirely unlike subscribing to topics.)
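      As a rough illustration of the first bullet, a synonym-aware tag search could expand an unquoted term to its whole synonym group and treat a quoted term as an exact match. The groups and function names here are hypothetical, and the quoting convention is an assumption borrowed from how many search engines behave, not an existing Tildes feature.

```python
# Hypothetical synonym groups: each set holds equivalent tag spellings.
SYNONYM_GROUPS = [
    {"defense", "defence"},
    {"labor", "labour"},
]


def expand(term: str) -> set[str]:
    """Return the set of tags a search term should match.

    A quoted term ("defence") matches only that exact tag; an unquoted
    term also matches every spelling in its synonym group.
    """
    term = term.strip()
    if len(term) >= 2 and term.startswith('"') and term.endswith('"'):
        return {term[1:-1]}
    for group in SYNONYM_GROUPS:
        if term in group:
            return set(group)  # copy so callers can't mutate the group
    return {term}


def search(topics: dict[int, set[str]], term: str) -> list[int]:
    """Return the sorted IDs of topics carrying any tag the term expands to."""
    wanted = expand(term)
    return sorted(tid for tid, tags in topics.items() if tags & wanted)
```

      With topics tagged defence (1), defense (2), and labour (3), searching defence would return topics 1 and 2, while searching "defence" would return only topic 1.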

      Some of these bullet points are less relevant if others are followed through with, and I'm sure there are caveats and additions to this list that I haven't thought of yet.

      My biggest concern, though, is coming up with a way for any user to propose a tagging convention, rule, standard, whatever you want to call it. Right now any tagging conventions are going to be created and enforced at the sole discretion of you and appointed moderators, but as Tildes scales, that simply won't be sustainable. At the same time, letting 1,000 users run wild with those privileges creating ad hoc rules that compete with each other doesn't scale either. There needs to (eventually) be a formal pipeline for proposing, voting, or otherwise deciding on new rules for tags.

      Oh, and one last thing, there are four topics tagged programming language that should be programming languages.
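      A weekly tag report could start from something as simple as flagging tags whose plural form is also in use, which would catch a pair like programming language / programming languages. This sketch assumes tag usage counts are available as a plain dictionary (a hypothetical input), and it only checks the trailing-"s" rule, so irregular plurals such as story/stories slip through and would need a smarter check.

```python
def plural_pairs(tag_counts: dict[str, int]) -> list[tuple[str, str, int]]:
    """List (singular, plural, singular_count) for each tag whose "+s" form is also used.

    Only the trailing-"s" rule is checked, so irregular plurals
    such as "story"/"stories" are not detected.
    """
    report = []
    for tag, count in tag_counts.items():
        plural = tag + "s"
        if plural in tag_counts:
            report.append((tag, plural, count))
    return sorted(report)
```

      Each row gives a reviewer both spellings plus how many topics use the singular, which is roughly the size of the clean-up job.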

      EDIT:

      After reading @Dovey's and @patience_limited's comments here, I want to point out that tagging rules and documentation should largely be general guidelines. I'm not suggesting all of these things because I want proper tagging to be a daunting activity for new users. I want these rules and documentation to be available to make it easier for people to tag things properly, whether they are moderators or not.

      If someone needs to reference the tagging documentation every single time they post a topic, just to make sure they're tagging stuff correctly, then we've failed. The rules shouldn't introduce complexity, they should eliminate ambiguity.

      12 votes
    2. archevel

      The way Stack Overflow works with tags and tag synonyms could be helpful here. Back in 2010 they introduced a system allowing users to manage the tags, and it has been tweaked since. Tildes doesn't have the same reputation system, but for tags at least, some inspiration could be taken from a site like Stack Overflow that has been dealing with this type of issue for much longer.

      This blog post has an overview of how the system works:
      https://stackoverflow.blog/2010/08/01/tag-folksonomy-and-tag-synonyms/

      Quoting some pertinent parts:

      The system organizes tags in a master–synonym relationship. All attempts to use the synonym tag for any given master tag are automatically converted to the master tag. So, users can enter a synonym tag when writing a question, but the master tag will be displayed when the question is loaded. Editing a question tagged with a synonym tag causes it to be replaced by the master tag. Similarly, when users search for questions tagged with a synonym, a list of questions tagged with the master will be displayed.

      And

      • Users with at least 2,500 reputation (1,250 on beta sites) and a total answer score of 5 or more on a given tag may suggest synonyms for that tag.
      • Users with a net answer score (total upvotes minus downvotes) of 5 or more for a given tag may vote for synonyms for that tag.
        Suggestions will be automatically approved when they reach a score of four, and automatically deleted when they reach a score of negative two.

      Additionally, moderators can create synonyms as needed on all tags and have their suggestions instantly approved.
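      The master–synonym behaviour and the voting thresholds quoted above can be modelled roughly as below. This is a sketch, not Stack Overflow's code: the class and method names are hypothetical, the reputation and answer-score eligibility checks are left out, and it assumes a new suggestion starts at a score of zero.

```python
from dataclasses import dataclass, field

APPROVE_AT = 4   # score at which a suggestion is auto-approved (per the quote)
DELETE_AT = -2   # score at which a suggestion is auto-deleted (per the quote)


@dataclass
class TagSynonyms:
    masters: dict[str, str] = field(default_factory=dict)               # synonym -> master
    pending: dict[tuple[str, str], int] = field(default_factory=dict)   # (synonym, master) -> score

    def resolve(self, tag: str) -> str:
        """Replace a synonym with its master tag; other tags pass through unchanged."""
        return self.masters.get(tag, tag)

    def suggest(self, synonym: str, master: str, by_moderator: bool = False) -> None:
        if by_moderator:
            self.masters[synonym] = master       # moderator suggestions are approved instantly
        else:
            self.pending[(synonym, master)] = 0  # assumption: starts at zero

    def vote(self, synonym: str, master: str, delta: int) -> str:
        """Apply a +1/-1 vote and return "approved", "deleted", or "pending".

        Raises KeyError if there is no such pending suggestion.
        """
        key = (synonym, master)
        self.pending[key] += delta
        score = self.pending[key]
        if score >= APPROVE_AT:
            del self.pending[key]
            self.masters[synonym] = master
            return "approved"
        if score <= DELETE_AT:
            del self.pending[key]
            return "deleted"
        return "pending"
```

      Because every tag passes through `resolve` on input and on search, a topic tagged with a synonym always ends up displayed and found under the master tag, matching the behaviour described in the quote.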

      6 votes
    3. synergy-unsterile

      I type all my tags manually in a text editor (thereby not getting any tag suggestions) and paste them into the tag field because there is no way to change the order of the tags if I type them in the field.

      4 votes
    4. Algernon_Asimov

      Actually, I've been treating "poetry" and "poem(s)" as two separate tags:

      • "Poetry" is the broader category. It includes examples of specific poems, but also includes discussions about writing poems, and analysing poems, and poets themselves.

      • "Poem" is a sub-category, where someone has posted an actual poem.

      I don't believe these two tags are synonymous. There are situations where both tags should be applied to a topic.

      FYI: @Elronnd

      1 vote
      1. Elronnd

        Well sure, you can make that argument and that's valid; this was just an example. And I don't think even the staunchest pedants would argue that 'poem' and 'poems' should be distinct.

        2 votes
        1. ThatFanficGuy

          You can look at the distinction between "poem" and "poems" as a matter of the user's perspective.

          One way to look at tags is as a way of adding the item you're posting to a category of things that already exist. So, when you post a poem and add the tag "poems", plural, you add the topic to the list of topics that are also categorized not as, but under "poems".

          Another would be as assigning a category to the item. When you post a poem, in this case, it only makes sense that you'd call it a "poem", and so this is the tag you add.

          It's systemizing vs. describing. Neither is right or wrong, so it's only a matter of which approach Tildes, as forum software, takes. Either way, it would be important to explain the difference to users, and which approach Tildes chooses.

          2 votes
        2. Algernon_Asimov

          I don't think even the staunchest pedants would argue that 'poem' and 'poems' should be distinct.

          You're right: even as a staunch pedant myself, I would not argue this. (I did not argue this.)

          In fact, I've been cleaning up some synonymous tags over the past few weeks since @Deimos implemented the autocomplete feature for tags, with the aim of removing some alternate versions of tags from the suggestions list, so that only one version will be suggested.

          The problem is, as I found out yesterday, some people don't use the autocomplete feature. Some people just type tags without checking what gets suggested. And this problem will never ever ever stop. There will always be people typing non-standard tags.

          This is why we have Tag Editors ™: people who can add/edit/remove tags on a topic and standardise them. :)

          2 votes
      2. Adys

        @Deimos yeah about that specific tag I agree with @Algernon_Asimov. I'm actually filtering out poem but not poetry.

        But I agree with the body of your post.

        2 votes
  2. alyaza

    call it a bit of a uh... recurring issue, due to the nature of the tagging system. it's a bit of a mess because we don't really have a style guide or something to guide the process, it's just whatever people intuitively feel should be tagged + what mods tag it as if they have anything to add or fix. this usually stabilizes, but until it does you get a mess like poem/poems/poetry where synonymous tags aren't really treated that way, either by the site or by the users, and unless someone goes back and fixes all the tags after the fact they just stand that way perpetually.

    unless we automate the process at some point in the very future--which we might, i dunno--i think we're probably going to have to hash out at least some sort of standard at some point to guide the tags instead of just letting it be whatever the community ends up predominantly using. this is partly because i think that just won't work as well as we get larger, but also because the poem/poems/poetry case is just one dimension of potential issues; we also could, in the future, have issues based on things like english regional spellings (labor vs labour, defense vs defence, etc.) since that's just kind of a thing english has. (i think informally, we're supposed to use american english spellings, but for obvious reasons if that is the informal standard, it hasn't stuck that well.) we're also probably going to have to figure out some other stuff like what subtags should be used for, since they're kinda inconsistently applied, and what sort of hierarchy we want to even try and enforce with shit like the social sciences over in ~humanities.

    4 votes
  3. Dovey

    Y'all are certainly obsessed with tags. I have been thinking for a while that I never want to start a topic on Tildes because there is so much focus on getting the tags just perfect. Actually I guess there's no pressure for me because no matter what I choose, the tag police are going to arrive shortly and fix it up to suit themselves. Maybe I shouldn't waste time thinking about it, but I'm used to being familiar with the way a forum works, you know? It feels like too much work to figure out how to "do it right" here.

    4 votes
    1. hungariantoast

      You're right, there should be no pressure to post topics and add tags to them. At worst, just don't tag the topic if you have no clue what to add, but trying to add something is always a good idea.

      And if a moderator removes or changes one of the tags you added, or adds a new tag to your topic, just pay attention and remember those changes the next time you post something.

      It's not like users should be deathly afraid of getting things wrong. It isn't the users' responsibility or burden to make sure tags are perfect; it's the moderators'.

      So don't go wild and obviously tag things incorrectly, but don't feel like you need to be perfect about it either.

      Unfortunately, because of how tags relate to searching and how they'll eventually relate to subgroup hierarchies, they're going to be an integral part of how Tildes works, which means they are important (and powerful) and there will need to be some guidelines on how to use them, including in specific cases. Ideally though, these guidelines won't introduce complications, but rather, they'll solve them.

      4 votes
    2. patience_limited
      (edited)

      Seconded. There are frequent occasions when I'm posting on topics for which there's no autocomplete suggestion. It feels as if I'm spending more time parsing the style guide and attempting to correctly taxonomize the topic than posting and commenting.

      Can we at least see the journal history to determine what correction was made, or perhaps a comment visible to the OP on the reasoning behind the change?

      I realize this creates more work for tag editors, but if the intent is that topics be genuinely keyword filtered and searchable, the education needs to be mutual.

      In the absence of standards like the Library of Congress or Usenet-style hierarchy or other formal taxonomy, tagging ought to evolve towards something that makes fuzzy search and filtering feasible, where "poem", "poems", and "poetry" provide results close to the expectations of users in other environments.

      I have a preference for building out the originally proposed nested subgroups, e.g. "biology.bacteria.synthetic", but the tag could just as easily be "engineered lifeforms", and there will always be questions about what meaning people are searching for.
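      If tags followed a dotted hierarchy like "biology.bacteria.synthetic", filtering could match whole path components, so a filter on a parent tag also catches everything nested under it. A minimal sketch, assuming dot-separated tags and a plain dict of topic tags; `matches` and `filter_topics` are hypothetical names, and this is not necessarily how Tildes filters.

```python
def matches(tag: str, query: str) -> bool:
    """True if the tag equals the query or is nested under it.

    Components are compared whole, so "biologyx" does not match "biology",
    and a parent tag does not match a query for one of its children.
    """
    tag_parts = tag.split(".")
    query_parts = query.split(".")
    return tag_parts[:len(query_parts)] == query_parts


def filter_topics(topics: dict[int, set[str]], query: str) -> list[int]:
    """Return the sorted IDs of topics with any tag at or below the query."""
    return sorted(
        tid for tid, tags in topics.items()
        if any(matches(tag, query) for tag in tags)
    )
```

      Under this scheme, filtering on poetry would include a topic tagged poetry.haiku, while filtering on poetry.haiku would not pull in every poetry topic. It does nothing for the "engineered lifeforms" problem, though: two different paths for the same idea still need synonyms or a human.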

      3 votes
      1. Algernon_Asimov

        Can we at least see the journal history to determine what correction was made,

        There is a Topic Log in the sidebar of every topic which lists all changes made to that topic - including changes to its tags. It's an automated feature, so it's no extra work for the tag editors.

        or perhaps a comment visible to the OP on the reasoning behind the change?

        I change between 5 and 20 tags most days. And sometimes I'll have a clean-up session where I'll fix one particular tag on a few dozen historical topics in a group. It's not practical for me to leave a comment every time I change a tag.
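        The Topic Log can cost tag editors nothing because the log entry can be written as part of the tag edit itself. A minimal sketch of that idea; `LogEntry`, `edit_tags`, and the dict-based topic are hypothetical stand-ins, not Tildes's actual models.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class LogEntry:
    editor: str
    removed: tuple[str, ...]
    added: tuple[str, ...]
    at: datetime


def edit_tags(topic: dict, new_tags: list[str], editor: str,
              log: list[LogEntry]) -> Optional[LogEntry]:
    """Replace a topic's tags and append a log entry describing the change.

    Returns None (and logs nothing) if the set of tags is unchanged.
    """
    old, new = set(topic["tags"]), set(new_tags)
    if old == new:
        return None
    entry = LogEntry(
        editor=editor,
        removed=tuple(sorted(old - new)),
        added=tuple(sorted(new - old)),
        at=datetime.now(timezone.utc),
    )
    topic["tags"] = list(new_tags)
    log.append(entry)
    return entry
```

        Since the editor never writes the entry by hand, the reasoning behind a change still has to come from somewhere else, which is the gap the request for explanatory comments is pointing at.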

        5 votes
        1. patience_limited

          Thanks for pointing out the Topic Log - I don't live here full-time (in the midst of a cross-country relocation), and wasn't aware.

          4 votes
  4. Micycle_the_Bichael

    I'm not sure how feasible this is to scale, because (like most language-based things) it's going to get dicey deciding how close counts as virtually the same. Should the tag "haiku" be rolled up into poetry? Maybe! It's a type of poetry! How about a set of tags like "short story" and story? Presumably, if I'm looking for a story I would also like short stories, so this should apply. But what if I only want short stories? Does each post then get a display tag that is its "rolled-up tag" ("rolled-up" because it's all the other tags rolled up into one) that users see, and then in the backend have a list of the actual tags that were given to it?

    Who gets final say in a tag? What if I know that a certain tag exists but I don't think my topic fits that tag, but either the moderator or the automated system for rolling up tags disagrees? Who trumps? The person making the post, who has read the content, or the system that is applying vague rules or its best guess?

    I'm not saying these ideas are insurmountable. But they are a lot of questions to think about.

    And this all is independent of the always important question of "who does the work to merge tags?" A human? They'd have to go through every post, check all the tags, compare them to an insanely long list, and decide if each fits an existing tag or should be its own. Machine learning? Good luck. We were working on NLP roll-up at my last company; it's insanely hard to tweak to be right, it's computationally very expensive, and it takes forever. I don't think it would ever work on a website level.

    1. alyaza

      I'm not sure how feasible this is to scale, because (like most language-based things) it's going to get dicey deciding how close counts as virtually the same. Should the tag "haiku" be rolled up into poetry? Maybe! It's a type of poetry! How about a set of tags like "short story" and story? Presumably, if I'm looking for a story I would also like short stories, so this should apply. But what if I only want short stories? Does each post then get a display tag that is its "rolled-up tag" ("rolled-up" because it's all the other tags rolled up into one) that users see, and then in the backend have a list of the actual tags that were given to it?

      your first example feels like a use-case for the subtags system. personally, if i was going to tag a haiku, i think i'd go with poetry.haiku since it's entirely subsumed by the concept of poetry, but also haiku is a specific, distinct, and quite recognizable form of poetry to most people. we've only had one actual example of the haiku tag in that context on this website, though (and it uses standalone haiku), so we don't really seem to have a standard there.

      as for the other thing, both story and short story are tags we currently use and since there's no real way to roll them up neatly like with the haiku example and there's not really a good tagging category which contains them both, i would imagine that they'd be their own tags. this is apparently the standard we've settled on thus far for both; we generally use story for stories. weirdly, though, we seem to use short stories for short stories.

      6 votes
      1. Micycle_the_Bichael

        Yeah, I'm not saying we can't solve the examples I gave. But is that a feasible long-term solution? Are the site mods going to sit down and comb through an ever-growing list of tags and as a group come to a consensus on what tags are grouped together and what the main display tag for that group is? Every mod would have to be on the same page for that to work, and it would still require mods to go through each post, have an encyclopedic knowledge of the current tag groupings, and re-tag posts appropriately.

        2 votes
        1. alyaza

          i mean no, which is kinda my point upthread. the way i see it, we're most likely either going to have to eventually have some sort of tag style guide (and probably, a lot more people who explicitly have the power to curate tags to keep order), or we're going to end up beginning to automate it in some way so it's largely not a user issue in the first place. i don't see the community being able to keep everything super straight just with self-policing and mod intervention from time to time like we currently do, especially as the site gets larger--i mean, we can't really keep things super straight now even though people really try to, lol.

          3 votes
          1. Deimos

            There is already a bit of a style guide. The Tagging help page is always linked above the tagging field, and it includes some guidelines (and I can add more). For example, it specifies that tags should generally be plural, which means that from your previous example here, all of those "story" tags should probably get converted to "stories".

            3 votes
            1. alyaza

              kinda forgot about that, actually (although i was also thinking something a bit larger with 'style guide').

              as far as adding guidelines, i'm thinking you should probably expound more on how the hierarchical tags work and demonstrate some of the more recent conventions for using them on there, since they're basically regular features in ~humanities now and they're also quite common on ~news and ~music. also feels like you might want to specify a standard of english on there for the tagging system to avoid the regionalisms problem i mentioned upthread. it's a relatively minor issue now (there are only a few regionalisms i can think of that have double tags like labor / labour), but probably better to sort this out while it's a minor issue than let it sit, because people are kinda weird about regionalisms in my experience.

              3 votes
              1. Algernon_Asimov

                i'm thinking you should probably expound more on how the hierarchical tags work

                Like this?

                I've asked @Deimos if there's any way of incorporating my "how to" guide into the official Tildes ecosystem, but I'm not holding my breath. It's a low priority.

                also feels like you might want to specify a standard of english on there for the tagging system to avoid the regionalisms problem i mentioned upthread.

                I vote "World English"! American English is the outlier. (And herein starts the argument...)

                There has previously been talk about Tildes treating alternate spellings as the same word, so that topics tagged either "labor" or "labour" would be included whenever someone filters/searches for either spelling. I think that's a better approach.

                3 votes
                1. alyaza

                  Like this?

                  yeah, probably, although even that we might want to expand to include more examples (like how we tag religions, since that might not be immediately obvious to people). the official page gives a few examples but doesn't really linger on them beyond noting that they exist and that they're used in ask tags and sometimes for countries with polities, but we are and have been crafting more novel uses of them on a not-infrequent basis for a while now.

                  1 vote
                  1. Algernon_Asimov

                    although even that we might want to expand to include more examples

                    The “how to” guide is only intended to explain the principles behind tags and how to use them, rather than provide a comprehensive list of all tags that have been used.

                    If you're motivated enough, you could create and maintain such a list. It could be a separate page on that wiki, and the tag section of the “how to” guide could include a link to it. Of course, maintaining that list would be an ongoing task, to capture all the more novel uses of sub-tags we'll keep crafting, but if you think it would be helpful, it wouldn't be wasted effort.

                    2 votes
                    1. alyaza

                      If you're motivated enough, you could create and maintain such a list. It could be a separate page on that wiki, and the tag section of the “how to” guide could include a link to it. Of course, maintaining that list would be an ongoing task, to capture all the more novel uses of sub-tags we'll keep crafting, but if you think it would be helpful, it wouldn't be wasted effort.

                      i'll probably do that. i have pretty much endless free time after today and actually parsing out all the use cases we currently have for them might be helpful in establishing a mostly-objective standard for future use cases, since right now it feels mildly arbitrary where we do and don't use them and i kinda doubt everybody is using them the same way, which doesn't seem optimal.

                      2 votes
                      1. Algernon_Asimov

                        Get in touch with @deing. He'll set you up with access, and create a page for you, and get you started.

                        2 votes
        2. hungariantoast

          Are the site mods going to sit down and comb through an ever-growing list of tags and as a group come to a consensus on what tags are grouped together and what the main display tag for that group is?

          Not just the moderators. Everyone.

          I'm very doubtful that the tagging system could be efficiently curated and evolved if only moderators were allowed a hand in the process. When there are proposals, voting, and other pipeline processes, they most likely need to involve regular users as well.

          Also, documentation, probably a wiki, will be necessary to keep people familiar with the rules. General moderation can sometimes suffer under very specific rules (as a lot of today's conversation talked about), but tagging systems thrive when there are clear-cut rules. So, if there are tagging differences for muggle between ~creative and ~hobbies, that'll be explained on the wiki page for the muggle tag, and moderators should be familiar with those rules when interacting with that tag. The information from this theoretical wiki can also be built into the site's UI somewhere, I'm sure. (muggle is probably a bad example; most tags shouldn't have their own wiki page, that would be ridiculous, and referencing a wiki shouldn't be necessary for most tag edits.)

          The trick here is to promote collaborative moderation and curation where users focus on specialized areas of interest, kind of like how Wikipedia or OpenStreetMap contributors tend to focus on only a few specific interests and become proficient with them. However, we really ought to try and avoid the bureaucracy that plagues Wikipedia.

          3 votes