11 votes

Accentuation on tags? ("á", "ã", "é", "í", etc)

I was trying to add andré bazin as a tag on an article about the film theorist André Bazin, but the box turned red and prevented me from submitting. So I had to use the incorrect andre bazin.

I suppose there's a very rational technical reason for not allowing accented characters in tags -- even so, I believe it would be useful to support those characters, since many languages use them.

4 comments

  1. [2]
    hungariantoast

    Assuming for a second that the reason @Bauke pointed out no longer exists, does there exist an easy way to map accented characters to "regular" latin ones?

    In other words, how could we make it so that searching for topics tagged andre bazin will also include topics tagged with the accented andré bazin version and vice versa?


    Honestly, supporting accented characters might be a problem worth solving later on if Tildes grows substantially in size and reach.

    5 votes
    1. stu2b50

      Depends on your definition of "easy". Of course it can't be a bijective map, so you'll have to pick how "smart" you want it to be. Simplistic solutions that are unlikely to confuse include simply mapping non-ASCII characters to "?". On the other hand, something like unidecode, a Python library, tries to do as good a job as it can of making replacements that make sense - but edge cases can get weird, and as the library docs mention, possibly even unintentionally offensive (although that's unlikely - it's a good solution in most cases).
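To make the two ends of that spectrum concrete, here is a minimal sketch. It assumes Python's standard `unicodedata` module as a stand-in for unidecode (it only strips accents and does not transliterate other scripts), and the function names are made up for illustration:

```python
import unicodedata


def to_ascii_placeholder(text: str) -> str:
    """Simplistic mapping: every non-ASCII character becomes "?"."""
    return text.encode("ascii", errors="replace").decode("ascii")


def strip_accents(text: str) -> str:
    """Accent stripping (an assumption, not unidecode's algorithm):
    decompose each character, then drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


# Both spellings collapse to the same string, which is why the
# mapping cannot be bijective: the accent cannot be recovered.
print(strip_accents("andré bazin"))  # andre bazin
print(strip_accents("andre bazin"))  # andre bazin
```

Letters that have no accent-based decomposition, such as "ø" or "ß", pass through `strip_accents` unchanged, which is one of the edge cases a dedicated library like unidecode tries to cover.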

      So it's certainly possible to have a surjective mapping. For something like supporting search, you could run your surjective mapping (e.g. unidecode) on all tags going in, and on all text going into topic search. Of course, the person searching will have to work around how the mapping replaces non-ASCII characters in edge cases.
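A rough sketch of that idea, assuming a hypothetical in-memory `TagIndex` (this is not how Tildes stores tags) and a `fold` helper built on the standard `unicodedata` module instead of unidecode:

```python
import unicodedata
from collections import defaultdict


def fold(text: str) -> str:
    """Hypothetical folding step: case-fold, decompose, drop accents."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


class TagIndex:
    """Toy index: tags are folded on the way in, queries on the way out."""

    def __init__(self) -> None:
        self._topics: dict[str, set[int]] = defaultdict(set)

    def add(self, topic_id: int, tag: str) -> None:
        # Store the topic under the folded form of its tag.
        self._topics[fold(tag)].add(topic_id)

    def search(self, query: str) -> set[int]:
        # Fold the query the same way, so both spellings hit one key.
        return set(self._topics.get(fold(query), set()))


index = TagIndex()
index.add(1, "andré bazin")
index.add(2, "andre bazin")

# Either spelling finds both topics.
print(index.search("andre bazin") == index.search("André Bazin"))  # True
```

Because the same function runs on both sides, the original accented tag could still be stored separately for display while the folded form is used only for matching.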

      4 votes
  2. culturedleftfoot

    I've encountered this a few times, as well as with hyphenated names and terms.

    3 votes