40 votes

XML is better than YAML. Hear me out...

Tags: xml, yaml

18 comments

  1. [2]
    spit-evil-olive-tips

    My spicy take is that XML is better than YAML, because there are situations where XML is appropriate, but there’s no situation where YAML is appropriate.

    I think he's correct. no matter how bad you consider XML to be, YAML is worse.

    if you're not already familiar with it, read about the Norway problem

    countries:
    - GB
    - IE
    - FR
    - DE
    - NO
    
    >>> from yaml import safe_load
    >>> safe_load(the_configuration)
    {'countries': ['GB', 'IE', 'FR', 'DE', False]}
    

    inexcusable.

    YAML is full of these design "accidents" - things where it works the way it works because that's the way it works, rather than because anyone decided that was the ideal way for it to work.

    if you need human-readability, you want TOML. if you want machine-readability and -writability, you want JSON (at a bare minimum) or a similar data-interchange format (of which there are a billion)

    YAML tries to be this weird middle ground where it can be used for human-written config files, but is also a superset of JSON so that you can easily have programs render "YAML" (by generating JSON).

    28 votes
    1. secret_online

      Note that the Norway problem was addressed in YAML 1.2, released in 2009. The core schema (the recommended mode to parse a document) defines a boolean to be true|True|TRUE|false|False|FALSE, instead of YAML 1.1's y|Y|yes|Yes|YES|n|N|no|No|NO|true|True|TRUE|false|False|FALSE|on|On|ON|off|Off|OFF. But without any versioning in the spec† there's no way to tell whether an application uses a YAML 1.1 or 1.2 parser, so it's always safer to just... assume YAML 1.1 and deal with the quoting all the time.
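
      To make the difference concrete, here is a minimal sketch of the two boolean rules quoted above, written as plain regular expressions. The names `YAML_1_1_BOOL`, `YAML_1_2_BOOL`, and `resolve_scalar` are made up for illustration, and a real parser does far more than this.

```python
import re

# Boolean spellings as listed in the comment above.
YAML_1_1_BOOL = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)
YAML_1_2_BOOL = re.compile(r"true|True|TRUE|false|False|FALSE")

TRUTHY = {"y", "yes", "true", "on"}


def resolve_scalar(text, version="1.1"):
    """Return a bool if the unquoted scalar is a boolean under the chosen rules, else the string."""
    pattern = YAML_1_1_BOOL if version == "1.1" else YAML_1_2_BOOL
    if pattern.fullmatch(text):
        return text.lower() in TRUTHY
    return text


countries = ["GB", "IE", "FR", "DE", "NO"]
print([resolve_scalar(c, "1.1") for c in countries])  # Norway turns into False
print([resolve_scalar(c, "1.2") for c in countries])  # Norway stays "NO"
```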

      For me the worst thing is that a YAML file that has been chopped in half (whether due to errors in transmission or writing to disk) is still a valid YAML file. While writing braces for your JSON gets a bit tedious, at least you can do a bracket balancing check to ensure you have the whole document.
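
      A rough sketch of that completeness check using Python's standard `json` module: a truncated JSON document simply fails to parse. `is_complete_json` is a hypothetical helper name, and it leans on a full parse rather than literal bracket counting.

```python
import json


def is_complete_json(text):
    """Return True if text parses as one whole JSON document."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


document = json.dumps({"countries": ["GB", "IE", "FR", "DE", "NO"]})
truncated = document[: len(document) // 2]  # simulate a write cut off halfway

print(is_complete_json(document))   # True
print(is_complete_json(truncated))  # False
```

      The equivalent YAML block list cut off after the `- FR` line would still load, just with fewer entries.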

      †: That's probably a good thing. Let's not have another DOCTYPE situation on our hands.

      17 votes
  2. Amun

    Carl Johnson


    There's always something better than YAML


    My spicy take is that XML is better than YAML, because there are situations where XML is appropriate, but there’s no situation where YAML is appropriate.

    Johnny Boursiquot: Dang. Shots fired!

    Let’s talk about XML first

    So, XML got a really bad reputation. The reason it got a really bad reputation is because people were using it for things that it should never have been used for. XML stands for Extensible Markup Language. And if you’re using it as an extensible markup language, it’s actually really great.

    Let’s say you’re working on something and you’re like:

    “I’m making a new kind of book, and I need to annotate all of the verses in the Bible, and have the chapter headings and stuff.”

    It’s great for that.

    It’s really good for when you have a document, and some things are italicized, and some things are in a foreign language, and subtitles, and all that stuff. It’s really good for that. It’s not good if it’s like:

    “I need to configure this server and the server needs to know if this value is true or false.”

    No, that’s bad. Don’t do that. That’s not a good use for XML.

    But XML does have some uses for which it’s appropriate. And I think the fact that now everybody is writing React… and they’re writing it with JSX… and JSX is basically just inline XML… I think that shows that there are cases where actually XML is pretty good.

    YAML on the other hand

    YAML on the other hand I think is never good. There’s always something better than YAML!

    We’ve finally made it to Go 1.21 and I’m very happy about this. And you know why? Because YAML repeatedly bit me in the behind on Go 1.20. Ask me why.

    Johnny Boursiquot: Why?

    Well, when you say in your test file:

    “test this against Go 1.20”

    It interprets that as Go 1.2.

    So it was like two or three different times I had a repo where I’m like:

    “These tests aren’t passing. Why are my tests not passing?! I’ve just upgraded to Go 1.20.”

    And it was like:

    “Oh, no. I’ve decided that this is Go 1.2”

    And so YAML is always going to do that to you…

    Better choices

    Now, I understand why people do YAML, but there are better choices. You can use TOML. You can use CUE. You can use lots of things.

    I like how Caddy does it. With Caddy, the canonical language that Caddy understands is JSON, but they have adapters.

    So if you want to write YAML, you can. But it’ll just take that YAML and turn it into JSON behind the scenes. Then they also have a specific Caddy language. So you can give it the Caddy language and then it turns that into JSON behind the scenes. And you can give it an NGINX config and it’ll turn that into JSON behind the scenes. If you have the cycles and time to spare, that’s probably the best solution for most people…
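
    The shape of that design can be sketched in a few lines. This is not Caddy's code or API: `ADAPTERS`, `adapt`, and the toy `key = value` format are invented here purely to illustrate "one canonical JSON form, many input adapters".

```python
import json


def from_json(text):
    """The canonical format needs no conversion beyond parsing."""
    return json.loads(text)


def from_keyvalue(text):
    """Toy adapter: turn 'key = value' lines into a flat dict of strings."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config


# Hypothetical registry: format name -> adapter function.
ADAPTERS = {"json": from_json, "keyvalue": from_keyvalue}


def adapt(text, fmt):
    """Convert any supported input into canonical JSON text."""
    return json.dumps(ADAPTERS[fmt](text), sort_keys=True)


print(adapt('{"port": "8080", "host": "example.com"}', "json"))
print(adapt("host = example.com\nport = 8080", "keyvalue"))
```

    Both calls print the same canonical JSON, so everything downstream only ever has to understand one format.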

    But when the only way to do it is YAML, it’s like:

    “No…!”

    It’s just too error-prone, there’s too many things… You just have to always quote everything.

    Also…

    Also the YAML specification has all these features that nobody ever uses, because they’re really confusing, and hard, and you can include documents inside of other documents, with references and stuff… And it’s like

    “Oh, man, I don’t want to have to understand any of this stuff.”

    And then if you’re not careful, it’d be like:

    “Oh yeah, the first two things I fed it to understood that, but then the third one was just like using some parser that they found on the back of a truck, and it doesn’t understand it”

    and it’s like:

    “Oh, man…”

    YAML… never appropriate. XML is sometimes appropriate.

    9 votes
  3. [8]
    devilized

    I don't know, I think they both have their place. XML is definitely more rich in features, and is indeed more extensible. But I'd hate to have to generate it manually. Meanwhile, YAML is quite human readable and writable. If I'm going to generate and ingest something using a markup language, XML makes sense. But if I'm going to have to generate it by hand, or even heavily edit it, I'd just prefer YAML for most things where that's applicable.

    9 votes
    1. [7]
      crius

      Let's be real please. In 2023, with any IDE and autocompletion, templates and plugins, if you are working configuration comes by hand one line at a time you are "doing it wrong".

      4 votes
      1. [3]
        cutmetal
        (edited)

        Don't be ridiculous. Maybe nothing you work on ever requires manual line-editing of config files, but have you never like... worked on a small project? Or modified your local system?

        ETA: The original comment I replied to:

        Let's be real please. In 2023, with any IDE and autocompletion, templates and plugins, if you are working configuration comes by hand one line at a time you are "doing it wrong".

        16 votes
        1. [2]
          crius

          That is absolutely not what I said nor implied.

          What I mean is that usually you start from a template or an example and just do some adjustment.

          Rarely, if ever, you write a configuration file from scratch.

          1 vote
          1. cutmetal

            Sorry to have misinterpreted you. But you might want to read your original comment again, it wasn't clear that that's what you meant.

            4 votes
      2. mxuribe

        Yes, but when i read that in my feeble mind, it comes across like, well, "...In a world where we now have drills and drill bits, we never need to use a screwdriver!" ...Which i know you're not saying exactly...but that's how i interpret your statement. And, of course, anyone who has the use-case for a drill and drill bits knows very well that we can NOT complete projects without both drills AND screwdrivers. With all apologies. ;-)

        6 votes
      3. [2]
        adorac

        It may just be me, but even with autocompletion, typing <brackets> is a much bigger pain than {braces} or YAML.

        3 votes
        1. RheingoldRiver

          Huh, you just made me realize that I use my...index finger if I need to type an angle bracket. I completely move my entire right hand and use my right index finger to type it. I wonder what caused me to develop this habit, this is terrible.

          1 vote
  4. [2]
    ColtonE

    I agree that XML is a more full-featured and appropriate tool for schema-driven APIs but bro really wrote a whole blog post because he didn't quote his number to a string and it was treated like a number. That's how numbers work dog.

    7 votes
    1. spit-evil-olive-tips

      bro really wrote a whole blog post because he didn't quote his number to a string and it was treated like a number. That's how numbers work dog.

      the example he uses is Golang version 1.20 being incorrectly parsed as 1.2. let's look at the version history and use a couple examples:

      >>> import yaml
      >>> s = '''
      ... versions:
      ...   - 1.19
      ...   - 1.19.13
      ...   - 1.20
      ...   - 1.20.1
      ...   - 1.21
      ... '''
      
      side note about what an incredible pile of shit the entire YAML ecosystem is

      you'd think you could just do this, right? nope.

      >>> yaml.load(s)
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      TypeError: load() missing 1 required positional argument: 'Loader'
      

      following the official documentation, I have to do some needlessly complicated bullshit that the library should be able to handle as an implementation detail:

      try:
          from yaml import CLoader as Loader, CDumper as Dumper
      except ImportError:
          from yaml import Loader, Dumper
      

      and pass Loader=Loader to yaml.load for some godforsaken reason.

      >>> yaml.load(s, Loader=Loader)
      {'versions': [1.19, '1.19.13', 1.2, '1.20.1', 1.21]}
      

      this isn't simply a "they should quote their strings" problem. the problem is that YAML encourages these sorts of loosey-goosey type conversions (as typified by the Norway problem that I linked in another comment)
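
      to see how a version ends up as a float, here's a deliberately simplified sketch of implicit typing. it is not PyYAML's actual resolver: `guess_type` is a made-up function that tries int, then float, then falls back to a string, which is enough to reproduce the result above.

```python
def guess_type(scalar):
    """Guess a type for an unquoted scalar: int, then float, else str."""
    for convert in (int, float):
        try:
            return convert(scalar)
        except ValueError:
            pass  # not that type, try the next one
    return scalar


versions = ["1.19", "1.19.13", "1.20", "1.20.1", "1.21"]
print([guess_type(v) for v in versions])
# [1.19, '1.19.13', 1.2, '1.20.1', 1.21]
```

      the list comes back with mixed types, and the trailing zero of 1.20 is gone before your code ever sees it.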

      if we look at TOML as an example instead:

      >>> import toml
      >>> t = '''
      ... versions = [
      ...   1.19
      ...   1.19.13
      ...   1.20
      ...   1.20.1
      ...   1.21
      ... ]
      ... '''
      >>> toml.loads(t)
      Traceback (most recent call last):
        File "/tmp/venv/lib/python3.10/site-packages/toml/decoder.py", line 511, in loads
          ret = decoder.load_line(line, currentlevel, multikey,
        File "/tmp/venv/lib/python3.10/site-packages/toml/decoder.py", line 778, in load_line
          value, vtype = self.load_value(pair[1], strictly_valid)
        File "/tmp/venv/lib/python3.10/site-packages/toml/decoder.py", line 880, in load_value
          return (self.load_array(v), "array")
        File "/tmp/venv/lib/python3.10/site-packages/toml/decoder.py", line 1026, in load_array
          nval, ntype = self.load_value(a[i])
        File "/tmp/venv/lib/python3.10/site-packages/toml/decoder.py", line 912, in load_value
          v = float(v)
      ValueError: could not convert string to float: '1.19   1.19.13   1.20   1.20.1   1.21'
      
      During handling of the above exception, another exception occurred:
      
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "/tmp/venv/lib/python3.10/site-packages/toml/decoder.py", line 514, in loads
          raise TomlDecodeError(str(err), original, pos)
      toml.decoder.TomlDecodeError: could not convert string to float: '1.19   1.19.13   1.20   1.20.1   1.21' (line 2 column 1 char 1)
      

      a well-designed language won't try to guess at what you mean the way YAML does. it'll throw an error rather than guessing and hoping that it got it right.

      10 votes
  5. [3]
    skybrian

    XML does have some weird bits around the edges. Parsers should just parse, not download files off the Internet as if they were a browser. (They are unlikely to be secure enough to be a browser.) Back when I was doing XML in Java, the popular XML parsers would download an XML schema from a URL of the sender's choice, unless you figured out how to configure them not to do that. It was quite the security footgun.
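
    For a sense of what "configure them not to do that" can look like, here is a small sketch using Python's standard `xml.sax` module rather than the Java parsers mentioned above. It explicitly switches off external general and parameter entities before parsing. `TagCollector` and `parse_without_external_fetches` are names invented for this example.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges, feature_external_pes


class TagCollector(ContentHandler):
    """Record the name of every element the parser encounters."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        self.tags.append(name)


def parse_without_external_fetches(data):
    """Parse XML bytes with external entity loading explicitly disabled."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, False)  # no external general entities
    parser.setFeature(feature_external_pes, False)  # no external parameter entities
    handler = TagCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler.tags


print(parse_without_external_fetches(b"<config><server/><port/></config>"))
```

    Recent Python versions already default to not loading external general entities, but spelling it out guards against a default changing underneath you.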

    XML namespaces are weird and wrong, too. Maybe sometimes you want to combine multiple schemas, but that should be done by building a bigger schema and the document author shouldn't need to worry about it. Instead, XML has the equivalent of include and import statements, as if it were a programming language, and ways to rename tags so they don't conflict.
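
    As an illustration of the renaming, here is a sketch with Python's standard `xml.etree.ElementTree`. The `urn:example:*` namespace URIs and the `bk`/`mv` prefixes are invented; the point is only that two `title` tags coexist because each is qualified by a namespace the document author had to declare.

```python
import xml.etree.ElementTree as ET

doc = """
<bk:library xmlns:bk="urn:example:books" xmlns:mv="urn:example:movies">
  <bk:title>Dune</bk:title>
  <mv:title>Dune</mv:title>
</bk:library>
"""

root = ET.fromstring(doc)
# ElementTree expands each prefix into its full namespace URI.
tags = [child.tag for child in root]
print(tags)
```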

    I don't agree that JSX implements XML. It implements tags and attributes and mixed content. Yes, there's a subset of XML that does that, too. Maybe a parser that just does that and doesn't claim to handle all of XML would be good?

    I do think JSX implements a rather nice template language. By contrast, in Flutter you build documents out of a pile of Dart constructor calls, which seems considerably more Lispy and less readable. The angle brackets and end tags seem to be important, somehow. Also, not too much nesting. Although CSS is a mess, style is still better handled by adding attributes that point to styles, not putting style nodes in the document tree.

    6 votes
    1. [2]
      fxgn

      I do think JSX implements a rather nice template language. By contrast, in Flutter you build documents out of a pile of Dart constructor calls, which seems considerably more Lispy and less readable

      Interesting, the reason I like Flutter is because I consider those constructor calls to be significantly more readable and easier to write than JSX

      2 votes
      1. skybrian

        I haven't done all that much Flutter programming yet, so perhaps it's what you're used to.

        1 vote
  6. Zyara

    At my workplace, we've been using Hydra for quite a while now, so for now we're stuck with YAML. I have to say though, it's been growing on me; I honestly find it more readable than TOML nowadays.

    1 vote
  7. prairir001

    I personally don't like YAML and overall agree with the take. That being said, I do understand why some people may like it: it's easy to write.

    Caddy server does it correctly. IIRC Hugo also does it. I have fought for hours with Ansible over dumb things like whitespace and which characters I can use in my regex. I'd rather not do that again.

    1 vote