26 votes

How many valid JSON strings are there?

18 comments

  1. [12]
    Crespyl
    Link
    It took me a minute to realize the author was not talking about the values of strings as defined in the JSON spec (anything inside of "..." pairs), but "JSON blobs encoded as strings" also known...

    It took me a minute to realize the author was not talking about the values of strings as defined in the JSON spec (anything inside of "..." pairs), but "JSON blobs encoded as strings" also known as JSON documents.

    The article explores the number of valid JSON documents of several (small) sizes.

    Neat!

    8 votes
    1. [11]
      teaearlgraycold
      (edited )
      Link Parent
      Edit: am wrong Acktshually, a JSON document is something else. In the article the author says: A JSON primitive such as "null" is not a JSON document. A document must be either an array or object...

      Edit: am wrong


      also known as JSON documents

      Acktshually, a JSON document is something else. In the article the author says:

      Some valid JSON strings for N = 4 would be 'null', 'true', '1234', '"ab"' and '[{}]'

      A JSON primitive such as "null" is not a JSON document. A document must be either an array or object at the top level.

      4 votes
      1. [8]
        Johz
        Link Parent
        I'm not sure that's true. According to the spec, a JSON document is an element, which is a value surrounded by optional whitespace, and value can be a primitive, an array, or an object. Unless you...

        I'm not sure that's true. According to the spec, a JSON document is an element, which is a value surrounded by optional whitespace, and value can be a primitive, an array, or an object.

        Unless you are using "JSON document" in a very specific way here, but I've not come across it before like that.

        9 votes
        1. balooga
          Link Parent
          My takeaway was that the author considers it a valid string if JSON.parse() accepts it. Which it does with primitives like “null”.

          My takeaway was that the author considers it a valid string if JSON.parse() accepts it. Which it does with primitives like “null”.

          6 votes
        2. [2]
          teaearlgraycold
          Link Parent
          Oh no. Well I thought I’d read that somewhere.

          Oh no. Well I thought I’d read that somewhere.

          2 votes
          1. Johz
            Link Parent
            I think some parsers only accept objects or arrays (because 99% of the time that's what you want), and some formats like Toml that are essentially based on the JSON data structure might only allow...

            I think some parsers only accept objects or arrays (because 99% of the time that's what you want), and some formats like Toml that are essentially based on the JSON data structure might only allow objects at the top level, but spec JSON is very flexible.

            2 votes
        3. [4]
          smores
          Link Parent
          Wait, is this true? I don't see the word document anywhere in that spec. I agree that a JSON value can be a primitive, array, or object, but I've mostly seen a JSON "document" referred in the...

          Wait, is this true? I don't see the word document anywhere in that spec. I agree that a JSON value can be a primitive, array, or object, but I've mostly seen a JSON "document" referred in the context of either a document store or a JSON schema document. In both of those cases, a document is specifically an object.

          I wouldn't be surprised if there simply is no commonly accepted definition of JSON document, but I don't think it's uncommon to use it to refer to a JSON-serializable object!

          1 vote
          1. [2]
            zestier
            (edited )
            Link Parent
            You made me curious so I started poking around more. I cannot find any evidence that "JSON Document" has an official definition. The closest I could find was a JSON Schema post saying that it uses...

            You made me curious so I started poking around more. I cannot find any evidence that "JSON Document" has an official definition. The closest I could find was a JSON Schema post saying that it uses "JSON Document", "JSON data", "JSON text", and "JSON value" interchangeably, but that's obviously outside the JSON spec and actually implies that it lacks a definition (otherwise why specify this?).

            On a personal level, I would define a JSON Document as any content that could correctly use "application/json" as the content type.

            3 votes
            1. [2]
              Comment deleted by author
              Link Parent
              1. zestier
                (edited )
                Link Parent
                This has an RFC to answer how to correctly use it: https://datatracker.ietf.org/doc/rfc8259/. Technically it has multiple RFCs. Over the decades they have been superseded a couple times. From a...

                This has an RFC to answer how to correctly use it: https://datatracker.ietf.org/doc/rfc8259/.

                Technically it has multiple RFCs. Over the decades they have been superseded a couple times. From a skim it looks like it's been over a decade since a conforming parser has been allowed to reject non-object/array root values. This amusingly means that everyone in this thread may be using the same definition, but just from different times. Prior to 2014 the only two starting characters required to be supported were { and [.

                5 votes
          2. Johz
            Link Parent
            I would think of a JSON document as being a document (i.e. file) in JSON format, in which case anything goes according to the spec, but you make a good point about document stores typically having...

            I would think of a JSON document as being a document (i.e. file) in JSON format, in which case anything goes according to the spec, but you make a good point about document stores typically having more restrictions on what's allowed.

            2 votes
      2. [2]
        lily
        Link Parent
        I think this might have changed at some point. From what I remember when I wrote my own JSON parser a few months ago, the latest JSON specification doesn't restrict the type of top-level values;...

        I think this might have changed at some point. From what I remember when I wrote my own JSON parser a few months ago, the latest JSON specification doesn't restrict the type of top-level values; it only mandates that there be only one. But I know some older JSON parsers only allow objects and arrays to be top-level. It's possible older versions of the spec had such a restriction, but it was removed? (To be fair, there's really not much of a point to a JSON document containing only a single string or number...)

        Honestly, the JSON specification has always been a little wishy-washy. The fact that JSON parsers are allowed to parse anything they want, as long as valid JSON documents still work correctly, feels a little weird to me. It means there are a lot of slightly different, incompatible versions of JSON out there.

        3 votes
        1. zestier
          (edited )
          Link Parent
          Technically a separate spec, but is present on https://datatracker.ietf.org/doc/rfc7159/ from 2014. So a confirmation that something closely related did change.

          Technically a separate spec, but

          A JSON text is a sequence of tokens. The set of tokens includes six structural characters, strings, numbers, and three literal names.
          A JSON text is a serialized value. Note that certain previous specifications of JSON constrained a JSON text to be an object or an array. Implementations that generate only objects or arrays where a JSON text is called for will be interoperable in the sense that all implementations will accept these as conforming JSON texts.
          JSON-text = ws value ws

          is present on https://datatracker.ietf.org/doc/rfc7159/ from 2014. So a confirmation that something closely related did change.

          2 votes
  2. [5]
    Notcoffeetable
    Link
    As I was reading I was struggling to figure out why someone would be interested in counting these. Obviously "because it's there" is always a reasonable answer to such questions. But this is a...

    As I was reading I was struggling to figure out why someone would be interested in counting these. Obviously "because it's there" is always a reasonable answer to such questions. But this is a hard enough problem that usually you'd want some justification for this effort.

    • I think it is a great example of how we think about begin counting "uncountably finite" types of things.
    • One would want to do exactly this type of analysis when building a hash function of sorts.

    Any other ideas how one would use this information?

    4 votes
    1. Raspcoffee
      Link Parent
      Personally, I like these sort of 'thought experiments' with lack of better term because of how absurd the scale of these are. A bit like the amount of possible Go games, or solvable Sudoku's, it's...

      Personally, I like these sort of 'thought experiments' with lack of better term because of how absurd the scale of these are. A bit like the amount of possible Go games, or solvable Sudoku's, it's absurd. I like statistical mechanics when I was still a student of physics for that reason too, which is one of those fields that are always even more difficult than you think. Even if you think you got it nailed down there are always ways things can get even more complicated, fast.

      And this then kind of shows another reason someone might want to do things like this: you learn how to approach problems like that. Even getting a close estimate can be useful if you need to tackle a problem in order to get a sense of scale should you work with computing, thermodynamics, or something else that deals with a lot of possibilities.

      4 votes
    2. [2]
      cutmetal
      Link Parent
      Absurd idea: theoretically you could use this work to build a json validator for input strings below a certain length, by generating every valid string and either storing them or hashes of them in...

      Absurd idea: theoretically you could use this work to build a json validator for input strings below a certain length, by generating every valid string and either storing them or hashes of them in a lookup table. The validator would have a ridiculously low computational complexity, though the real-world speed would probably be poor due to the enormous size of the table, and it may take an unrealistic amount of memory to run. But there could be limited useful applications for this, especially if you were able to restrict the input to ascii.

      3 votes
      1. balooga
        Link Parent
        That is an absurd idea, lol. I really don’t think there are any practical use cases for this, but that’s okay. It’s just a fun thing to think about. People, or at least some people with certain...

        That is an absurd idea, lol. I really don’t think there are any practical use cases for this, but that’s okay. It’s just a fun thing to think about. People, or at least some people with certain mathematical inclinations, like to think about navigating impossibly large but finite possibility spaces. It’s a reminder that all the answers to all of the mysteries of the universe are just there, waiting to be found. It doesn’t have to have real world applications, it’s brain candy.

        4 votes
    3. vektor
      Link Parent
      Testing how close to ideal your domain-specific compression algorithm is?

      Testing how close to ideal your domain-specific compression algorithm is?

      2 votes
  3. [2]
    Comment deleted by author
    Link
    1. zestier
      Link Parent
      The post focuses on the number of valid strings for each length. I initially had a similar thought, but the author takes a more reasonable approach to answer specifically bounded scenarios more...

      The post focuses on the number of valid strings for each length. I initially had a similar thought, but the author takes a more reasonable approach to answer specifically bounded scenarios more like "How many 100 character strings can be parsed as valid JSON?"

      1 vote