It took me a minute to realize the author was not talking about the values of strings as defined in the JSON spec (anything inside of "..." pairs), but "JSON blobs encoded as strings", also known as JSON documents.
The article explores the number of valid JSON documents of several (small) sizes.
Neat!
Edit: am wrong
Acktshually, a JSON document is something else. In the article the author says:
Some valid JSON strings for N = 4 would be 'null', 'true', '1234', '"ab"' and '[{}]'
But a JSON primitive such as "null" is not a JSON document. A document must be either an array or object at the top level.
I'm not sure that's true. According to the spec, a JSON document is an element, which is a value surrounded by optional whitespace, and a value can be a primitive, an array, or an object.
Unless you are using "JSON document" in a very specific way here, but I've not come across it used that way before.
My takeaway was that the author considers it a valid string if JSON.parse() accepts it. Which it does with primitives like “null”.
Oh no. Well I thought I’d read that somewhere.
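That matches what modern runtimes do. A quick sanity check you can run in Node or a browser console, using the article's own N = 4 examples:

```javascript
// Per RFC 8259, any JSON value may appear at the top level,
// so JSON.parse accepts bare primitives, not just objects and arrays.
const samples = ['null', 'true', '1234', '"ab"', '[{}]'];

// None of these throw in a spec-conforming parser.
const parsed = samples.map((s) => JSON.parse(s));

const allAccepted = samples.every((s) => {
  try { JSON.parse(s); return true; } catch { return false; }
});
// allAccepted === true; parsed[0] === null, parsed[2] === 1234
```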
I think some parsers only accept objects or arrays (because 99% of the time that's what you want), and some formats like TOML that are essentially based on the JSON data structure might only allow objects at the top level, but spec JSON is very flexible.
Wait, is this true? I don't see the word "document" anywhere in that spec. I agree that a JSON value can be a primitive, array, or object, but I've mostly seen a JSON "document" referred to in the context of either a document store or a JSON Schema document. In both of those cases, a document is specifically an object.
I wouldn't be surprised if there simply is no commonly accepted definition of JSON document, but I don't think it's uncommon to use it to refer to a JSON-serializable object!
You made me curious so I started poking around more. I cannot find any evidence that "JSON Document" has an official definition. The closest I could find was a JSON Schema post saying that it uses "JSON Document", "JSON data", "JSON text", and "JSON value" interchangeably, but that's obviously outside the JSON spec and actually implies that it lacks a definition (otherwise why specify this?).
On a personal level, I would define a JSON Document as any content that could correctly use "application/json" as the content type.
This has an RFC to answer how to correctly use it: https://datatracker.ietf.org/doc/rfc8259/.
Technically it has multiple RFCs. Over the decades they have been superseded a couple times. From a skim it looks like it's been over a decade since a conforming parser has been allowed to reject non-object/array root values. This amusingly means that everyone in this thread may be using the same definition, but just from different times. Prior to 2014 the only two starting characters required to be supported were { and [.
I would think of a JSON document as being a document (i.e. file) in JSON format, in which case anything goes according to the spec, but you make a good point about document stores typically having more restrictions on what's allowed.
I think this might have changed at some point. From what I remember when I wrote my own JSON parser a few months ago, the latest JSON specification doesn't restrict the type of top-level values; it only mandates that there be only one. But I know some older JSON parsers only allow objects and arrays to be top-level. It's possible older versions of the spec had such a restriction, but it was removed? (To be fair, there's really not much of a point to a JSON document containing only a single string or number...)
Honestly, the JSON specification has always been a little wishy-washy. The fact that JSON parsers are allowed to parse anything they want, as long as valid JSON documents still work correctly, feels a little weird to me. It means there are a lot of slightly different, incompatible versions of JSON out there.
Technically a separate spec, but
A JSON text is a sequence of tokens. The set of tokens includes six structural characters, strings, numbers, and three literal names.
A JSON text is a serialized value. Note that certain previous specifications of JSON constrained a JSON text to be an object or an array. Implementations that generate only objects or arrays where a JSON text is called for will be interoperable in the sense that all implementations will accept these as conforming JSON texts.
JSON-text = ws value ws
is present on https://datatracker.ietf.org/doc/rfc7159/ from 2014. So that confirms something closely related did change.
As I was reading I was struggling to figure out why someone would be interested in counting these. Obviously "because it's there" is always a reasonable answer to such questions. But this is a hard enough problem that usually you'd want some justification for this effort.
I think it is a great example of how we think about counting "uncountably finite" types of things. One would want to do exactly this type of analysis when building a hash function of sorts.
Any other ideas how one would use this information?
Personally, I like these sorts of 'thought experiments', for lack of a better term, because of how absurd their scale is. A bit like the number of possible Go games, or solvable Sudokus: it's absurd. I liked statistical mechanics for the same reason when I was still a physics student; it's one of those fields that is always even more difficult than you think. Even if you think you have it nailed down, there are always ways things can get even more complicated, fast.
And that shows another reason someone might want to do things like this: you learn how to approach such problems. Even a rough estimate can be useful for getting a sense of scale, whether you work in computing, thermodynamics, or something else that deals with a lot of possibilities.
Absurd idea: theoretically you could use this work to build a JSON validator for input strings below a certain length, by generating every valid string and storing either the strings themselves or hashes of them in a lookup table. The validator would have a ridiculously low computational complexity, though the real-world speed would probably be poor due to the enormous size of the table, and it may take an unrealistic amount of memory to run. But there could be limited useful applications for this, especially if you were able to restrict the input to ASCII.
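A toy sketch of that lookup-table validator, with a deliberately tiny alphabet standing in for real ASCII (which would make the table astronomically large, as you say), and JSON.parse as the validity oracle:

```javascript
// Enumerate every string up to maxLen over a tiny alphabet, keep the
// ones JSON.parse accepts, and "validate" later by set membership.
// The alphabet is an illustrative assumption, not full ASCII.
const ALPHABET = ['0', '1', '[', ']', '{', '}', '"', ',', ':'];

function buildTable(maxLen) {
  const valid = new Set();
  let frontier = [''];
  for (let len = 1; len <= maxLen; len++) {
    const next = [];
    for (const prefix of frontier) {
      for (const ch of ALPHABET) next.push(prefix + ch);
    }
    for (const s of next) {
      try { JSON.parse(s); valid.add(s); } catch {}
    }
    frontier = next;
  }
  return valid;
}

const table = buildTable(3);
// The "validator" is now a single hash lookup.
const isValidJson = (s) => table.has(s);
```

Even at this scale the flaw you point out is visible: the table grows as alphabet^length, so the O(1) lookup is paid for with exponential memory.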
That is an absurd idea, lol. I really don’t think there are any practical use cases for this, but that’s okay. It’s just a fun thing to think about. People, or at least some people with certain mathematical inclinations, like to think about navigating impossibly large but finite possibility spaces. It’s a reminder that all the answers to all of the mysteries of the universe are just there, waiting to be found. It doesn’t have to have real world applications, it’s brain candy.
Testing how close to ideal your domain-specific compression algorithm is?
The post focuses on the number of valid strings for each length. I initially had a similar thought, but the author takes a more reasonable approach, answering specifically bounded questions more like "How many 100-character strings can be parsed as valid JSON?"
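For very small N you can get those bounded counts directly by brute force, again using JSON.parse as the oracle. This sketch restricts itself to printable ASCII (an assumption to keep the space finite); the 95^n search space makes it hopeless beyond the first few lengths, which is exactly why the article needs a cleverer counting method:

```javascript
// Count valid JSON strings of exactly length n over printable ASCII
// (codes 32..126). Only feasible for tiny n: the space is 95^n.
function countValid(n) {
  const chars = [];
  for (let c = 32; c <= 126; c++) chars.push(String.fromCharCode(c));

  let strings = [''];
  for (let i = 0; i < n; i++) {
    const next = [];
    for (const s of strings) for (const ch of chars) next.push(s + ch);
    strings = next;
  }

  let count = 0;
  for (const s of strings) {
    try { JSON.parse(s); count++; } catch {}
  }
  return count;
}
// countValid(1) === 10: only the digits '0'..'9' are valid on their own.
```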