JSON data format for MCQ data bank
I'm creating a data bank of MCQ (Multi Choice Questions) and their answers so that an app can be built around it. Regarding the actual storage format, I have two ideas:
- An array of objects with keys (
q
for question,a
for choice-a, etc.). - An array of arrays.
The first one is obviously more readable. Here is a brief sample of what I have so far:
{
"data": [
{
"q": "What kind of language is Python?",
"a": "Compiled",
"b": "Interpreted",
"c": "Parsed",
"d": "Elaborated",
"r": "b"
},
{
"q": "Who invented Python?",
"a": "Rasmus Lerdorf",
"b": "Guido Van Rossum",
"c": "Bill Gates",
"d": "Linus Torvalds",
"r": "b"
}
]
}
The app will read the q key to print the question, then present the four options (a, b, c and d). And the last key (r) will store the right answer. This is very much readable when viewed as a JSON file also. However, what I am thinking is that once the data-bank grows in size into hundreds/thousands of QA, a lot of space will be wasted by those keys (q,a,b,etc.) isn't it? In this case, an array like this is more efficient from storage perspective:
{
"data": [
[
"What kind of language is Python?",
"Compiled",
"Interpreted",
"Parsed",
"Elaborated",
1
],
[
"Who invented Python?",
"Rasmus Lerdorf",
"Guido Van Rossum",
"Bill Gates",
"Linus Torvalds",
1
]
]
}
In this case, each array will have 6 items viz. the question, four choices and finally the index of the correct choice (1==Interpreted, etc.).
Which of these two formats is better? Feel free to suggest any third format which is even better than these two.
I prefer to have my objects written in a literate style. My first thought was a structure more like the following. You could add other fields to each question, like a "tags" field containing a list of tags. Alternatively you could store the index of the answer in the options list, or store it as a tuple index/text pair.
This is also my preferred approach. @pyeri if you're hell bent on minimizing disk space, you could order the options to have the correct one first returning.
Or just compress it. Any reasonable algorithm will give those repeated keys a very short representation in the output file.
Added benefit is that you have something of a checksum since you a question where the answer isn't in the list of options can be considered invalid.
Premature optimization is the root of all evil.
Given modern disk capacities, the storage space required for a few dozen extra characters of text per record is negligible.
The more important question is which format is easier to understand and easier to extend when requirements change? That's the one you should use.
Thanks. Regarding the extensibility, I'm still unable to decide whether or not to include a "topic" container key or not. For example, "Core Python" can contain the basic Python Q/A, "Web Development" will have questions on frameworks like flask, etc. A simpler way is to use the file-system (directory and file names) as topic and include separate JSON files for storing topic meta-data. What do you suggest?
instead of all keys or all arrays, you should have 2 keys: a question key, which is a string, and an awnser key, which is an array of strings.