2 votes

JSON data format for MCQ data bank

I'm creating a data bank of MCQ (Multi Choice Questions) and their answers so that an app can be built around it. Regarding the actual storage format, I have two ideas:

  1. An array of objects with keys (q for question, a for choice-a, etc.).
  2. An array of arrays.

The first one is obviously more readable. Here is a brief sample of what I have so far:

{
    "data": [
        {
            "q": "What kind of language is Python?",
            "a": "Compiled",
            "b": "Interpreted",
            "c": "Parsed",
            "d": "Elaborated",
            "r": "b"
        },
        {
            "q": "Who invented Python?",
            "a": "Rasmus Lerdorf",
            "b": "Guido Van Rossum",
            "c": "Bill Gates",
            "d": "Linus Torvalds",
            "r": "b"
        }
	]
}

The app will read the q key to print the question, then present the four options (a, b, c and d). And the last key (r) will store the right answer. This is very much readable when viewed as a JSON file also. However, what I am thinking is that once the data-bank grows in size into hundreds/thousands of QA, a lot of space will be wasted by those keys (q,a,b,etc.) isn't it? In this case, an array like this is more efficient from storage perspective:

{
    "data": [
        [
            "What kind of language is Python?",
            "Compiled",
            "Interpreted",
            "Parsed",
            "Elaborated",
            1
        ],
        [
            "Who invented Python?",
            "Rasmus Lerdorf",
            "Guido Van Rossum",
            "Bill Gates",
            "Linus Torvalds",
            1
        ]
	]
}

In this case, each array will have 6 items viz. the question, four choices and finally the index of the correct choice (1==Interpreted, etc.).

Which of these two formats is better? Feel free to suggest any third format which is even better than these two.

7 comments

  1. [4]
    Notcoffeetable
    Link
    I prefer to have my objects written in a literate style. My first thought was a structure more like the following. You could add other fields to each question, like a "tags" field containing a...

    I prefer to have my objects written in a literate style. My first thought was a structure more like the following. You could add other fields to each question, like a "tags" field containing a list of tags. Alternatively you could store the index of the answer in the options list, or store it as a tuple index/text pair.

    {
        "data" : [
            {
                "prompt": "What kind of language is Python?",
                "options": [
                    "Compiled",
                    "Interpreted",
                    "Parsed",
                    "Elaborated"
                ],
                "answer": "Intepreted"
            },
            {
                "prompt": "Who invented Python?",
                "options": [
                    "Rasmus Lerdorf",
                    "Guido Van Rossum",
                    "Bill Gates",
                    "Linus Torvalds"
                ],
                "answer": "Guido Van Rossum"
            }
        ]
    }
    
    14 votes
    1. [2]
      ra314
      Link Parent
      This is also my preferred approach. @pyeri if you're hell bent on minimizing disk space, you could order the options to have the correct one first returning.

      This is also my preferred approach. @pyeri if you're hell bent on minimizing disk space, you could order the options to have the correct one first returning.

      1 vote
      1. Macha
        Link Parent
        Or just compress it. Any reasonable algorithm will give those repeated keys a very short representation in the output file.

        Or just compress it. Any reasonable algorithm will give those repeated keys a very short representation in the output file.

        3 votes
    2. scherlock
      Link Parent
      Added benefit is that you have something of a checksum since you a question where the answer isn't in the list of options can be considered invalid.

      Added benefit is that you have something of a checksum since you a question where the answer isn't in the list of options can be considered invalid.

  2. [2]
    GogglesPisano
    Link
    Premature optimization is the root of all evil. Given modern disk capacities, the storage space required for a few dozen extra characters of text per record is negligible. The more important...

    This is very much readable when viewed as a JSON file also. However, what I am thinking is that once the data-bank grows in size into hundreds/thousands of QA, a lot of space will be wasted by those keys (q,a,b,etc.) isn't it? In this case, an array like this is more efficient from storage perspective:

    Premature optimization is the root of all evil.

    Given modern disk capacities, the storage space required for a few dozen extra characters of text per record is negligible.

    The more important question is which format is easier to understand and easier to extend when requirements change? That's the one you should use.

    10 votes
    1. pyeri
      Link Parent
      Thanks. Regarding the extensibility, I'm still unable to decide whether or not to include a "topic" container key or not. For example, "Core Python" can contain the basic Python Q/A, "Web...

      Thanks. Regarding the extensibility, I'm still unable to decide whether or not to include a "topic" container key or not. For example, "Core Python" can contain the basic Python Q/A, "Web Development" will have questions on frameworks like flask, etc. A simpler way is to use the file-system (directory and file names) as topic and include separate JSON files for storing topic meta-data. What do you suggest?

  3. Toric
    Link
    instead of all keys or all arrays, you should have 2 keys: a question key, which is a string, and an awnser key, which is an array of strings.

    instead of all keys or all arrays, you should have 2 keys: a question key, which is a string, and an awnser key, which is an array of strings.

    1 vote