Showing only topics in ~comp with the tag "json".

How many valid JSON strings are there?

Article 1658 words

14 comments

qntm.org

September 26, 2025

26 votes
A reasonable configuration language
- programming languages
Article 3521 words
23 comments

ruudvanasseldonk.com

February 4, 2024

16 votes
gron - Make JSON greppable
- open source
Link
2 comments

GitHub: tomnomnom

June 12, 2022

7 votes
Input from a text file, pull from multiple APIs, formatting output, etc. in Python

Ask (advice)
I don't need answers so much as an idea of where to start.

Essentially, I have a Google Sheet that uses importjson.gs to pull from the following APIs
- OMDB (IMDB)
- TheMovieDB
- TVMaze
I also use another script to scrape Letterboxd for ratings.

This works well, but sometimes it'll time out or I'll hit urlFetch limits that Google has in place.

Basically, I'd like to have a text file (input.txt) where I pop in a bunch of titles and years or IMDB IDs. The script then runs and pulls set endpoints from all of these APIs, outputting everything on one line (with a pipe as the delimiter).

My thinking is that I can then pull that info into a sheet and run all of the formatting, basic math, and whatever else so it suits my Sheet.

I have a feeling I'll be using requests for the JSON and BeautifulSoup for Letterboxd, or maybe a module.

Can anyone point me in the right direction? I don't think it'll be too difficult and should work well for a first Python project.
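One possible starting point is sketched below. It is not a finished tool: it assumes each line of input.txt is either an IMDB ID (like `tt0111161`) or a `Title (Year)` pair, and it only covers one API. The OMDb endpoint, the `i`/`t`/`y`/`apikey` query parameters, and the `Title`/`Year`/`imdbRating` field names are assumptions to check against the API documentation, and `parse_line`, `fetch_omdb`, and `build_rows` are hypothetical helper names. It uses the standard library's `urllib` rather than requests so that it runs without extra installs.

```python
import csv
import io
import json
import re
import urllib.parse
import urllib.request

IMDB_ID = re.compile(r"^tt\d+$")
TITLE_YEAR = re.compile(r"^(?P<title>.+?)\s*\((?P<year>\d{4})\)$")


def parse_line(line):
    """Turn one input line into query parameters (assumed OMDb-style names)."""
    line = line.strip()
    if IMDB_ID.match(line):
        return {"i": line}
    m = TITLE_YEAR.match(line)
    if m:
        return {"t": m["title"], "y": m["year"]}
    return {"t": line}


def fetch_omdb(params, api_key="YOUR_KEY"):
    """Hypothetical fetcher: endpoint and parameters are assumptions."""
    query = urllib.parse.urlencode({**params, "apikey": api_key})
    url = f"https://www.omdbapi.com/?{query}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def build_rows(lines, fetch, fields=("Title", "Year", "imdbRating")):
    """Fetch one record per non-empty line and return pipe-delimited text."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="|", lineterminator="\n")
    for line in lines:
        if not line.strip():
            continue  # skip blank lines in input.txt
        record = fetch(parse_line(line))
        writer.writerow([record.get(f, "") for f in fields])
    return out.getvalue()
```

Passing the fetcher in as an argument keeps the parsing and formatting testable without network access. A real run would look something like `build_rows(open("input.txt"), fetch_omdb)`, with further fetchers for the other services merged into each row.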
8 comments

tomf

February 9, 2021

7 votes
PyPy's new JSON parser

Article 3200 words

0 comments

morepypy.blogspot.com

October 8, 2019

5 votes
How much faster is Redis at storing a blob of JSON compared to PostgreSQL?

Article 903 words

0 comments

peterbe.com

October 2, 2019

6 votes

XML Data Munging Problem

programming

Text 568 words

Here’s a problem I had to solve at work this week that I enjoyed solving. I think it’s a good programming challenge that will test if you really grok XML.

Your input is some XML such as this:

<DOC>
<TEXT PARTNO="000">
<TAG ID="3">This</TAG> is <TAG ID="0">some *JUNK* data</TAG> .
</TEXT>
<TEXT PARTNO="001">
*FOO* Sometimes <TAG ID="1">tags in <TAG ID="0">the data</TAG> are nested</TAG> .
</TEXT>
<TEXT PARTNO="002">
In addition to <TAG ID="1">nested tags</TAG> , sometimes there is also <TAG ID="2">junk</TAG> we need to ignore .
</TEXT>
<TEXT PARTNO="003">*BAR*-1
<TAG ID="2">Junk</TAG> is marked by uppercase characters between asterisks and can also optionally be followed by a dash and then one or more digits . *JUNK*-123
</TEXT>
<TEXT PARTNO="004">
Note that <TAG ID="4">*this*</TAG> is just emphasized . It's not <TAG ID="2">junk</TAG> !
</TEXT>
</DOC>

The above XML has so-called in-line textual annotations because the XML <TAG> elements are embedded within the document text itself.

Your goal is to convert the in-line XML annotations to so-called stand-off annotations where the text is separated from the annotations and the annotations refer to the text via slicing into the text as a character array with starting and ending character offsets. While in-line annotations are more human-readable, stand-off annotations are equally machine-readable, and stand-off annotations can be modified without changing the document content itself (the text is immutable).

The challenge, then, is to convert to a stand-off JSON format that includes the plain-text of the document and the XML tag annotations grouped by their tag element IDs. In order to preserve the annotation information from the original XML, you must keep track of each <TAG>’s starting and ending character offset within the plain-text of the document. The plain-text is defined as the character data in the XML document ignoring any junk. We’ll define junk as one or more uppercase ASCII characters [A-Z]+ between two *, optionally followed by a trailing dash - and one or more digits [0-9]+.
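The junk definition translates fairly directly into a regular expression. The following is a minimal sketch of that definition only, with `JUNK` and `find_junk` as illustrative names. It does not deal with any whitespace around the junk.

```python
import re

# One or more uppercase ASCII letters between two asterisks, optionally
# followed by a dash and one or more digits.
JUNK = re.compile(r"\*[A-Z]+\*(?:-[0-9]+)?")


def find_junk(text):
    """Return every junk marker found in text, in order of appearance."""
    return JUNK.findall(text)
```

Against the sample document, this picks up markers such as `*BAR*-1`, `*JUNK*`, and `*JUNK*-123`, while leaving the lowercase `*this*` alone.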

Here is the desired JSON output for the above example to test your solution:

{
  "data": "\nThis is some data .\n\n\nSometimes tags in the data are nested .\n\n\nIn addition to nested tags , sometimes there is also junk we need to ignore .\n\nJunk is marked by uppercase characters between asterisks and can also optionally be followed by a dash and then one or more digits . \n\nNote that *this* is just emphasized . It's not junk !\n\n",
  "entities": [
    {
      "id": 0,
      "mentions": [
        {
          "start": 9,
          "end": 18,
          "id": 0,
          "text": "some data"
        },
        {
          "start": 41,
          "end": 49,
          "id": 0,
          "text": "the data"
        }
      ]
    },
    {
      "id": 1,
      "mentions": [
        {
          "start": 33,
          "end": 60,
          "id": 1,
          "text": "tags in the data are nested"
        },
        {
          "start": 80,
          "end": 91,
          "id": 1,
          "text": "nested tags"
        }
      ]
    },
    {
      "id": 2,
      "mentions": [
        {
          "start": 118,
          "end": 122,
          "id": 2,
          "text": "junk"
        },
        {
          "start": 144,
          "end": 148,
          "id": 2,
          "text": "Junk"
        },
        {
          "start": 326,
          "end": 330,
          "id": 2,
          "text": "junk"
        }
      ]
    },
    {
      "id": 3,
      "mentions": [
        {
          "start": 1,
          "end": 5,
          "id": 3,
          "text": "This"
        }
      ]
    },
    {
      "id": 4,
      "mentions": [
        {
          "start": 289,
          "end": 295,
          "id": 4,
          "text": "*this*"
        }
      ]
    }
  ]
}

Python 3 solution here.

If you need a hint, see if you can find an event-based XML parser (or if you’re feeling really motivated, write your own).
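To illustrate the hint, here is one way an event-based approach could look, using Python's built-in `xml.parsers.expat`. This is a sketch under stated assumptions, not the linked solution. It assumes junk never spans a tag boundary, and it also removes whitespace immediately following junk, which is inferred from the expected output (where "some *JUNK* data" becomes "some data"). It has not been checked to reproduce the expected offsets above exactly, since those depend on how whitespace between elements is handled. The name `to_standoff` is made up for the example.

```python
import re
import xml.parsers.expat

# Junk as defined in the post, plus trailing whitespace (an assumption
# inferred from the expected output, not stated in the definition).
JUNK = re.compile(r"\*[A-Z]+\*(?:-[0-9]+)?\s*")


def to_standoff(xml_text):
    """Convert in-line <TAG ID="n"> annotations to a stand-off dict."""
    chunks = []     # cleaned text pieces, joined at the end
    pending = []    # raw character data seen since the last tag event
    length = 0      # length of the cleaned text emitted so far
    open_tags = []  # stack of (id, start offset) for open <TAG> elements
    mentions = {}   # id -> list of mention dicts

    def flush():
        # expat may deliver text in several pieces, so buffer it and
        # strip junk only when a tag boundary is reached.
        nonlocal length
        cleaned = JUNK.sub("", "".join(pending))
        pending.clear()
        chunks.append(cleaned)
        length += len(cleaned)

    def start(name, attrs):
        flush()
        if name == "TAG":
            open_tags.append((int(attrs["ID"]), length))

    def end(name):
        flush()
        if name == "TAG":
            tag_id, begin = open_tags.pop()
            mentions.setdefault(tag_id, []).append(
                {"start": begin, "end": length, "id": tag_id})

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = pending.append
    parser.Parse(xml_text, True)

    data = "".join(chunks)
    for group in mentions.values():
        for m in group:
            m["text"] = data[m["start"]:m["end"]]
    entities = [
        {"id": i, "mentions": sorted(mentions[i], key=lambda m: m["start"])}
        for i in sorted(mentions)
    ]
    return {"data": data, "entities": entities}
```

Because offsets are recorded against the cleaned text as it accumulates, nested tags fall out of the stack naturally: an inner tag is popped first, and the outer tag's end offset already includes the inner text. Passing the result to `json.dumps(result, indent=2)` gives output in the shape shown above.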

4 votes

JMESPath is a query language for JSON.

Link

8 comments

jmespath.org

June 8, 2018

7 votes