Showing only topics with the tag "json".

How many valid JSON strings are there?

~comp Article 1658 words

14 comments

qntm.org

September 26, 2025

26 votes
A reasonable configuration language
~comp
- programming languages
Article 3521 words
23 comments

ruudvanasseldonk.com

February 4, 2024

16 votes
gron - Make JSON greppable
~comp
- open source
Link
2 comments

GitHub: tomnomnom

June 12, 2022

7 votes
Input from a text file, pull from multiple APIs, formatting output, etc. in Python

~comp Ask (advice)

I don't need answers so much as an idea of where to start.

Essentially, I have a Google Sheet that uses importjson.gs to pull from the following APIs
- OMDB (IMDB)
- TheMovieDB
- TVMaze
I also use another script to scrape Letterboxd for ratings.

This works well, but sometimes it'll time out or I'll hit urlFetch limits that Google has in place.

Basically, I'd like to have a text file (input.txt) where I pop in a bunch of titles and years or IMDB IDs; the script then runs and pulls set endpoints from all of these APIs, outputting everything on one line (with a pipe as the delimiter).

My thinking is that I can then pull that info into a sheet and run all of the formatting, basic math, and whatever else so it suits my Sheet.

I have a feeling I'll be using requests for the JSON and beautifulsoup for letterboxd -- or maybe a module.

Can anyone point me in the right direction? I don't think it'll be too difficult, and it should work well as a first Python project.
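As a possible starting point, here is a minimal sketch of the plumbing only: reading entries, merging results from several sources, and writing pipe-delimited rows. The input line format (an IMDB ID such as `tt0111161`, or `Title (Year)`), the function names, and the "fetcher" callables are all assumptions made for illustration; each fetcher would wrap a real API call (for example with requests) and return a plain dict.

```python
import re

# Assumed input formats: "tt0111161" or "Some Title (1995)"; anything else is a bare title.
IMDB_ID = re.compile(r"^tt\d+$")
TITLE_YEAR = re.compile(r"^(?P<title>.+?)\s*\((?P<year>\d{4})\)$")


def parse_entry(line):
    """Turn one input line into a lookup dict."""
    line = line.strip()
    if IMDB_ID.match(line):
        return {"imdb_id": line}
    match = TITLE_YEAR.match(line)
    if match:
        return {"title": match.group("title"), "year": match.group("year")}
    return {"title": line}


def build_row(entry, fetchers, fields):
    """Call each fetcher, merge the returned dicts, and join the chosen fields with pipes."""
    merged = {}
    for fetch in fetchers:
        merged.update(fetch(entry))
    # Missing fields become empty strings so every row has the same column count.
    return "|".join(str(merged.get(field, "")) for field in fields)


def run(lines, fetchers, fields):
    """Produce one pipe-delimited row per non-empty input line."""
    return [build_row(parse_entry(line), fetchers, fields) for line in lines if line.strip()]


def echo_fetcher(entry):
    """Hypothetical stand-in for a real API call; it just echoes the parsed entry."""
    return dict(entry)
```

With real fetchers in place, something like `with open("input.txt") as f: rows = run(f, fetchers, fields)` would give the rows to write out. Note that a value containing a pipe character would break the delimiter, so a different separator or the csv module may be safer.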
8 comments

tomf

February 9, 2021

7 votes
PyPy's new JSON parser

~comp Article 3200 words

0 comments

morepypy.blogspot.com

October 8, 2019

5 votes
How much faster is Redis at storing a blob of JSON compared to PostgreSQL?

~comp Article 903 words

0 comments

peterbe.com

October 2, 2019

6 votes

XML Data Munging Problem

~comp

programming

Text 568 words


Here’s a problem I had to solve at work this week that I enjoyed solving. I think it’s a good programming challenge that will test if you really grok XML.

Your input is some XML such as this:

<DOC>
<TEXT PARTNO="000">
<TAG ID="3">This</TAG> is <TAG ID="0">some *JUNK* data</TAG> .
</TEXT>
<TEXT PARTNO="001">
*FOO* Sometimes <TAG ID="1">tags in <TAG ID="0">the data</TAG> are nested</TAG> .
</TEXT>
<TEXT PARTNO="002">
In addition to <TAG ID="1">nested tags</TAG> , sometimes there is also <TAG ID="2">junk</TAG> we need to ignore .
</TEXT>
<TEXT PARTNO="003">*BAR*-1
<TAG ID="2">Junk</TAG> is marked by uppercase characters between asterisks and can also optionally be followed by a dash and then one or more digits . *JUNK*-123
</TEXT>
<TEXT PARTNO="004">
Note that <TAG ID="4">*this*</TAG> is just emphasized . It's not <TAG ID="2">junk</TAG> !
</TEXT>
</DOC>

The above XML has so-called in-line textual annotations because the XML <TAG> elements are embedded within the document text itself.

Your goal is to convert the in-line XML annotations to so-called stand-off annotations, where the text is stored separately from the annotations and each annotation refers to its span of the text by starting and ending character offsets (that is, by slicing into the text as a character array). While in-line annotations are more human-readable, stand-off annotations are equally machine-readable, and they can be modified without changing the document content itself (the text is immutable).

The challenge, then, is to convert to a stand-off JSON format that includes the plain-text of the document and the XML tag annotations grouped by their tag element IDs. In order to preserve the annotation information from the original XML, you must keep track of each <TAG>’s starting and ending character offset within the plain-text of the document. The plain-text is defined as the character data in the XML document ignoring any junk. We’ll define junk as one or more uppercase ASCII characters [A-Z]+ between two *, optionally followed by a trailing dash - and one or more digits [0-9]+.
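The junk definition translates fairly directly into a regular expression. The sketch below is one way to write it, not necessarily how the linked solution does it. The trailing `\s?` is an assumption: the definition does not mention whitespace, but the expected output appears to drop one whitespace character after each junk token (for example, "some *JUNK* data" becomes "some data" with a single space).

```python
import re

# \*[A-Z]+\*     one or more uppercase ASCII letters between two asterisks
# (?:-[0-9]+)?   optionally a dash followed by one or more digits
# \s?            assumption: also consume one following whitespace character
JUNK = re.compile(r"\*[A-Z]+\*(?:-[0-9]+)?\s?")


def strip_junk(text):
    """Return text with every junk token removed."""
    return JUNK.sub("", text)
```

Lowercase text between asterisks, such as `*this*`, does not match and is left alone.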

Here is the desired JSON output for the above example to test your solution:

{
  "data": "\nThis is some data .\n\n\nSometimes tags in the data are nested .\n\n\nIn addition to nested tags , sometimes there is also junk we need to ignore .\n\nJunk is marked by uppercase characters between asterisks and can also optionally be followed by a dash and then one or more digits . \n\nNote that *this* is just emphasized . It's not junk !\n\n",
  "entities": [
    {
      "id": 0,
      "mentions": [
        {
          "start": 9,
          "end": 18,
          "id": 0,
          "text": "some data"
        },
        {
          "start": 41,
          "end": 49,
          "id": 0,
          "text": "the data"
        }
      ]
    },
    {
      "id": 1,
      "mentions": [
        {
          "start": 33,
          "end": 60,
          "id": 1,
          "text": "tags in the data are nested"
        },
        {
          "start": 80,
          "end": 91,
          "id": 1,
          "text": "nested tags"
        }
      ]
    },
    {
      "id": 2,
      "mentions": [
        {
          "start": 118,
          "end": 122,
          "id": 2,
          "text": "junk"
        },
        {
          "start": 144,
          "end": 148,
          "id": 2,
          "text": "Junk"
        },
        {
          "start": 326,
          "end": 330,
          "id": 2,
          "text": "junk"
        }
      ]
    },
    {
      "id": 3,
      "mentions": [
        {
          "start": 1,
          "end": 5,
          "id": 3,
          "text": "This"
        }
      ]
    },
    {
      "id": 4,
      "mentions": [
        {
          "start": 289,
          "end": 295,
          "id": 4,
          "text": "*this*"
        }
      ]
    }
  ]
}

Python 3 solution here.

If you need a hint, see if you can find an event-based XML parser (or if you’re feeling really motivated, write your own).
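Following that hint, here is a minimal sketch built on Python's event-based `xml.parsers.expat` module. It is not the linked solution. The function name `to_standoff` and the whitespace-eating `\s?` in the junk pattern are assumptions, and the sketch is not claimed to reproduce the expected output byte for byte, since the post does not spell out how whitespace outside the <TEXT> elements should be treated. The idea is to keep a running character count of the junk-free text, push that count when a <TAG> opens, and pop it when the tag closes.

```python
import re
import xml.parsers.expat

# Junk per the post's definition; the trailing \s? is an assumption (see lead-in).
JUNK = re.compile(r"\*[A-Z]+\*(?:-[0-9]+)?\s?")


def to_standoff(xml_text):
    chunks = []     # junk-free text pieces, in document order
    length = 0      # characters emitted so far, i.e. the current offset
    open_tags = []  # stack of (tag id, start offset) for unclosed <TAG>s
    spans = []      # (tag id, start, end) for closed <TAG>s

    def on_start(name, attrs):
        if name == "TAG":
            open_tags.append((int(attrs["ID"]), length))

    def on_end(name):
        if name == "TAG":
            tag_id, start = open_tags.pop()
            spans.append((tag_id, start, length))

    def on_text(text):
        nonlocal length
        clean = JUNK.sub("", text)
        chunks.append(clean)
        length += len(clean)

    parser = xml.parsers.expat.ParserCreate()
    # Coalesce adjacent text so a junk token is not split across callbacks
    # (very long text runs can still be flushed in several pieces).
    parser.buffer_text = True
    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_text
    parser.Parse(xml_text, True)

    data = "".join(chunks)
    by_id = {}
    for tag_id, start, end in sorted(spans, key=lambda s: (s[0], s[1])):
        by_id.setdefault(tag_id, []).append(
            {"start": start, "end": end, "id": tag_id, "text": data[start:end]}
        )
    entities = [{"id": i, "mentions": m} for i, m in sorted(by_id.items())]
    return {"data": data, "entities": entities}
```

Because nested tags are handled with a stack, the inner tag closes first and the outer tag's end offset covers everything emitted in between. Passing the result to `json.dumps(result, indent=2)` would give output in the shape shown above.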

4 votes

Whatever happened to the semantic web?

~tech Link

3 comments

twobithistory.org

September 18, 2018

15 votes
JMESPath is a query language for JSON.

~comp Link

8 comments

jmespath.org

June 8, 2018

7 votes
Is there currently any way to query data from Tildes? (such as JSON, RSS) For those of us who would like to look into developing third-party apps for this site.

~tildes Ask (advice)

Basically what the title says. I might like to look into making a third party Android/iOS app.

3 comments

hankhill72

June 6, 2018

11 votes
Request: API to fetch all comments including hierarchy relationship
~tildes
- suggestions
Text 156 words

Hello,

I saw it mentioned in another thread that an API has no use for real users, only for bots. So I wanted to voice some real API uses that I would be interested in:
1. When I publish a new blog post and find it worthy of sharing here, it would be nice to mirror the comments I get here back on my blog post. I can imagine using an API to fetch all the comments from a Tildes thread, including the hierarchy relationship. The API would return JSON with Markdown and/or HTML, like the XML that Disqus exports (but JSON). When people want to comment on that post, they can come to Tildes to do so, or if they don't want to create an account here, or don't have an invite, they can comment via the other means that I offer (Webmentions, Twitter, email).
2. The second use is to make something like hnrss possible.
0 comments

kaushalmodi

May 22, 2018

4 votes