-
16 votes
-
gron - Make JSON greppable
7 votes -
Input from a text file, pull from multiple APIs, formatting output, etc. in Python
I don't need answers so much as an idea of where to start.
Essentially, I have a Google Sheet that uses importjson.gs to pull from the following APIs
- OMDB (IMDB)
- TheMovieDB
- TVMaze
I also use another script to scrape Letterboxd for ratings.
This works well, but sometimes it'll time out or I'll hit urlFetch limits that Google has in place.
Basically, I'd like to have a text file (input.txt) where I pop in a bunch of titles and years or IMDB IDs. The script then runs and pulls set endpoints from all of these APIs, outputting everything for each title on one line, with a pipe as the delimiter.
My thinking is that I can then pull that info into a sheet and run all of the formatting, basic math, and whatever else so it suits my Sheet.
I have a feeling I'll be using requests for the JSON and beautifulsoup for letterboxd -- or maybe a module.
Can anyone point me in the right direction? I don't think it'll be too difficult and should work well for a first python project.
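One possible starting point for the workflow described above is sketched here. It is only a rough outline under several assumptions: the "Title (Year)" line format for input.txt is made up for illustration, the OMDb endpoint and its i, t, y, and apikey query parameters and the Title, Year, and imdbRating response fields are quoted from memory and should be checked against the OMDb docs, and the standard library's urllib stands in for requests so the sketch has no third-party dependencies (requests.get(url, params=params).json() would do the same job). The helper names are hypothetical.

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; verify against the OMDb documentation.
OMDB_URL = "https://www.omdbapi.com/"

IMDB_ID = re.compile(r"^tt\d+$")
# Assumed input convention: "Title (Year)", e.g. "Heat (1995)".
TITLE_YEAR = re.compile(r"^(?P<title>.+?)\s*\((?P<year>\d{4})\)$")


def parse_line(line):
    """Turn one line of input.txt into OMDb-style query parameters."""
    line = line.strip()
    if IMDB_ID.match(line):
        return {"i": line}
    match = TITLE_YEAR.match(line)
    if match:
        return {"t": match["title"], "y": match["year"]}
    return {"t": line}


def fetch_json(url, params):
    """GET url with the given query parameters and decode the JSON body."""
    with urlopen(f"{url}?{urlencode(params)}", timeout=10) as resp:
        return json.load(resp)


def to_row(record, fields):
    """Join the chosen fields with pipes; missing fields become empty cells."""
    # Replace stray pipes in values so they cannot break the delimiter.
    return "|".join(str(record.get(f, "")).replace("|", "/") for f in fields)


def main(path="input.txt", api_key="YOUR_KEY"):
    fields = ["Title", "Year", "imdbRating"]  # assumed OMDb field names
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                params = {**parse_line(line), "apikey": api_key}
                print(to_row(fetch_json(OMDB_URL, params), fields))
```

Calling main() with a real API key would print one pipe-delimited line per title. The other APIs and the Letterboxd scrape could follow the same pattern: one small fetch function per source, merged into a single dict before to_row is called.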
7 votes -
PyPy's new JSON parser
5 votes -
How much faster is Redis at storing a blob of JSON compared to PostgreSQL?
6 votes -
XML Data Munging Problem
Here’s a problem I had to solve at work this week that I enjoyed solving. I think it’s a good programming challenge that will test if you really grok XML.
Your input is some XML such as this:
<DOC> <TEXT PARTNO="000"> <TAG ID="3">This</TAG> is <TAG ID="0">some *JUNK* data</TAG> . </TEXT> <TEXT PARTNO="001"> *FOO* Sometimes <TAG ID="1">tags in <TAG ID="0">the data</TAG> are nested</TAG> . </TEXT> <TEXT PARTNO="002"> In addition to <TAG ID="1">nested tags</TAG> , sometimes there is also <TAG ID="2">junk</TAG> we need to ignore . </TEXT> <TEXT PARTNO="003">*BAR*-1 <TAG ID="2">Junk</TAG> is marked by uppercase characters between asterisks and can also optionally be followed by a dash and then one or more digits . *JUNK*-123 </TEXT> <TEXT PARTNO="004"> Note that <TAG ID="4">*this*</TAG> is just emphasized . It's not <TAG ID="2">junk</TAG> ! </TEXT> </DOC>
The above XML has so-called in-line textual annotations because the XML <TAG> elements are embedded within the document text itself.
Your goal is to convert the in-line XML annotations to so-called stand-off annotations where the text is separated from the annotations and the annotations refer to the text via slicing into the text as a character array with starting and ending character offsets. While in-line annotations are more human-readable, stand-off annotations are equally machine-readable, and stand-off annotations can be modified without changing the document content itself (the text is immutable).
The challenge, then, is to convert to a stand-off JSON format that includes the plain-text of the document and the XML tag annotations grouped by their tag element IDs. In order to preserve the annotation information from the original XML, you must keep track of each <TAG>’s starting and ending character offset within the plain-text of the document. The plain-text is defined as the character data in the XML document ignoring any junk. We’ll define junk as one or more uppercase ASCII characters [A-Z]+ between two asterisks (*), optionally followed by a trailing dash (-) and one or more digits [0-9]+.
Here is the desired JSON output for the above example to test your solution:
{ "data": "\nThis is some data .\n\n\nSometimes tags in the data are nested .\n\n\nIn addition to nested tags , sometimes there is also junk we need to ignore .\n\nJunk is marked by uppercase characters between asterisks and can also optionally be followed by a dash and then one or more digits . \n\nNote that *this* is just emphasized . It's not junk !\n\n", "entities": [ { "id": 0, "mentions": [ { "start": 9, "end": 18, "id": 0, "text": "some data" }, { "start": 41, "end": 49, "id": 0, "text": "the data" } ] }, { "id": 1, "mentions": [ { "start": 33, "end": 60, "id": 1, "text": "tags in the data are nested" }, { "start": 80, "end": 91, "id": 1, "text": "nested tags" } ] }, { "id": 2, "mentions": [ { "start": 118, "end": 122, "id": 2, "text": "junk" }, { "start": 144, "end": 148, "id": 2, "text": "Junk" }, { "start": 326, "end": 330, "id": 2, "text": "junk" } ] }, { "id": 3, "mentions": [ { "start": 1, "end": 5, "id": 3, "text": "This" } ] }, { "id": 4, "mentions": [ { "start": 289, "end": 295, "id": 4, "text": "*this*" } ] } ] }
Python 3 solution here.
If you need a hint, see if you can find an event-based XML parser (or if you’re feeling really motivated, write your own).
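For anyone who wants to see the event-based approach from the hint in concrete form, here is a minimal sketch. It is not the linked solution, just one way to do it with the standard library's xml.parsers.expat. The function name inline_to_standoff is made up for illustration, and the sketch only strips junk tokens and leaves whitespace untouched, so it is not guaranteed to reproduce the expected output above character for character.

```python
import re
import xml.parsers.expat

# Junk as defined in the post: *UPPERCASE*, optionally followed by -digits.
JUNK = re.compile(r"\*[A-Z]+\*(?:-[0-9]+)?")


def inline_to_standoff(xml_text):
    """Convert in-line <TAG> annotations to stand-off offsets (sketch)."""
    pieces = []      # cleaned character data, in document order
    length = 0       # running length of the plain text emitted so far
    open_tags = []   # stack of (tag id, start offset) for unclosed <TAG>s
    mentions = {}    # tag id -> list of (start, end) spans

    def start_element(name, attrs):
        if name == "TAG":
            open_tags.append((int(attrs["ID"]), length))

    def end_element(name):
        if name == "TAG":
            tag_id, begin = open_tags.pop()
            mentions.setdefault(tag_id, []).append((begin, length))

    def char_data(text):
        nonlocal length
        clean = JUNK.sub("", text)
        pieces.append(clean)
        length += len(clean)

    parser = xml.parsers.expat.ParserCreate()
    # Coalesce adjacent text chunks so a junk token is not split in two.
    parser.buffer_text = True
    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    parser.CharacterDataHandler = char_data
    parser.Parse(xml_text, True)

    data = "".join(pieces)
    entities = [
        {
            "id": tag_id,
            "mentions": [
                {"start": s, "end": e, "id": tag_id, "text": data[s:e]}
                for s, e in sorted(spans)
            ],
        }
        for tag_id, spans in sorted(mentions.items())
    ]
    return {"data": data, "entities": entities}
```

The stack is what makes nested tags work: an inner <TAG> closes first, so it pops its own start offset while the outer one stays on the stack until its closing tag arrives. Passing the result to json.dumps gives the stand-off JSON document.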
4 votes -
Whatever happened to the semantic web?
15 votes -
JMESPath is a query language for JSON.
7 votes -
Is there currently any way to query data from Tildes? (such as JSON, RSS) For those of us who would like to look into developing third-party apps for this site.
Basically what the title says. I might like to look into making a third party Android/iOS app.
11 votes -
Request: API to fetch all comments including hierarchy relationship
Hello,
I saw it mentioned in another thread that an API is of no use to real users, only to bots. So I wanted to voice some real API uses that I would be interested in:
- When I post a new blog post, if I find it worthy of sharing here, it would be nice to mirror the comments I get here back on my blog post. I can imagine using an API to fetch all the comments from a tildes thread, including the hierarchy relationship. The API would return JSON with Markdown and/or HTML, like the XML that Disqus exports (but JSON). When people want to comment on that post, they can come to tildes to do so, or if they don't want to create an account here, or if they don't have an invite, they can comment via other means that I have (Webmentions, Twitter, email).
- A second use is to make something like hnrss possible.
4 votes