7 votes

Input from a text file, pull from multiple APIs, formatting output, etc. in Python

I don't need answers so much as an idea of where to start.

Essentially, I have a Google Sheet that uses importjson.gs to pull from the following APIs

  • OMDB (IMDB)
  • TheMovieDB
  • TVMaze

I also use another script to scrape Letterboxd for ratings.

This works well, but sometimes it'll time out or I'll hit urlFetch limits that Google has in place.

Basically, I'd like to have a text file (input.txt) where I pop in a bunch of titles and years or IMDB IDs, then the script runs and pulls set endpoints from all of these, outputting everything on one line (with a pipe as a delimiter).

My thinking is that I can then pull that info into a sheet and run all of the formatting, basic math, and whatever else so it suits my Sheet.

I have a feeling I'll be using requests for the JSON and BeautifulSoup for Letterboxd -- or maybe a module.
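
Something like this rough, untested sketch is what I'm picturing for the OMDB half (the API key is a placeholder, and I've only guessed at which response fields I'd keep):

```python
import requests

OMDB_KEY = "your-key-here"  # placeholder; OMDB requires a (free) API key

def fetch_omdb(title, year=None):
    """Look up one title on OMDB and return the parsed JSON response."""
    params = {"apikey": OMDB_KEY, "t": title}
    if year:
        params["y"] = year
    resp = requests.get("https://www.omdbapi.com/", params=params, timeout=10)
    resp.raise_for_status()
    return resp.json()

# input.txt: one title per line (IMDB-ID handling left out of this sketch)
with open("input.txt") as f, open("output.txt", "w") as out:
    for line in f:
        title = line.strip()
        if not title:
            continue
        data = fetch_omdb(title)
        fields = [data.get("Title", ""), data.get("Year", ""), data.get("imdbRating", "")]
        out.write("|".join(fields) + "\n")
```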

Can anyone point me in the right direction? I don't think it'll be too difficult, and it should work well for a first Python project.

8 comments

  1. [4]
    daturkel
    Link

    An alternative to beautiful soup is Scrapy, which has great docs and tutorials on its site.
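
    A minimal Scrapy spider looks something like this (sketch; the URL and CSS selector are just guesses), and you can run it with scrapy runspider spider.py -o out.json:

    ```python
    import scrapy

    class FilmSpider(scrapy.Spider):
        name = "films"
        start_urls = ["https://letterboxd.com/film/alien/"]  # example page

        def parse(self, response):
            # Hypothetical selector -- inspect the real page markup first
            yield {
                "rating": response.css("meta[name='twitter:data2']::attr(content)").get()
            }
    ```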

    What's your comfort level with Python? Are you already familiar with Pandas? It might be nice to be able to parse your API results into a Pandas dataframe and then output with to_csv (you can override the delimiter) rather than combining the fields manually into a string.
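
    Something like this (untested; the field names are made up, assuming each API response has already been flattened into a dict):

    ```python
    import pandas as pd

    # One dict per title, e.g. merged from the OMDB/TheMovieDB/TVMaze responses
    records = [
        {"title": "Alien", "year": 1979, "imdb_rating": 8.5},
        {"title": "Heat", "year": 1995, "imdb_rating": 8.3},
    ]

    df = pd.DataFrame(records)
    df.to_csv("output.txt", sep="|", index=False)  # sep overrides the comma
    ```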

    6 votes
    1. [3]
      tomf
      Link Parent

      Ha. The extent of my Python experience is messing with a few weechat scripts, plugins for my bot, etc. -- essentially starting from zero. :)

      I do know of a bunch of the modules, but I have no idea how to get going from scratch.

      Pandas with to_csv sounds like the winning combination, though.

      3 votes
      1. WhyCause
        Link Parent

        I would suggest using the (included) csv module instead, if all you're going to use Pandas for is to_csv. Pandas is a pretty big package to load just for CSV output. If you sort your data into a list of dicts (or even just one dict, if you only want one row), you can use the csv.DictWriter class. If you don't care about a header row, you can just use csv.writer.
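
        For example (untested sketch with made-up field names):

        ```python
        import csv

        rows = [
            {"title": "Alien", "year": 1979, "imdb_rating": 8.5},
            {"title": "Heat", "year": 1995, "imdb_rating": 8.3},
        ]

        with open("output.txt", "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=["title", "year", "imdb_rating"], delimiter="|")
            writer.writeheader()  # drop this line if you don't want a header row
            writer.writerows(rows)
        ```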

        6 votes
      2. Adys
        Link Parent

        I do recommend checking out @daturkel's recommendation of Scrapy. If a lot of what you'll do revolves around web scraping, they're best-in-class in terms of tooling.

        As for how to ingest it, my recommendation would probably be to stuff it into a sqlite db or some such and then pull it out as you like with SQL. The added bonus of storing in sqlite is both portability and flexibility in what the data will look like (you can easily "just store everything"). You do need to be comfortable with basic SQL with that approach, though.
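
        Roughly like this (untested sketch; the schema is just an example):

        ```python
        import sqlite3

        conn = sqlite3.connect("movies.db")  # a single, portable file
        conn.execute(
            "CREATE TABLE IF NOT EXISTS movies (title TEXT, year INTEGER, imdb_rating REAL)"
        )
        conn.executemany(
            "INSERT INTO movies VALUES (?, ?, ?)",
            [("Alien", 1979, 8.5), ("Heat", 1995, 8.3)],
        )
        conn.commit()

        # Pull it back out however you like with SQL
        for row in conn.execute("SELECT title, year FROM movies WHERE imdb_rating > 8.4"):
            print(row)
        conn.close()
        ```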

        5 votes
  2. [3]
    stu2b50
    Link

    Seems fine. Instead of an input file, the more standard approach would be a CLI interface. Python comes with argparse, but there are also amazing libraries like python-fire that infer the interface for you.
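
    A minimal argparse sketch (the arguments are just examples):

    ```python
    import argparse

    parser = argparse.ArgumentParser(description="Look up movie/TV metadata")
    parser.add_argument("titles", nargs="+", help="one or more titles or IMDB IDs")
    parser.add_argument("--year", type=int, help="optional release year")
    args = parser.parse_args()

    for title in args.titles:
        print(title, args.year)  # replace with the actual API lookups
    ```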

    I would also recommend some kind of structured format for the input file if you wish to pursue that: JSON, TOML, etc.
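
    With JSON, for instance, reading the input is one stdlib call (sketch; the file layout is just an example):

    ```python
    import json

    # input.json might look like:
    # [{"title": "Alien", "year": 1979}, {"imdb_id": "tt0000000"}]
    with open("input.json") as f:
        entries = json.load(f)

    for entry in entries:
        print(entry.get("title"), entry.get("year"), entry.get("imdb_id"))
    ```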

    Requests is a good library for doing, well, requests, so that sounds fine. If you need to scrape, I'd agree with your choice of BS4.

    Just outputting the results to stdout is good for an MVP; you can also consider saving it internally in a database (e.g. sqlite3), or using the Google Sheets API to place it in the spreadsheet automatically.
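
    For the Sheets route, a third-party library like gspread keeps it to a few lines (sketch; assumes service-account credentials are already set up per gspread's docs, and the sheet name is made up):

    ```python
    import gspread

    gc = gspread.service_account()  # reads the configured credentials file
    sheet = gc.open("My Movie Sheet").sheet1  # hypothetical spreadsheet name
    sheet.append_row(["Alien", 1979, 8.5])
    ```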

    I would also recommend using poetry to handle dependencies and venv management.

    Idk, looks like you have a plan; I don't see anything particularly wrong with it.

    5 votes
    1. onyxleopard
      Link Parent

      If you are familiar with CSS selectors, then I find requests_html can be more streamlined than separately requesting a page and parsing out the data you are after with bs4.
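
      Sketch (untested; the URL and selector are guesses -- you'd want to inspect the actual Letterboxd markup):

      ```python
      from requests_html import HTMLSession

      session = HTMLSession()
      r = session.get("https://letterboxd.com/film/alien/")  # example page
      # Hypothetical selector for wherever the rating lives in the page
      rating = r.html.find("meta[name='twitter:data2']", first=True)
      print(rating.attrs.get("content") if rating else "not found")
      ```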

      2 votes
    2. tomf
      Link Parent

      ok cool. I was sort of hoping that we'd have that Matrix 'I know kung fu' stuff by now, which is why I've put this off. :)

      Poetry looks pretty good, too! thanks!

      1 vote
  3. mxuribe
    (edited )
    Link

    I'm not adding much other than to echo sentiments already noted...but I would stress trying to leverage a sqlite db file - as @Adys recommended. A sqlite db is only a single file...so you won't need traditional infrastructure overhead for a full-sized database. And being a single file gives you the benefits - I presume - of your single text file approach...Plus, you can query the db data via various methods/means...Plus, if you don't want to build up any code to query the db file, you can create a simple function to export all the data from the sqlite db file out to a text file, or csv file, etc. - best of all worlds...though you would have to get familiar with methods to interact with a sqlite db file.
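
    That export function could be as small as this (untested sketch; the table and file names are examples):

    ```python
    import csv
    import sqlite3

    def export_to_csv(db_path, table, out_path, delimiter="|"):
        """Dump every row of one table out to a delimited text file."""
        conn = sqlite3.connect(db_path)
        cursor = conn.execute(f"SELECT * FROM {table}")  # table name is trusted here
        with open(out_path, "w", newline="") as f:
            writer = csv.writer(f, delimiter=delimiter)
            writer.writerow([col[0] for col in cursor.description])  # header row
            writer.writerows(cursor)
        conn.close()

    export_to_csv("movies.db", "movies", "output.txt")
    ```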

    EDIT: I totally forgot until now that sqlite has native commands to export data in various formats too (via the command line tool)! See https://www.sqlite.org/cli.html#changing_output_formats

    2 votes