9 votes

Debunking an election fraud claim using open data and Dolt

5 comments

  1. [5]
    arghdos

    This is an interesting article... at first I was thinking, why would I want Pandas without the nice built-in text parsing features that come with it (and Python)? But the idea of a DB as a first-class Git citizen is interesting, though again you could get all of the functionality they're demonstrating here by adding a .csv to your git repo. It will be interesting to see how this goes in the future, and how the interop with plain git works out (e.g., I'd be way more likely to stick a Dolt DB in my git repo than to have a dedicated Dolt repo solely for the DB).

    2 votes
    1. [4]
      skybrian

      I haven’t used Dolt (I just read the blog) but I think the commit history would be cleaner with the data separate from the code? Also, applying patches should be easier with a tool designed to deal with databases.

      2 votes
      1. [3]
        arghdos

        "I think the commit history would be cleaner with the data separate from the code?"

        Possibly, though I'm not really opposed to commits like:

        "Updated DB from source to DATE"

        which would work fine for the CSVs they use as an example here (probably not so much for whatever their internal Dolt format is). Really the problem I have is that managing multiple repos quickly becomes a PITA. Git submodules are... not great, sharing credentials through Jenkins/Docker or your CI of choice is less than ideal, etc. That may just come down to personal taste though :)

        1 vote
        1. [2]
          skybrian

          I meant the other way, where the people maintaining the source data probably don’t want code commits mixed in with their data, and they probably only want pull requests for data fixes.

          I think it makes sense to take a CSV file from a Dolt repo and commit it to a git repo if you just want a read-only copy.

          2 votes
          1. arghdos

            Ahhh yeah, that makes sense -- you're right, from a data-distribution point of view that's definitely what you want (and what the blog implies is one of the major purposes of Dolt).

            2 votes