13 votes

Practical SQL for data analysis

6 comments

  1. [6]
    HotPants

    Unrelated.... I'm a bit of a data nerd.

    Give me a bunch of data in a database and I can't help but play around with it.

    For instance, in the USA alone there were 181 thousand people in the Facebook hack claiming to work at Facebook. There are only 52 thousand Facebook employees worldwide.

    But typically the hardest part of analyzing data isn't the tool. It's dealing with the data and dealing with the people trying to understand the data.

    First, you have to avoid outliers or exceptions (most people weren't in the Facebook hack, and most people in the Facebook hack did not have a company specified, so for most companies the number of hacked employees was 1%-10% of their total employee count).

    Then you have to gather insights (maybe Facebook employees are more diligent than most about entering their company).

    Lastly, most people aren't data driven. In my experience, you are better off with anecdotes that bring data to life (a lot of teens think it's funny to put Krusty Krab as their employer, which was the fourth most popular choice, so maybe they also thought it would be equally funny to list Facebook as their employer, which was the second most popular choice, right behind Self Employed).
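
For anyone who wants to try this kind of "most popular employer" ranking on their own data, here is a minimal sketch using Python's built-in sqlite3 module. The table name `profiles`, its columns, and the helper `top_employers` are made up for illustration, and the rows you pass in would be your own; nothing here reflects the actual layout of the leaked data.

```python
import sqlite3

def top_employers(rows, limit=5):
    """Rank employers by how many profiles list them.

    rows: iterable of (country, employer) tuples (hypothetical layout).
    Profiles with no employer are skipped so they don't swamp the ranking.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE profiles (country TEXT, employer TEXT)")
    conn.executemany("INSERT INTO profiles VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT employer, COUNT(*) AS n
        FROM profiles
        WHERE employer IS NOT NULL AND employer <> ''
        GROUP BY employer
        ORDER BY n DESC, employer
        LIMIT ?
        """,
        (limit,),
    ).fetchall()
    conn.close()
    return result
```

Filtering out the empty employers first is the "avoid exceptions" step; the ranking itself is just a GROUP BY with a count.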

    8 votes
    1. [5]
      vord

      Here you are, fellow data nerd:

      https://www.ssa.gov/oact/babynames/limits.html

      Ever wonder where all the baby naming sites/apps get their data? Right there.

      Being dissatisfied with the typical baby name apps for our first child, I used bash tools (mostly awk) to aggregate name popularity by decade, then found names in the desired popularity band (not too high, not too low).

      Doing something similar to this, via the terrible app methods, we think that with the naming of our first child we helped kick off a boom in its usage. It was too unpopular for many people when we picked it, but rising popularity created a feedback loop. You'd be surprised how few babies it takes to boost a name's popularity.
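
The decade aggregation and popularity band described above were done with awk; a rough Python equivalent might look like the sketch below. It assumes each line of a yearly file has the form `Name,Sex,Count`, and it defines the band by rank within a decade, which is only one possible reading of "not too high, not too low". The function names are made up for illustration.

```python
from collections import defaultdict

def parse_year_lines(year, lines):
    """Turn 'Name,Sex,Count' lines for one year into (year, name, sex, count)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, sex, count = line.split(",")
        yield year, name, sex, int(count)

def popularity_by_decade(records):
    """Sum counts per (decade, name, sex), e.g. 1995 falls in decade 1990."""
    totals = defaultdict(int)
    for year, name, sex, count in records:
        decade = year // 10 * 10
        totals[(decade, name, sex)] += count
    return dict(totals)

def names_in_band(totals, decade, sex, best_rank, worst_rank):
    """Names whose rank in that decade lies between best_rank and worst_rank."""
    ranked = sorted(
        ((n, c) for (d, n, s), c in totals.items() if d == decade and s == sex),
        key=lambda nc: (-nc[1], nc[0]),  # most popular first, ties by name
    )
    return [
        name
        for rank, (name, _) in enumerate(ranked, start=1)
        if best_rank <= rank <= worst_rank
    ]
```

With real data you would feed every yearly file through `parse_year_lines`, then ask for something like ranks 200 to 500 in the most recent decade.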

      6 votes
      1. [4]
        HotPants

        TIL about AWK.

        Sadly I can't think of anything novel to do with name data that hasn't already been done before.

        1 vote
        1. [3]
          vord

          It's not about novel, it's about doing it yourself. :)

          For example, here are some things to try, possibly to build into a website. I haven't found anything quite like this, and I don't have the time to figure it out for myself.

          • Run the names through a fuzzy search to try to combine spelling variants.
          • Scrape for pronunciation data, and allow filtering by phonics and syllables (e.g. no 'sh' sound in the middle).
          • Try to find patterns in name popularity that correlate with other metrics like religion popularity, wars, or economic crises.
          • Track migration patterns via name popularity by state, year to year.
          • What led up to there being like 4 Jessicas in my school classrooms in the '90s?
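
As a starting point for the first idea in the list (combining spelling variants), here is a minimal sketch using `difflib.SequenceMatcher` from the Python standard library. It compares spellings only, not pronunciation, and the 0.85 threshold and the greedy "attach to the most popular similar name" rule are assumptions that would need tuning on real data. The function names are hypothetical.

```python
from difflib import SequenceMatcher

def group_spelling_variants(counts, threshold=0.85):
    """Greedily group names whose spelling is similar.

    counts: dict of name -> number of babies.
    The most popular spelling seen first becomes the group's canonical name.
    """
    groups = {}  # canonical name -> list of variants (including itself)
    for name in sorted(counts, key=lambda n: (-counts[n], n)):
        for canonical in groups:
            ratio = SequenceMatcher(None, name.lower(), canonical.lower()).ratio()
            if ratio >= threshold:
                groups[canonical].append(name)
                break
        else:
            # No existing group is close enough, so start a new one.
            groups[name] = [name]
    return groups

def combined_counts(counts, groups):
    """Total babies per canonical name across all of its variants."""
    return {c: sum(counts[v] for v in variants) for c, variants in groups.items()}
```

This is quadratic in the number of distinct names, so for the full dataset you would probably want to bucket names (say, by length or by a phonetic key) before comparing.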
          4 votes
          1. [2]
            HotPants

            Currently I have a database of stock fundamentals and option prices that I am still working through.

            1 vote