6 votes

What does big data look like when cross-referenced?

Google knows a lot about its users. Facebook knows a lot about its users. Fitbit knows a lot about its users. And so on.

But what happens when these companies all sell their data sets to one another? It'd be pretty trivial to link even anonymized users from set to set by looking for specific features. For example, say I went for a run: Google tracked my location, Fitbit tracked my heart rate, and Facebook tracked my status about my new best mile time. Google can then narrow down who I am in the other sets by matching their records against information it already has. With enough overlap they can figure out exactly who I am fairly easily. Furthermore, each additional layer of data makes linking new data sets even easier, as it gives more opportunities to confirm or rule out a match. So when, say, Credit Karma, Comcast, and Amazon's data enter the fray, my online identity stops looking like one egg in each of several baskets and starts looking like a whole lot of eggs all in one. And they can do this across millions or billions of users--not just me!
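
To make the linking idea concrete, here is a minimal sketch, assuming two made-up "anonymized" data sets that share nothing but the start time and length of a workout. The records, the pseudonymous IDs, the field names, and the `link_users` helper are all hypothetical, and nothing here reflects how any real company stores or matches data.

```python
from collections import defaultdict

# Hypothetical records: neither set contains a name, only a pseudonymous ID.
location_pings = [
    {"user": "g-01", "start": "2018-10-01T07:00", "minutes": 31},
    {"user": "g-02", "start": "2018-10-01T07:00", "minutes": 45},
    {"user": "g-01", "start": "2018-10-03T18:30", "minutes": 29},
    {"user": "g-02", "start": "2018-10-04T06:15", "minutes": 44},
]
heart_rate_sessions = [
    {"user": "f-77", "start": "2018-10-01T07:00", "minutes": 45},
    {"user": "f-12", "start": "2018-10-01T07:00", "minutes": 31},
    {"user": "f-12", "start": "2018-10-03T18:30", "minutes": 29},
    {"user": "f-77", "start": "2018-10-04T06:15", "minutes": 44},
]


def link_users(set_a, set_b, min_overlap=2):
    """Pair IDs whose (start, minutes) features coincide at least min_overlap times."""
    # Index the second set by its shared features.
    index = defaultdict(set)
    for rec in set_b:
        index[(rec["start"], rec["minutes"])].add(rec["user"])

    # Count how often each cross-set pair of IDs shows the same features.
    counts = defaultdict(int)
    for rec in set_a:
        for other in index.get((rec["start"], rec["minutes"]), ()):
            counts[(rec["user"], other)] += 1

    # Keep only pairs with enough coinciding events to be a plausible match.
    return {pair: n for pair, n in counts.items() if n >= min_overlap}


links = link_users(location_pings, heart_rate_sessions)
print(links)
```

In this toy data each ID in one set lines up with exactly one ID in the other after two coinciding workouts. Raising `min_overlap` is the "confirm or rule out" step: every extra shared feature or extra data set leaves fewer candidate matches.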

I don't know for certain that this is a thing that happens, but... I have to assume it definitely is happening, right? How could it not? With how valuable data is and how loose protections are, this seems like a logical and potentially very lucrative step.

Right now, is there an aggregate version of "me" that exists in a data store somewhere that is a more comprehensive and accurate picture than my own self-image? After all, my memory and perception are imperfect and biased, but data stores aren't.

4 comments

  1. [2]
    Deimos

    Did you happen to read this article that I submitted the other day? https://stratechery.com/2018/data-factories/

    I thought it did an interesting job of thinking about Facebook and Google as "data factories" that take in a bunch of data from different sources, and transform it into "value", with us not really getting much insight into the final result.

    4 votes
    1. kfwyre

      I didn't then, but I have now. This directly responds to exactly what I was asking about. Thanks for the link. And thanks for making Tildes a privacy-friendly place. I like that I'm able to post here without my words being used as fodder for a "data factory."

      3 votes
  2. [2]
    666

    I don't know for certain that this is a thing that happens, but... I have to assume it definitely is happening, right? How could it not? With how valuable data is and how loose protections are, this seems like a logical and potentially very lucrative step.

    I want to focus on this paragraph. It may not be happening. Or, if it is, it's not happening at the scale you think it is. It's true that these companies profit off of your data: they make money by analyzing your private information and showing you ads while trying to find a way to keep you entertained, connected and scrolling so they can show you even more ads. The money is in making you see the ads and interact with them. As a company that specializes in using your data to show targeted ads, why would you be interested in selling that data to your competitors? That's your gold mine. You would literally be giving them what makes you money, handing them an advantage over you and the ability to cross-reference data, show more relevant ads, and keep people scrolling for longer. You'd be at risk of losing visitors and ad views because the competition can do better than you. Also, as Google (or any big ad network), you'd prefer that apps like Fitbit use your ad service rather than have a choice, so you are not going to give them any bit of data you have.

    Note: this is just my opinion, and it's completely uninformed.

    2 votes
    1. kfwyre

      That's a really good point. Many companies probably treat their data like a trade secret or a competitive advantage, so selling it wouldn't be worth the cost to them.

      That said, not all companies rely on ad revenue--nor do all companies have incentives to keep their data in isolation. For example, take Fitbit. They sell consumer fitness tracking devices. They're not in the business of ads (besides marketing themselves, of course), and the data they gather is probably highly desirable for a lot of industries out there (e.g. health insurance, pharmaceutical, diet & exercise, etc.). Why wouldn't Fitbit sell their data to these industries? They do not lose anything from disclosing it. If anything, it could create an even more reliable revenue stream than the sale of the devices themselves, as they could charge for subscriptions to up-to-date data.

      3 votes