Recently changed my credit card provider to MasterCard, then I started getting some quite pertinent and well-timed spam emails related to my use of the credit card.
I discovered MasterCard is selling its customer transaction data online - in an anonymised form. I'm not a data scientist, but from some brief research I get the impression it's quite easy to de-anonymise a large dataset. See for example this video: https://www.youtube.com/watch?v=puQvpyf0W-M
From a privacy perspective, is there anything to the implied privacy of 'anonymised data', or is it mostly a fig leaf that lets businesses sell my data while still complying with, for example, European GDPR privacy requirements? Would love to hear the thoughts of those well-versed in law and/or tech. Thank you.
A major issue in this space is that different entities use the term 'anonymized' to mean very different privacy bars. A common definition of anonymized is just 'I removed the unique identifiers: name, email address, IP address, etc.'. This is a very weak approach, and there have been many examples over the years where the other fields in the dataset (even with some aggregation and randomness applied) were sufficient to link records back to individual users.

It's only somewhat recently that better approaches to anonymization have appeared which match the colloquial idea of anonymization. The main one is differential privacy: a set of aggregation techniques that provably bounds how much the presence or absence of any single person in the dataset can affect the published output. This is the technique that, for example, the US Census Bureau adopted for the most recent census. Applied properly it provides very strong protection against re-identification, but it can also add a lot of noise and removes outlier rows. It's what I'd consider the gold standard for anonymization; anything else is basically just marketing or security through obscurity.
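To make the differential privacy idea concrete, here's a minimal sketch (not MasterCard's or the Census Bureau's actual method) of the classic Laplace mechanism applied to a counting query. A count has sensitivity 1 (adding or removing one person changes it by at most 1), so adding Laplace noise with scale 1/epsilon yields epsilon-differential privacy. The record fields and threshold below are made up for illustration:

```python
import math
import random

def laplace_noise(scale: float) -> float:
    # Sample from a Laplace(0, scale) distribution via inverse CDF:
    # draw U ~ Uniform(-0.5, 0.5), return -scale * sign(U) * ln(1 - 2|U|).
    u = random.random() - 0.5
    sign = -1.0 if u < 0 else 1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def private_count(records, predicate, epsilon: float = 1.0) -> float:
    # A counting query has sensitivity 1: any single person's
    # addition or removal changes the true count by at most 1.
    # Laplace noise with scale (sensitivity / epsilon) therefore
    # gives epsilon-differential privacy for this query.
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Hypothetical transaction records, purely illustrative.
random.seed(42)
records = [{"spend": x} for x in [10, 200, 35, 500, 80]]
noisy = private_count(records, lambda r: r["spend"] > 50, epsilon=1.0)
print(noisy)  # close to the true count of 3, plus Laplace noise
```

The key property is that the noise masks any one person's contribution: an observer seeing the noisy count cannot confidently tell whether a particular individual's transaction was in the dataset. Smaller epsilon means stronger privacy but noisier answers, which is exactly the accuracy trade-off mentioned above.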
Your comment is very useful. I'll keep this in mind when researching the topic further.