There is only /one/ possible way to anonymize a dataset: randomizing the dataset D such that the resulting dataset D' is drawn from the same distribution as D. Basically, this means that the individual table rows don't mean anything on their own, but you can learn the same things from D' as from D.
But there is always the utility-privacy trade-off. When there is little data, you cannot randomize much before you start to lose information. On the other hand, if the dataset is large, it becomes possible to make very strong privacy guarantees without throwing information away.
Randomization is the only way for micro-data to be shared. And as always, there is no free lunch in data mining. It is possible, though: research on this topic has been going on since the late '70s. Companies are just not doing it.
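To make the trade-off concrete, here is a minimal sketch using randomized response on a single yes/no column. This is just one classic randomization scheme picked for illustration, not necessarily the method meant above, and the names (`randomize`, `estimate_rate`, `mean_abs_error`, `p_truth`) as well as the parameter values are made up for the example. Each true value is kept only with probability `p_truth` and otherwise replaced by a coin flip, so a single row tells you little, but the overall rate can still be estimated from the noisy column.

```python
import random


def randomize(bits, p_truth, rng):
    """Keep each bit with probability p_truth, else replace it with a fair coin flip."""
    return [b if rng.random() < p_truth else rng.randint(0, 1) for b in bits]


def estimate_rate(noisy_bits, p_truth):
    """Unbiased estimate of the original rate of 1s, computed from the noisy bits only."""
    observed = sum(noisy_bits) / len(noisy_bits)
    # E[observed] = p_truth * rate + (1 - p_truth) * 0.5, solved for rate
    return (observed - (1 - p_truth) * 0.5) / p_truth


def mean_abs_error(n, true_rate=0.3, p_truth=0.25, trials=200, seed=0):
    """Average gap between the rate in D and the rate estimated from D'."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Hypothetical dataset D: n rows with one sensitive binary attribute.
        bits = [1 if rng.random() < true_rate else 0 for _ in range(n)]
        actual = sum(bits) / n
        noisy = randomize(bits, p_truth, rng)  # the released dataset D'
        total += abs(estimate_rate(noisy, p_truth) - actual)
    return total / trials


if __name__ == "__main__":
    # Same amount of randomization, two dataset sizes.
    for n in (100, 10_000):
        print(n, round(mean_abs_error(n), 3))
```

With the same amount of noise per row, the estimate from the larger dataset should land much closer to the true rate, which is the point about large datasets tolerating strong randomization.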
I believe this post is a duplicate of this one.