Your anonymized data might not be as anonymous as you thought

A new study raises serious ethical and practical questions about data security



 Photo by Olesya Yemets on Unsplash

What happens to the data you provide when you visit your doctor's office, fill out any kind of health-related form, or participate in a clinical trial? Personal data are protected in many ways to ensure your privacy. These protections are governed by different levels of regulation, but some data, such as data that have been de-identified, can become exempt from the regulations governing human subject research.

A recently published study described a statistical model capable of re-identifying individuals from de-identified (anonymized) data, even when the dataset is heavily incomplete. The researchers report that “99.98% of Americans would be correctly re-identified in any dataset using 15 demographic attributes,” with a low average false-discovery (mis-identification) rate. In other words, bad actors could quite plausibly link supposedly anonymous records back to specific individuals.
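Why do a handful of ordinary attributes reveal so much? Each attribute you add splits the records into smaller groups, until many people sit in a group of one. The short Python sketch below illustrates that intuition on a small, made-up dataset. The records, the attribute names (`zip`, `birth_year`, `sex`), and the `fraction_unique` helper are all hypothetical. This is a simple count of unique records within the sample, not the study's model. The study's statistical model goes further: it estimates how likely a match is to be correct even when the dataset is only an incomplete sample of the population.

```python
from collections import Counter

# Hypothetical toy "de-identified" records: names removed, but
# demographic attributes (ZIP code, birth year, sex) kept.
RECORDS = [
    {"zip": "02139", "birth_year": 1985, "sex": "F"},
    {"zip": "02139", "birth_year": 1985, "sex": "M"},
    {"zip": "02139", "birth_year": 1990, "sex": "F"},
    {"zip": "02139", "birth_year": 1990, "sex": "F"},
    {"zip": "10001", "birth_year": 1985, "sex": "F"},
    {"zip": "10001", "birth_year": 1972, "sex": "M"},
    {"zip": "10001", "birth_year": 1972, "sex": "M"},
    {"zip": "94110", "birth_year": 1990, "sex": "M"},
]


def fraction_unique(records, attrs):
    """Return the fraction of records whose combination of values for
    `attrs` appears exactly once in the dataset (a simple sample-uniqueness
    count, not the study's statistical model)."""
    keys = [tuple(r[a] for a in attrs) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)


if __name__ == "__main__":
    # Adding attributes shrinks the groups, so more records become unique.
    for attrs in (["zip"], ["zip", "birth_year"], ["zip", "birth_year", "sex"]):
        print(attrs, fraction_unique(RECORDS, attrs))
```

On this toy data, the share of unique records rises from 0.125 with ZIP code alone, to 0.25 with birth year added, to 0.5 with sex added. Anyone who already knows those three facts about a person in one of the unique groups can pick out that person's row.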

Anonymized datasets are collected, shared, and used at scale every day by health care organizations for research purposes. This paper is both shocking and controversial: Was it ethical for the researchers to share this model? Does it make it easier for hackers to obtain our private health information? One thing is clear: data security, although an unexciting topic, is a critical area of technology that needs our attention.