Risk factors of biometric identification using DNA
Andy Green, a technical content specialist at Varonis, specialists in data governance software, examines whether DNA should be classed as personally identifiable information.
Biometric data is at the limits of what current personal data privacy laws consider worthy of protection. This type of identifier covers fingerprints, voiceprints and facial images. While the risks are not nearly as threatening to consumers as those posed by more traditional personally identifiable information (PII), they do exist. Until recently, the dangers of biometric identification using DNA were more theoretical than real. That has suddenly changed. A recent article in The New York Times put the spotlight on research that proved the feasibility of identifying a person, down to a specific name and address, all from a DNA sequence posted online.
It's not that regulators have overlooked biometric identifiers. Under the Health Insurance Portability and Accountability Act's (HIPAA) safe harbor rules in the US, for example, the US Department of Health and Human Services (DHHS) has a list of 18 e-PHI (electronic protected health information) identifiers that would need to be removed from public medical data for it to be effectively considered de-identified. Along with IP addresses, URLs and email addresses, the DHHS mentions biometric data, with voiceprints and fingerprints given as the only examples.
Varonis has already written about how the Federal Trade Commission (FTC), another key US agency involved in data privacy regulation, has issued new guidelines to companies collecting facial images. Driving the FTC's suggestions, which are mostly directed at retailers, are the recent improvements in image recognition technology and the availability of massive amounts of tagged photos on social media sites. Image-matching software is now good enough that a face captured by a store's retail kiosk camera can eventually reveal ethnicity, mood and, with good likelihood, an actual name behind the face.
The risk of linking a name to a set of fingerprints is less serious for the general public unless you have a criminal record. However, after the Graduate Management Admission Council (GMAC) began using fingerprints to establish the identity of students taking their graduate management admission tests (GMATs) for admission to business schools, the testing company realised there could be privacy issues.
GMAC ultimately decided to use palm scans, which are based on digitising vein patterns. Since public databases of hand veins don't exist, the possibility of identification is eliminated.
I would have put DNA into the same category as palm scans: there's advanced matching technology available, even at the consumer level, but without a public database there isn't much of a privacy issue, and therefore DNA is not really PII.
However, this is not true any more, and that was the starting point for the researchers mentioned in The New York Times article.
There are actually two public genealogy databases for tracking down one's ancestry, Ysearch and SMGF, with a combined 135,000 records of DNA data, covering about 39,000 unique last names.
These genealogy databases simply accept a key (actually a pattern on the Y chromosome) and then return a surname, along with a confidence level. The idea behind these services is to help subscribers find their ancestors and learn more about family backgrounds.
The researchers then examined whether they could narrow down their search. They assumed that they had the state of residency of the subject along with a birth date. Both of these, by the way, are not considered PII under current HIPAA rules. With these three data points (surname, state and birth date) and public US Census data, they were able to show that successful DNA matches would lead to just 12 people on average. That's a stunning end result from starting with just a DNA pattern.
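To make the narrowing step concrete, here is a minimal sketch in Python of how the three data points combine. It is not the researchers' method or code: the marker pattern, the lookup table standing in for a genealogy database, the names and the five-person "population" are all invented for illustration, and real databases do not expose this interface.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Person:
    """One row of a hypothetical public record set (entirely made-up data)."""
    surname: str
    state: str
    birth_date: date


# Hypothetical stand-in for a genealogy database: a Y-chromosome marker
# pattern maps to a surname plus a confidence level. Values are invented.
GENEALOGY_DB = {
    (13, 24, 14, 11): ("Smith", 0.9),
}


def lookup_surname(pattern):
    """Return (surname, confidence) for a marker pattern, or None if unknown."""
    return GENEALOGY_DB.get(pattern)


def narrow_candidates(records, surname, state, birth_date):
    """Keep only the records that match all three data points."""
    return [
        r for r in records
        if r.surname == surname and r.state == state and r.birth_date == birth_date
    ]


# Toy population; every name and value here is invented for illustration.
RECORDS = [
    Person("Smith", "NY", date(1970, 5, 1)),
    Person("Smith", "NY", date(1970, 5, 1)),
    Person("Smith", "CA", date(1970, 5, 1)),
    Person("Smith", "NY", date(1982, 3, 9)),
    Person("Jones", "NY", date(1970, 5, 1)),
]

# Step 1: the DNA pattern yields a probable surname.
surname, confidence = lookup_surname((13, 24, 14, 11))

# Step 2: state and birth date shrink the set of people sharing that surname.
by_surname = [r for r in RECORDS if r.surname == surname]
matches = narrow_candidates(RECORDS, surname, "NY", date(1970, 5, 1))

print(len(RECORDS), len(by_surname), len(matches))  # prints: 5 4 2
```

Neither the state nor the birth date identifies anyone on its own, but each one removes candidates, which is why a handful of seemingly harmless fields can reduce a large population to a short list.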
How good is the DNA keyword match at finding a last name? The researchers projected a success rate of 12 per cent for males (since it's based on the Y chromosome), with a five per cent false-positive rate. This is not nearly as accurate as the facial scans, but still a cause for concern. They concluded that the risk of this DNA-based l