Is there such a thing as anonymous data?

Posted by Esolvit on Nov 2018

Contrary to popular belief, most forms of "anonymous data" can be used to identify an individual and reveal personal details ranging from purchase histories to medical records. We release little pieces of our lives each day, whether by subscribing to online services and newsletters or by filling out forms on Facebook and in online surveys.

Although service vendors assure us that the data collected is anonymous, these digital breadcrumbs can be reconstituted into a cohesive whole that can be traced back to their originators.

In August 2016, the Australian government released the medical billing records of 2.9 million people, including surgeries and prescriptions. Although the released records were anonymized (with names and other identifying details removed), a University of Melbourne research team showed that it was easy to re-identify people and access their medical history without their consent.

The Cambridge Analytica case

When the Federal Trade Commission began investigating the unauthorized access of data belonging to over 50 million Facebook users, the issue of privacy was once again brought to the fore. The data, which was accessed by Cambridge Analytica, was provided by Aleksandr Kogan, the developer of a Facebook personality quiz app.

Approximately 270,000 people installed the personality quiz app on their Facebook accounts, and Kogan (like other Facebook developers) had access to the data on both the users' and their friends' Facebook accounts. Before installation, his app asked for permission to access users' (and their friends') data and then stored the collated data in a private database instead of deleting it. Once Kogan provided the database to Cambridge Analytica, the voter-profiling company used it to create 30 million "psychographic" voter profiles.

If seemingly innocuous social media data can be obtained this easily and contains enough identifiers to build a complete psychographic voter profile, cybercriminals can use the same kind of data for any number of malicious purposes.

The NYC Taxi and Limousine Commission dataset release

A prime example of anonymized data that was later discovered to be not so anonymous was the NYC Taxi and Limousine Commission dataset release. The dataset contained the details of over 1.1 billion individual taxi trips in the city, including fare and tip amounts, locations, and pickup and drop-off times, as well as hashed versions of the taxis' medallion and license numbers.

With some auxiliary knowledge, hackers could de-anonymize the dataset and identify the weekly habits of individuals: where they went, how much they paid, where they spent most of their time, their home and work addresses, their socializing patterns, and so on. The information gathered could be used to track movement patterns and further the hackers' malicious objectives.
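To see why hashing alone offers little protection, consider the following minimal sketch. It assumes the identifiers were hashed with an unsalted hash such as MD5 (the text above does not name the algorithm) and uses a simplified, hypothetical medallion format of digit-letter-digit-digit. Because so few identifiers fit that format, every possible hash can be precomputed and looked up in reverse.

```python
import hashlib
import string
from itertools import product

def md5_hex(value):
    """Return the hex MD5 digest of a string (unsalted, by assumption)."""
    return hashlib.md5(value.encode("ascii")).hexdigest()

def build_lookup_table():
    """Hash every identifier in the assumed digit-letter-digit-digit format."""
    table = {}
    for d1, letter, d2, d3 in product(string.digits, string.ascii_uppercase,
                                      string.digits, string.digits):
        candidate = f"{d1}{letter}{d2}{d3}"
        table[md5_hex(candidate)] = candidate
    return table

lookup = build_lookup_table()

# "5X55" is a made-up identifier used only for illustration.
published_hash = md5_hex("5X55")    # what the "anonymized" record would contain
recovered = lookup[published_hash]  # the reverse lookup restores the identifier

print(len(lookup), "candidates hashed; recovered:", recovered)
```

The whole table is built in a fraction of a second, which is why hashing a small, predictable set of identifiers does not make them anonymous.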

Location data

It is worrisome that 87 percent of individuals in the U.S. can be uniquely identified by just their gender, five-digit ZIP code and date of birth. Computational privacy researchers have also raised alarms about how the majority of individuals could be uniquely identified by their behavioral patterns based on location data from their mobile phones.
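The idea behind the 87 percent figure can be illustrated with a short sketch: count how many records are the only ones with their particular combination of gender, ZIP code and date of birth. The records and the function name `unique_fraction` below are hypothetical and exist only to show the calculation, not to reproduce the original study.

```python
from collections import Counter

# Hypothetical toy records: (gender, ZIP code, date of birth). No names at all.
records = [
    ("F", "10001", "1985-03-14"),
    ("M", "10001", "1985-03-14"),
    ("F", "10001", "1990-07-02"),
    ("F", "10002", "1985-03-14"),
    ("M", "10002", "1972-11-30"),
    ("M", "10002", "1972-11-30"),  # shares all three attributes with the row above
]

def unique_fraction(rows):
    """Fraction of rows whose attribute combination appears exactly once."""
    counts = Counter(rows)
    unique_rows = sum(1 for row in rows if counts[row] == 1)
    return unique_rows / len(rows)

fraction = unique_fraction(records)
print(f"{fraction:.0%} of the toy records are unique on gender + ZIP + birth date")
```

Anyone who already knows these three attributes for a person (from a voter roll, for example) can pick out that person's row whenever the combination is unique.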

By analyzing the mobile phone database of approximate locations of 1.5 million individuals over a period of 15 months, it was possible to identify 95 percent of the individuals with only four data points of time and place. In fact, only two data points were needed to uniquely identify about 50 percent of the individuals.

The data points could be collated from publicly available information such as work addresses, home addresses and geo-tagged Twitter or Facebook posts.
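A rough way to picture this kind of analysis is sketched below, assuming each person's trace is a set of (hour, cell tower) observations. The traces, tower names and the helper functions `matching_users` and `unicity` are all made up for illustration; the real study worked on 1.5 million people, not four.

```python
import random

# Hypothetical traces: user -> set of (hour, cell tower id) observations.
traces = {
    "user_a": {(8, "T1"), (9, "T2"), (13, "T5"), (18, "T2"), (22, "T1")},
    "user_b": {(8, "T1"), (9, "T3"), (13, "T5"), (18, "T3"), (22, "T1")},
    "user_c": {(8, "T4"), (9, "T2"), (13, "T6"), (18, "T2"), (22, "T4")},
    "user_d": {(8, "T1"), (9, "T2"), (13, "T6"), (18, "T2"), (22, "T7")},
}

def matching_users(known_points, all_traces):
    """Users whose trace contains every known (time, place) point."""
    return [u for u, trace in all_traces.items() if known_points <= trace]

def unicity(all_traces, num_points, trials=200, seed=0):
    """Estimate the share of random point samples that match one user only."""
    rng = random.Random(seed)
    users = sorted(all_traces)
    hits = 0
    for _ in range(trials):
        user = rng.choice(users)
        known = set(rng.sample(sorted(all_traces[user]), num_points))
        if len(matching_users(known, all_traces)) == 1:
            hits += 1
    return hits / trials

for p in (1, 2, 4):
    print(p, "known points ->", unicity(traces, p))
```

The more points an observer knows, the fewer traces remain consistent with them, so the share of samples that single out one person rises quickly.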

How do we remain 'anonymous'?

The implications of the above research are far-reaching. If four data points are enough to uniquely identify individuals, then stripping names from a dataset no longer guarantees privacy, which undermines many of the consumer privacy laws and regulations that rely on anonymization.

However, true anonymity can still be achieved in the digital age. Individuals should regularly review the privacy settings on their social media accounts and scrutinize the data they give out when filling out forms and participating in online surveys.

Compliance with the European Union's General Data Protection Regulation (GDPR) is a step in the right direction and can help boost data privacy. Data should be rendered anonymous in such a way that the data subject is no longer identifiable, and data subject rights such as the right to be informed, the right to data portability, and the right to be forgotten should be enforced.
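One common way to make individuals harder to single out is to coarsen identifying attributes until every record shares its values with at least k-1 others, a technique known as k-anonymity. The sketch below is only an illustration of that idea on hypothetical records; it is not a method prescribed by the GDPR, and generalization alone may not be enough to meet its standard.

```python
from collections import Counter

# Hypothetical records: (gender, ZIP code, date of birth).
records = [
    ("F", "10001", "1985-03-14"),
    ("F", "10002", "1985-09-01"),
    ("F", "10003", "1985-12-25"),
    ("M", "10011", "1972-01-05"),
    ("M", "10012", "1972-11-30"),
]

def generalize(row):
    """Coarsen a row: keep a 3-digit ZIP prefix and only the birth year."""
    gender, zip_code, dob = row
    return (gender, zip_code[:3] + "**", dob[:4])

def smallest_group(rows):
    """Size of the smallest group of identical rows (the k in k-anonymity)."""
    return min(Counter(rows).values())

k_before = smallest_group(records)
k_after = smallest_group([generalize(r) for r in records])
print("k before:", k_before, "k after:", k_after)
```

In this toy data every record is unique before generalization, while afterwards each one is indistinguishable from at least one other. The price is less precise data, which is the usual trade-off between privacy and usefulness.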

Conclusion

Privacy concerns are on the rise and will only get worse in the Industry 4.0 era. With individuals spending most of their lives online and leaving digital breadcrumbs everywhere (which can ultimately be traced back to them), no one is truly safe.

Although it's convenient to pretend otherwise, researchers have shown how easy it is to re-identify people from their digital footprint. Even anonymized data can be reverse-engineered and reconstituted to uniquely identify individuals.

Once data gets into cyberspace, it tends to stay there forever. One of the ways to protect privacy is to reduce our digital footprint as much as possible. Privacy laws should also target custodians of consumer data (such as companies, researchers and governments) and force them to shoulder more of the legal responsibility for ensuring privacy.