How AI can recognize people even in anonymized datasets

 Weekly social interactions create distinct signatures that set people apart.

The way you interact with others in a crowd may help you stand out from the crowd, at least to artificial intelligence.

Researchers report in Nature Communications on January 25 that when given information about a target individual's mobile phone interactions as well as the interactions of their contacts, AI can correctly pick the target out of more than 40,000 anonymous mobile phone service subscribers more than half the time. The findings suggest that humans socialize in ways that could be used to identify them in supposedly anonymized datasets.

"It's no surprise that people tend to stay within established social circles and that these regular interactions form a stable pattern over time," says Jaideep Srivastava, a computer scientist at the University of Minnesota in Minneapolis who was not involved in the study. "However, it's surprising that you can use that pattern to identify the individual."

Under the European Union's General Data Protection Regulation and the California Consumer Privacy Act, companies that collect information about people's daily interactions can share or sell this data without users' consent, provided the data is anonymized. According to Yves-Alexandre de Montjoye, a computational privacy researcher at Imperial College London, some organizations may believe they can meet this standard by giving users pseudonyms. "Our findings indicate that this is not the case," he says.

De Montjoye and his colleagues hypothesized that people's social behavior could be used to identify them in datasets of anonymous users' interactions. To put their hypothesis to the test, the researchers trained an artificial neural network (an AI that mimics the neural circuitry of a biological brain) to recognize patterns in users' weekly social interactions.

For one test, the researchers fed the neural network data from an unidentified mobile phone service, which detailed 43,606 subscribers' interactions over the course of 14 weeks. This information included the date, time, duration, type of interaction (call or text), pseudonyms of the parties involved, and who initiated the communication.
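To make the shape of such a record concrete, here is a minimal Python sketch. It is an illustration only, not the researchers' format: the class name, field names, pseudonyms, and dates are all assumptions made up for this example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Interaction:
    """One pseudonymized phone interaction (hypothetical layout)."""
    timestamp: datetime  # date and time of the interaction
    duration_s: int      # call duration in seconds (0 for a text)
    kind: str            # "call" or "text"
    initiator: str       # pseudonym of the party who started it
    recipient: str       # pseudonym of the other party


def week_index(record: Interaction, start: datetime) -> int:
    """Return the zero-based week a record falls in, counted from start."""
    return (record.timestamp - start).days // 7


# Made-up sample data: the start date and pseudonyms are not from the study.
start = datetime(2020, 1, 6)
records = [
    Interaction(datetime(2020, 1, 7, 9, 30), 120, "call", "u_17", "u_42"),
    Interaction(datetime(2020, 1, 15, 18, 5), 0, "text", "u_42", "u_17"),
]

# Bucket each record into its week, since the patterns of interest are weekly.
weeks = [week_index(r, start) for r in records]
print(weeks)
```

Note that no name or phone number appears anywhere in such a record. Only pseudonyms do, which is the form of anonymization the study calls into question.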

The interaction data for each user was organized into web-shaped data structures composed of nodes representing the user and their contacts. The nodes were linked together by strings threaded with interaction data. The AI was shown a known person's interaction web and then set loose to search the anonymized data for the web that most resembled it.
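The matching idea can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not the study's method: the researchers trained a neural network, whereas this sketch compares hand-picked features (sorted per-contact interaction counts) with cosine similarity. All function names, pseudonyms, and records below are hypothetical.

```python
import math
from collections import defaultdict


def build_web(records, user):
    """Map each contact of `user` to [number of interactions, total duration]."""
    web = defaultdict(lambda: [0, 0])
    for initiator, recipient, duration in records:
        if user in (initiator, recipient):
            contact = recipient if initiator == user else initiator
            web[contact][0] += 1
            web[contact][1] += duration
    return dict(web)


def signature(web, size=3):
    """Name-free feature vector: top interaction counts, zero-padded."""
    counts = sorted((count for count, _ in web.values()), reverse=True)[:size]
    return counts + [0] * (size - len(counts))


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(target_sig, candidates):
    """Return the pseudonym whose signature is most similar to the target's."""
    return max(candidates, key=lambda name: cosine(target_sig, candidates[name]))


# Anonymized dataset: (initiator, recipient, duration in seconds), made up.
anon = [
    ("p1", "p2", 60), ("p1", "p2", 30), ("p2", "p1", 45), ("p1", "p3", 10),
    ("p4", "p5", 20), ("p4", "p6", 20), ("p4", "p7", 15),
]
# Later records for a known person, under different names.
known = [
    ("alice", "bob", 50), ("bob", "alice", 40), ("alice", "bob", 5),
    ("alice", "bob", 10), ("alice", "carol", 12),
]

candidates = {p: signature(build_web(anon, p)) for p in ("p1", "p2", "p4")}
match = best_match(signature(build_web(known, "alice")), candidates)
print(match)
```

In this toy data, the known person talks mostly to one contact and occasionally to a second, and the pseudonymous user with the same lopsided pattern comes out as the closest match even though no names are shared between the two datasets.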

When shown interaction webs containing information about a target's phone interactions that occurred one week after the latest records in the anonymous dataset, the neural network linked only 14.7 percent of individuals to their anonymized selves. However, when given information about the target's interactions as well as those of their contacts, it identified 52.4 percent of people.

When the researchers fed the AI interaction data from the target and contacts collected 20 weeks after the anonymous dataset, the AI still correctly identified users 24.3 percent of the time, indicating that social behavior remains identifiable over long periods of time.

To see whether the AI could profile social behavior elsewhere, the researchers tested it on a dataset consisting of four weeks of close-proximity data from the mobile phones of 587 anonymous university students, collected by researchers in Copenhagen.

Interaction data included students' pseudonyms, encounter times, and the strength of the received signal, which indicated proximity to other students. COVID-19 contact tracing applications frequently collect these metrics. The AI correctly identified students in the dataset 26.4 percent of the time when given a target and their contacts' interaction data.

The findings, the researchers note, are unlikely to apply to Google's and Apple's contact tracing protocols, which protect users' privacy by encrypting all Bluetooth metadata and prohibiting the collection of location data.

De Montjoye hopes that the research will assist policymakers in improving strategies to protect users' identities. According to him, data protection laws permit the sharing of anonymized data to support user research. "However, it is critical for this to work that anonymization actually protects individuals' privacy."

Source:

A.-M. Creţu et al. Interaction data are identifiable even across long periods of time. Nature Communications. Published online January 25, 2022. DOI: 10.1038/s41467-021-27714-6.

