Using Clean Rooms and Synthetic Data to Reduce Fraud

by Kevin Chen on September 27, 2022

There is an interesting duality in the world of identity. On one hand, businesses use consumer data to deliver personalized recommendations that increase the value of their services. From streaming services to grocery stores to online retailers, consumer usage data is captured and analyzed, then used to provide customized recommendations that encourage continued patronage. Most consumers see the value in these personalized recommendations. On the other hand, consumer data must be kept private, especially in an increasingly digitized economy where data is shared and fraudsters are on the prowl. The trend is toward data anonymization, in which personally identifiable information (PII) is encrypted or removed, reducing the chance of identity theft and fraud. New privacy-preserving technologies such as data clean rooms and synthetic data are coming to market that protect an individual’s personal data while still delivering a personalized consumer experience.

Data Clean Rooms

Data clean rooms are online platforms that have existed for a few years. Companies use them to share data with other entities without violating user privacy. In today’s clean rooms, the clean room provider aggregates anonymized consumer data and organizes it into groups, or cohorts, in a controlled manner so that data owners do not reveal their private consumer data to one another. Companies can then perform modeling and analytics to identify characteristics of segmented consumer populations and/or deliver more effective, targeted offerings.
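As a rough illustration of cohort-level aggregation, the Python sketch below averages a value per cohort and suppresses any cohort that is too small to hide an individual. The function name, the record layout and the minimum cohort size are assumptions made for this example; they do not describe any particular clean room product.

```python
from collections import defaultdict

# Hypothetical threshold: cohorts with fewer members are withheld entirely.
MIN_COHORT_SIZE = 3

def cohort_averages(records, min_size=MIN_COHORT_SIZE):
    """Average a value per cohort, dropping cohorts smaller than min_size.

    records: iterable of (cohort_key, value) pairs that carry no direct
    identifiers such as names or account numbers.
    """
    totals = defaultdict(lambda: [0, 0.0])  # cohort -> [count, running sum]
    for cohort, value in records:
        totals[cohort][0] += 1
        totals[cohort][1] += value
    # Only cohorts that meet the size threshold are released.
    return {
        cohort: total / count
        for cohort, (count, total) in totals.items()
        if count >= min_size
    }

records = [
    ("18-25", 10.0), ("18-25", 20.0), ("18-25", 30.0),
    ("26-35", 50.0),  # a cohort of one would expose that person's value
]
print(cohort_averages(records))  # {'18-25': 20.0}
```

Only the group-level averages leave the function; the single-member cohort is withheld because its "average" would simply be one person's data.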

Examples span many industries. In advertising, brands and agencies use clean room-derived analytics to optimize and target consumer segments. Banks and financial services companies can, for example, collaborate on fraud detection and anti-money laundering efforts or enhance customers’ digital banking experience. In healthcare, clinical researchers can use patient data in clean rooms for complex studies, such as patient journey and drug adherence studies, without linking the data to specific people.

The next generation of clean rooms can be transformed by deploying more advanced privacy-preserving technology. By leveraging advanced cryptographic technology such as secure multi-party computation (SMPC), data providers can securely aggregate their data with one another while maintaining total control of their respective data. With SMPC, the private data stays on the premises of each data provider, and only fragmented, encrypted data is exchanged throughout the computation. As a result, the sensitive private data is never exposed to any other party, including the clean room provider. Because privacy preservation is built in, SMPC can enable more sophisticated analytics with machine learning in a truly anonymous fashion and allows the resulting models to be deployed beyond the clean room while still maintaining the privacy of the data.
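To make the idea of exchanging only fragments more concrete, here is a minimal sketch of additive secret sharing, one common building block of SMPC. It simulates all parties in a single process, and the modulus, function names and example values are assumptions chosen for illustration; a production protocol would add networking, authentication and protection against misbehaving parties.

```python
import secrets

# Illustrative modulus; inputs must be non-negative and their total
# must stay below it for the reconstructed sum to be correct.
PRIME = 2**61 - 1

def make_shares(value, n_parties):
    """Split value into n_parties random-looking shares that sum to it mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]

def secure_sum(private_values):
    """Simulate a joint sum in which no party sees another party's raw value."""
    n = len(private_values)
    # Each party splits its own value and sends share j to party j.
    all_shares = [make_shares(v, n) for v in private_values]
    # Party j adds up the shares it received; each share alone looks random.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)
    ]
    # Only the partial sums are published and combined.
    return sum(partial_sums) % PRIME

# Three hypothetical banks each hold a private count of flagged transactions.
print(secure_sum([120, 45, 300]))  # 465
```

Every individual share is uniformly random, so a party that receives one learns nothing about the value it came from; only the combined total is revealed.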

Two other privacy-preserving technologies are differential privacy and generative adversarial networks (GANs). Differential privacy enables sharing information about a dataset by describing the patterns of groups within it while withholding information about the individuals in it. GANs train two competing deep learning models in an adversarial fashion: a generator model produces examples so plausible that they are hard to tell apart from real data, and a separate discriminator model tries to detect that the examples are fake. Working hand-in-hand with SMPC, these privacy-preserving technologies will revolutionize clean room solutions.
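Differential privacy is often implemented by adding calibrated random noise to query results. The sketch below applies the Laplace mechanism to a simple counting query. The function names and the example data are hypothetical, and the privacy parameter epsilon is picked arbitrarily here; choosing it in practice is a policy decision.

```python
import random

def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(values, predicate, epsilon, rng=None):
    """Return a noisy count of the values that satisfy predicate.

    Adding or removing one person changes a count by at most 1, so the
    query has sensitivity 1 and Laplace noise with scale 1/epsilon gives
    epsilon-differential privacy for this single query.
    """
    rng = rng or random.Random()
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Hypothetical transaction amounts; how many exceed 1,000?
amounts = [250, 1200, 980, 4300, 75, 1500]
noisy = private_count(amounts, lambda a: a > 1000, epsilon=0.5)
print(round(noisy, 2))  # varies per run; the true count is 3
```

A smaller epsilon means more noise and stronger privacy, while a larger epsilon gives more accurate answers. Repeating the query consumes additional privacy budget, which this sketch does not track.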

The use of data clean rooms will soon extend more deeply into the banking and financial services world as the methodology will be applied to the safe, anonymized sharing of credit data.

Synthetic Data Generation

A key to using banking and financial services data in an anonymized fashion will be its conversion into synthetic data for the purposes of modeling and analytics. Synthetic data will look like consumer data, containing many of its attributes, but will not link to any specific person.

Interestingly, the same GAN technology that is used to create deep fakes to perpetrate fraud can be used for good: to generate, from sensitive data, synthetic data that is indistinguishable from the real thing. Using synthetic data, companies can collaborate more effectively, build models and analyze performance, expand their business, reduce risk and prevent fraud without concerns about re-identification of real consumers.
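A full GAN is too large for a short listing, so the sketch below deliberately substitutes a much simpler technique: it fits a normal distribution to each numeric column and samples new rows from those fits. It is not a GAN and it ignores correlations between columns, but it shows the basic shape of the workflow: learn statistics from sensitive records, then emit rows that belong to no real person. The column names and values are invented for the example.

```python
import random
import statistics

def fit_columns(rows):
    """Estimate a (mean, standard deviation) pair for every numeric column."""
    params = {}
    for col in rows[0]:
        values = [row[col] for row in rows]
        params[col] = (statistics.mean(values), statistics.stdev(values))
    return params

def sample_synthetic(params, n, rng=None):
    """Draw n synthetic rows, sampling each column independently."""
    rng = rng or random.Random()
    return [
        {col: rng.gauss(mu, sigma) for col, (mu, sigma) in params.items()}
        for _ in range(n)
    ]

# Hypothetical sensitive records (made-up values).
real_rows = [
    {"balance": 1000.0, "monthly_spend": 300.0},
    {"balance": 1500.0, "monthly_spend": 450.0},
    {"balance": 2000.0, "monthly_spend": 500.0},
    {"balance": 2500.0, "monthly_spend": 700.0},
]
synthetic_rows = sample_synthetic(fit_columns(real_rows), n=5)
for row in synthetic_rows:
    print(row)
```

A GAN-based generator would aim to capture the joint structure of the data as well, which this stand-in cannot. Either way, releasing fitted statistics or a trained model can still leak information, so synthetic data is often combined with safeguards such as differential privacy.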

Conclusion

SMPC, differential privacy, GANs and synthetic data generation are just some of the privacy-preserving technologies that enable companies to connect digital touchpoints and access powerful insights in an anonymized way, thereby mitigating the risk of fraud. Ultimately, these technologies play a vital role in giving people the confidence to share their personal data with organizations and brands they trust.