Part 3: Machine Learning Ways to De-Identify Personal Data (Homomorphic Encryption)

Homomorphic Encryption: The main idea behind homomorphic encryption is that the inferences we make from computations on encrypted data should be as accurate as if we had computed on the decrypted data. Homomorphic encryption is an evolving field and, at this point in time, it has certain limitations. For example, only polynomial functions can be computed, and only additions and multiplications of integers modulo n are allowed. Many mathematical operations used in even the simplest neural networks are therefore unavailable when training a model on homomorphically encrypted data. As you can see, the final concepts of this methodology are still being developed.
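
To make the idea concrete, here is a minimal, self-contained sketch of the Paillier cryptosystem, a classic partially homomorphic scheme: multiplying two ciphertexts yields a ciphertext of the sum of the underlying plaintexts, so sums can be computed without ever decrypting the inputs. The primes are tiny and hard-coded, so this is an illustration of the concept only, not a secure or production-grade implementation.

```python
# Toy Paillier cryptosystem: additively homomorphic encryption.
# NOT secure -- tiny hard-coded primes, for illustration only.
import math
import random

def generate_keys(p=1789, q=1861):
    """Key generation with small primes (insecure, demo only)."""
    n = p * q
    n_sq = n * n
    lam = math.lcm(p - 1, q - 1)        # Carmichael's function for n = p*q
    g = n + 1                           # standard simplified choice of g
    # mu = (L(g^lam mod n^2))^-1 mod n, where L(x) = (x - 1) // n
    l_value = (pow(g, lam, n_sq) - 1) // n
    mu = pow(l_value, -1, n)
    return (n, g), (lam, mu, n)

def encrypt(pub, m):
    """Encrypt an integer m with 0 <= m < n."""
    n, g = pub
    n_sq = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n_sq) * pow(r, n, n_sq)) % n_sq

def decrypt(priv, c):
    lam, mu, n = priv
    n_sq = n * n
    l_value = (pow(c, lam, n_sq) - 1) // n
    return (l_value * mu) % n

def add_encrypted(pub, c1, c2):
    """Homomorphic addition: the product of ciphertexts decrypts to the sum."""
    n, _ = pub
    return (c1 * c2) % (n * n)

pub, priv = generate_keys()
c1, c2 = encrypt(pub, 42), encrypt(pub, 58)
c_sum = add_encrypted(pub, c1, c2)
print(decrypt(priv, c_sum))   # -> 100, computed without decrypting c1 or c2
```

Running the script prints 100, the sum of 42 and 58, even though the two inputs were only ever handled in encrypted form. Fully homomorphic schemes extend this idea to richer combinations of additions and multiplications on ciphertexts, at the cost of the limitations described above.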

The appeal of homomorphic encryption is that we don't need to remove any values from the dataset, or mask/anonymize personal data in any way. However, at this point in time, there is not enough practical evidence that it can be used in production-level methodologies; furthermore, there are not many functional homomorphic encryption pipelines available.

Let's imagine a situation where we've removed all personal data from the dataset (or anonymized it and stored it separately from the other values). Most likely, even after removal of the personal data, quasi-identifiers (QIs) are still left in the database.

The biggest problem with keeping quasi-identifiers is that, if the database is attacked, it isn't all that difficult to combine QI values with other open data sources and reveal a person's identity together with their personal/sensitive information. A well-known example is the Netflix Prize competition, whose open dataset was combined with IMDb's movie ratings dataset, compromising the movie-watching history of individuals.
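
To see how little is needed for such a linkage, here is a toy sketch (with made-up column names and records, not the actual Netflix/IMDb data or method) of joining a de-identified table with a public auxiliary source on shared quasi-identifiers:

```python
# Toy linkage attack: joining on quasi-identifiers re-identifies "anonymous" records.
# All column names and rows are made up for illustration.
import pandas as pd

# "De-identified" dataset: direct identifiers removed, QIs and sensitive values kept.
deidentified = pd.DataFrame({
    "zip_code":   ["30301", "30301", "94110"],
    "birth_year": [1984, 1991, 1984],
    "gender":     ["F", "M", "F"],
    "diagnosis":  ["diabetes", "asthma", "hypertension"],   # sensitive attribute
})

# Public auxiliary dataset (e.g. a voter roll) containing names plus the same QIs.
public_records = pd.DataFrame({
    "name":       ["Alice Smith", "Bob Jones", "Carol Lee"],
    "zip_code":   ["30301", "30301", "94110"],
    "birth_year": [1984, 1991, 1984],
    "gender":     ["F", "M", "F"],
})

# A simple join on the quasi-identifiers links names back to sensitive values.
reidentified = deidentified.merge(
    public_records, on=["zip_code", "birth_year", "gender"]
)
print(reidentified[["name", "diagnosis"]])
```

In practice, a handful of such ordinary-looking attributes is often enough to make a record unique, which is why simply dropping names and IDs is not sufficient.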

As a result, insecure data science pipelines that make predictions using such datasets and their QIs can also reveal potentially sensitive/personal information, even after the personal/sensitive data itself has been removed. We need to make sure that no queries with the potential to reveal an individual's personal information can be run against the data. Furthermore, we must make sure that no inference about a data subject can be made by running multiple predictions with machine learning algorithms.
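
As a toy illustration of why unconstrained queries are dangerous, the sketch below (with made-up names and salaries) shows a classic differencing attack: two aggregate queries that each look harmless on their own, but whose difference reveals one person's exact value. Repeated model predictions can be abused in an analogous way.

```python
# Toy differencing attack: two "safe-looking" aggregate queries leak one person's value.
# Data and names are made up for illustration.
salaries = {
    "Alice": 82000,
    "Bob": 61000,
    "Carol": 95000,
}

def query_total(exclude=None):
    """An aggregate query the analyst is allowed to run: a sum over the dataset."""
    return sum(v for k, v in salaries.items() if k != exclude)

# Each query on its own reveals only an aggregate...
total_all = query_total()
total_without_bob = query_total(exclude="Bob")

# ...but their difference is exactly Bob's salary.
print(total_all - total_without_bob)   # -> 61000
```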

