Part 5: Machine Learning Methods to Process Datasets With QI Values

  • Differential
    Privacy (DP): This mathematical framework makes it possible to control how much
    the model ‘remembers’ or ‘forgets’ about potentially sensitive data, which is
    its big advantage. The most popular DP mechanism is ‘noisy counting’: samples
    drawn from the Laplace distribution are added to query results, so the dataset
    is represented by perturbed values rather than the real ones. The main
    disadvantage of Differential Privacy, however, is that an attacker can estimate
    the actual value from repeated queries. Predictions made with differentially
    private datasets are accurate enough, but with each new query the attacker
    makes, more sensitive information is released.
  • Federated
    Learning: The core idea of federated learning is very similar to distributed
    learning: rather than training the model on all of our data at once, we train
    it on subsets of the data held on separate devices. This is quite a powerful
    method, because we can effectively train and gradually improve the model on
    those devices without the data subsets ever leaving them.
  • ‘Private
    Aggregation of Teacher Ensembles’ (PATE): This framework borrows from
    differential privacy the idea of storing personal/sensitive data in a way
    that doesn’t reveal any individual’s personal information. The core idea of
    PATE is that if two models trained on separate data agree on some outcome,
    it is less likely that sharing that outcome with the consumer will leak
    sensitive data about a specific user. The training methodology is quite
    similar to federated learning (and to bagging techniques, of course): first
    we split our dataset into smaller disjoint subsets, then we train a
    different model on each of them. Predictions are made by aggregating the
    predictions of the different models and injecting noise into the aggregate.
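To make the first bullet concrete, here is a minimal sketch of ‘noisy counting’ with NumPy. It assumes a simple counting query (sensitivity 1); the count of 1,000 records, the epsilon values, and the number of queries are all illustrative:

```python
import numpy as np

def noisy_count(true_count, epsilon, rng=None):
    """Answer a counting query with epsilon-differential privacy.

    A counting query has sensitivity 1, so adding Laplace(0, 1/epsilon)
    noise to the true count satisfies epsilon-DP.
    """
    if rng is None:
        rng = np.random.default_rng()
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

rng = np.random.default_rng(0)
true_count = 1000  # e.g. number of records matching some QI value

# A single answer is perturbed, hiding any one individual's contribution.
one_answer = noisy_count(true_count, epsilon=0.1, rng=rng)

# The weakness described above: averaging many repeated answers lets an
# attacker recover the true count, because the zero-mean noise cancels out.
answers = [noisy_count(true_count, epsilon=0.1, rng=rng) for _ in range(10_000)]
attacker_estimate = np.mean(answers)  # converges toward the true count
```

Smaller epsilon means more noise per answer and stronger privacy, but the averaging attack still works given enough queries, which is why practical DP systems track a total privacy budget across queries.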
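The federated training loop described in the second bullet can be sketched roughly as below, using federated averaging over a toy linear-regression model. The client count, the local learning rate, and the synthetic data are assumptions for illustration:

```python
import numpy as np

def local_update(w, X, y, lr=0.1):
    """One gradient step of linear regression (MSE loss) on a client's
    private shard; only the updated weights leave the device."""
    grad = 2 * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

def federated_round(w, clients, lr=0.1):
    """FedAvg-style round: broadcast weights, train locally on each
    client, then average the updates weighted by shard size."""
    updates = [local_update(w, X, y, lr) for X, y in clients]
    sizes = [len(y) for _, y in clients]
    return np.average(updates, axis=0, weights=sizes)

rng = np.random.default_rng(42)
true_w = np.array([2.0, -1.0])

# Five hypothetical devices, each holding its own data subset.
clients = []
for _ in range(5):
    X = rng.normal(size=(50, 2))
    clients.append((X, X @ true_w))

w = np.zeros(2)
for _ in range(100):
    w = federated_round(w, clients)  # w approaches true_w; raw data is never pooled
```

Only model parameters cross the network in each round, which is what lets the global model improve while every data subset stays on its own device.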

Another important feature of PATE is that we continuously train a downstream ‘student’ model on this ‘noisy’ labeled data, and in the end we expose to the user not the ‘teacher’ models but the ‘student’ one, which ensures that sensitive/personal data is not revealed during the inference phase.
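A rough sketch of PATE's noisy aggregation step follows. The ten teacher votes and the three-class setup are hypothetical; in the full framework, labels produced this way would supervise the downstream ‘student’ model:

```python
import numpy as np

def noisy_argmax(teacher_preds, num_classes, epsilon, rng):
    """PATE-style aggregation: add Laplace noise to each class's vote
    count and release only the winning class, so no single teacher's
    vote (and hence no single data subset) is exposed."""
    votes = np.bincount(teacher_preds, minlength=num_classes).astype(float)
    votes += rng.laplace(scale=1.0 / epsilon, size=num_classes)
    return int(np.argmax(votes))

rng = np.random.default_rng(0)

# Hypothetical votes from 10 teachers, each trained on a disjoint subset,
# classifying one unlabeled example; 9 of 10 agree on class 2.
teacher_preds = np.array([2, 2, 2, 2, 2, 2, 2, 2, 2, 0])
label = noisy_argmax(teacher_preds, num_classes=3, epsilon=1.0, rng=rng)

# A strong consensus survives the noise with high probability, which is
# exactly the "models that agree leak less" intuition described above.
```

Because only these noisy labels ever reach the student, and only the student is served to users, an individual record in any one teacher's subset has little influence on what the user can observe.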

We would love to hear your thoughts on this series. Please feel free to respond here with comments/questions/other feedback.

This is the fifth part of a five-part series about machine learning methodologies for de-identifying and securing personal data by 1touch.io. For part one, click here. For part two, click here. For part three, click here. For part four, click here.



*** This is a Security Bloggers Network syndicated blog from 1touch.io authored by Halyna Oliinyk. Read the original post at: https://1touch.io/part-5-machine-process-datasets-qi-values/