Prevent Data Breaches: Identity Logs and Machine Learning

by Nach Mishran on July 15, 2019

An identity platform like ForgeRock sits at the heart of an enterprise, with a view of all apps, identities, devices, and resources attempting to connect with each other. This turns out to be an ideal position from which to gather rich identity log data and use it to prevent data breaches.

Prevent Data Breaches? It’s Hard.

An attacker has the luxury of finding the easiest way to break in, whereas a defense team has to secure every possible attack surface. There were 12,440 new breaches in 2018, an increase of 424% over the known breach count in 2017. A total of 14.9 billion identity records were found to have been exposed during the year, up from 8.7 billion in 2017. Some of the hardest breaches to find are micro data breaches, in which the data loss is spread over a long period of time. Breaches through such micro transactions are becoming more prevalent and are very hard to detect.

Identity Logs and Machine Learning: How To Approach the Problem

We are in the right position: all authentication (AuthN) and authorization (AuthZ) requests and identity behavior events are tracked and logged by our IAM products.
We stream raw logs into a big data store and retain a few months of data.
We analyze behavioral patterns in the logs generated by identities. When we represent these patterns in a latent space, we can use them to train models that detect anomalous behavior.
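As a minimal sketch of the first step in that approach, the snippet below groups raw log records by identity and orders each group by time, so that every identity ends up with one event sequence to analyze. The record layout, the event names, and the `build_sequences` helper are assumptions made for illustration; the post does not describe the actual log schema.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical log records: (identity, ISO timestamp, event name).
RAW_LOGS = [
    ("alice", "2019-07-01T09:00:05", "authn_success"),
    ("bob", "2019-07-01T09:00:01", "authn_failure"),
    ("alice", "2019-07-01T09:00:01", "authn_request"),
    ("bob", "2019-07-01T09:00:09", "authn_success"),
    ("alice", "2019-07-01T09:01:30", "authz_granted"),
]


def build_sequences(records):
    """Group events by identity and order each group by timestamp."""
    by_identity = defaultdict(list)
    for identity, timestamp, event in records:
        by_identity[identity].append((datetime.fromisoformat(timestamp), event))
    # Sorting the (timestamp, event) tuples puts each identity's events in time order.
    return {
        identity: [event for _, event in sorted(events)]
        for identity, events in by_identity.items()
    }


sequences = build_sequences(RAW_LOGS)
for identity, events in sequences.items():
    print(identity, events)
```

Each per-identity sequence can then be treated like a sentence of events by whatever embedding model is used downstream.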

Machine Learning Algorithms Showing Promise

Log Embedding

We used word-embedding techniques to learn temporal context from the logs. This let us learn which events naturally occur together for an identity and group them in a latent space. After further experimentation with a customized version of Non Contrastive Loss, we converged on a 50-dimensional temporal representation of an identity's behavior in the latent space.
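The post does not give the details of the embedding model or its customized loss. If it follows the usual word-embedding recipe, an identity's time-ordered events play the role of a sentence, and the model is trained on (target, context) pairs drawn from a sliding window. The sketch below shows only that pair-generation step; the `skipgram_pairs` helper, the window size, and the event names are hypothetical.

```python
def skipgram_pairs(sequence, window=2):
    """Return (target, context) pairs for events within `window` positions."""
    pairs = []
    for i, target in enumerate(sequence):
        start = max(0, i - window)
        stop = min(len(sequence), i + window + 1)
        for j in range(start, stop):
            if j != i:  # an event is not its own context
                pairs.append((target, sequence[j]))
    return pairs


# Hypothetical event sequence for a single identity.
events = ["authn_request", "authn_success", "authz_granted", "logout"]
pairs = skipgram_pairs(events, window=1)
for target, context in pairs:
    print(target, "->", context)
```

A wider window captures longer-range temporal context at the cost of more, and noisier, training pairs.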

Autoencoders

We used a stacked autoencoder to compress the log embeddings, adding artificial Bayesian noise to the input. The bottleneck layer compressed the higher-dimensional log embeddings into a principal, lower-dimensional representation, and the decoder learned to reconstruct the embeddings from that representation. We used simple reverse-indexing methods to map results back to the log entries and extract information from them.
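To make the encode, bottleneck, decode idea concrete, here is a deliberately simplified denoising autoencoder in NumPy. It is a sketch under several assumptions, not the architecture from the post: it has a single hidden layer rather than a stack, it uses Gaussian noise because the post does not specify its noise model, the bottleneck size of 8 is arbitrary, and the input is random synthetic data standing in for 50-dimensional log embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for 50-dimensional log embeddings (not real data).
X = rng.normal(size=(200, 50))

n_in, n_hidden = 50, 8  # the bottleneck size is an arbitrary choice
W_enc = rng.normal(scale=0.1, size=(n_in, n_hidden))
W_dec = rng.normal(scale=0.1, size=(n_hidden, n_in))


def encode(x):
    """Compress inputs to the bottleneck representation."""
    return np.tanh(x @ W_enc)


def reconstruct(x):
    """Encode to the bottleneck, then decode linearly."""
    return encode(x) @ W_dec


def mse(a, b):
    """Mean squared error over all elements."""
    return float(np.mean((a - b) ** 2))


loss_before = mse(reconstruct(X), X)

lr = 0.01
for _ in range(500):
    noisy = X + rng.normal(scale=0.1, size=X.shape)  # corrupt the input
    hidden = np.tanh(noisy @ W_enc)
    output = hidden @ W_dec
    # Gradient of the per-sample squared error, averaged over samples.
    grad_out = 2.0 * (output - X) / len(X)
    grad_W_dec = hidden.T @ grad_out
    grad_pre = (grad_out @ W_dec.T) * (1.0 - hidden ** 2)  # tanh derivative
    grad_W_enc = noisy.T @ grad_pre
    W_dec -= lr * grad_W_dec
    W_enc -= lr * grad_W_enc

loss_after = mse(reconstruct(X), X)
print(f"reconstruction MSE: {loss_before:.3f} -> {loss_after:.3f}")
```

The model is trained to reproduce the clean input from a corrupted copy, which is what pushes the bottleneck toward structure in the data rather than an identity mapping.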

Initial Results

The model achieves over 90% accuracy in predicting anomalies, and it is exposed through a GraphQL API to predict micro data breaches. Our t-SNE visualization corroborates these results.
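The post does not say how an anomaly score is turned into a prediction or how accuracy was measured. One common approach with autoencoders, assumed here purely for illustration, is to flag an identity when its reconstruction error exceeds a threshold and then compare the flags with labeled examples. The scores, labels, threshold, and helper names below are all made up.

```python
def flag_anomalies(scores, threshold):
    """Mark a score as anomalous when it exceeds the threshold."""
    return [score > threshold for score in scores]


def accuracy(predicted, actual):
    """Fraction of predictions that match the labels."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


# Made-up reconstruction errors and ground-truth labels, for illustration only.
errors = [0.02, 0.03, 0.40, 0.01, 0.35, 0.04, 0.05, 0.50, 0.02, 0.20]
labels = [False, False, True, False, True, False, False, True, False, False]

predicted = flag_anomalies(errors, threshold=0.15)
acc = accuracy(predicted, labels)
print(f"flagged {sum(predicted)} of {len(errors)}; accuracy = {acc:.2f}")
```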

In Part 2 of this blog series on how to prevent data breaches, which will appear next month, we will delve into metrics, derived metrics, A/B testing, back-testing, and how we improved on this model.

To learn more about ForgeRock Identity Platform, visit us here. If you prefer to speak to someone directly, contact us today.
