Part 2: Activating Your Masked Data Superpowers

by Delphix on December 2, 2019

In our last installment, we discussed how preserving the business value of your masked data leads to quicker testing and better insights. Today, we’ll take those ideas to scale and discuss how the emergent practice of DataOps is leading enterprise data management and changing the enterprise feature delivery factory.

The stark reality is that data breaches are becoming more frequent even as digitalization accelerates. While most approaches to security add complexity and time to workflows, DataOps practices using masking can help companies enable compliance and security while empowering enterprise teams to move faster.

Why Masking Brings Speed to DataOps

For the second year in a row, Gartner recognized DataOps as an Innovation Trigger in three separate Hype Cycle reports: Enterprise Information Management, Enterprise Data Management, and Data Security. But if you wanted to boil down the value of DataOps in the enterprise to a single word, it would be velocity. At its essence, DataOps is about delivering data quickly, securely, and automatically by aligning the people, process, and tech that can make it happen.

The 2019 Accelerate State of DevOps report, led by industry leader Dr. Nicole Forsgren, details four primary metrics. Applying a DataOps lens to those metrics helps us understand the value masking has for enterprises. The findings show that elite performers deploy on demand, have a lead time of one day or less, restore service in under an hour, and have a change failure rate of less than 15 percent. That’s a whole lot of velocity, which in turn means a lot more business value a lot faster.
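To make those four thresholds concrete, here is a minimal sketch that compares a team's numbers against the elite figures quoted above. The `DeliveryMetrics` class, its field names, and the sample team are hypothetical, invented for illustration; they are not part of the report or of any DataOps product.

```python
from dataclasses import dataclass


@dataclass
class DeliveryMetrics:
    """Hypothetical container for one team's delivery numbers."""
    deploys_on_demand: bool
    lead_time_hours: float
    restore_time_hours: float
    change_failure_rate: float  # fraction of changes that fail, 0.0 to 1.0


def meets_elite_thresholds(m: DeliveryMetrics) -> dict:
    """Compare each metric with the elite figures quoted in the text."""
    return {
        "deployment_frequency": m.deploys_on_demand,      # deploy on demand
        "lead_time": m.lead_time_hours <= 24,             # one day or less
        "time_to_restore": m.restore_time_hours < 1,      # under an hour
        "change_failure_rate": m.change_failure_rate < 0.15,
    }


# Made-up sample team: fast to restore, but lead time is a day and a half.
team = DeliveryMetrics(
    deploys_on_demand=True,
    lead_time_hours=36,
    restore_time_hours=0.5,
    change_failure_rate=0.10,
)
result = meets_elite_thresholds(team)
print(result)
```

In this made-up case only lead time misses the bar, which is the kind of gap that waiting on masked test data tends to widen.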

Six Key Masking Capabilities that Drive Faster Feature Pipelines

Before we discuss how masking from a DataOps perspective impacts the four primary metrics, let’s review six key capabilities a DataOps masking solution must have:

Rapidly reproduces synchronous, high-fidelity copies of multiple datasets in an on-demand library. Elusive timing and referential integrity errors simply dissolve because you can provision heterogeneous datasets (masked or unmasked) from the same point in time in just a few minutes.
Marries virtual data with masked data. That means you don’t have to go through 20 steps and wait 3 days to get your masked data; masked data is always available and ready to deploy at a moment’s notice.
Maintains automated synchronicity with masking. This makes it possible for disparate, geographically distinct and even air-gapped datasets to all be masked the same way. That means you can be referentially consistent within and across systems.
Built for scale. Got 12 systems and many datasets totaling 200 TB? No problem, you can virtualize and mask the whole dataset collection and, once ingested, deliver the entire collection in minutes.
Uses a policy-based masking approach. A policy-based obfuscation technique uses the domain of the data and metadata itself to decide how to mask. Combine this with masking in memory, and suddenly instead of fixing 40+ end points, there’s just one. Change management is radically simpler. Consequently, launching masking on a DataOps platform typically takes 80 percent less time than with traditional solutions. More importantly, it takes 99 percent less time when masking the second time.
Makes it dead simple to know and recover the state of one or more large datasets. Typically, recovering from an error takes 10 minutes or less. Ask yourself: How long would it take you to recover an environment with a mix of Oracle, SQL Server, and non-traditional data sources if someone accidentally ran a test? My bet is that number is in the days or weeks category, if it’s possible at all.
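Two of the capabilities above, policy-based masking and masking every system the same way, can be sketched in a few lines. This is an illustration under stated assumptions, not how any particular DataOps platform implements masking: the key, the `POLICY` and `COLUMN_DOMAINS` mappings, and the table and column names are all hypothetical. The technique shown is a keyed hash (HMAC-SHA256), so the same input always produces the same masked output.

```python
import hashlib
import hmac

# Hypothetical secret shared by every masking job; in practice it would come
# from a secrets manager, not source code.
MASKING_KEY = b"example-key-not-for-production"


def _digest(value: str) -> bytes:
    """Keyed hash: the same input always yields the same output for a given key."""
    return hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256).digest()


def mask_email(value: str) -> str:
    return f"user_{_digest(value).hex()[:12]}@example.com"


def mask_ssn(value: str) -> str:
    # Map the digest to nine digits, formatted like a US SSN.
    number = int.from_bytes(_digest(value)[:8], "big") % 10**9
    digits = f"{number:09d}"
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"


# The policy: one central mapping from data domain to masking algorithm.
POLICY = {"email": mask_email, "ssn": mask_ssn}

# Hypothetical column-to-domain metadata for two unrelated systems.
COLUMN_DOMAINS = {
    "crm.customers.email_addr": "email",
    "billing.accounts.contact_email": "email",
    "billing.accounts.tax_id": "ssn",
}


def mask_row(table: str, row: dict) -> dict:
    """Mask every column whose domain appears in the policy; pass others through."""
    masked = {}
    for column, value in row.items():
        domain = COLUMN_DOMAINS.get(f"{table}.{column}")
        masked[column] = POLICY[domain](value) if domain in POLICY else value
    return masked


crm = mask_row("crm.customers", {"id": 7, "email_addr": "ana@corp.test"})
billing = mask_row(
    "billing.accounts",
    {"id": 7, "contact_email": "ana@corp.test", "tax_id": "123-45-6789"},
)
# The same source value masks identically in both systems, so joins still work.
assert crm["email_addr"] == billing["contact_email"]
```

Because the algorithm for each domain lives in one place, changing how emails are masked means editing a single entry in `POLICY` rather than touching every system that stores an email address.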

What’s the value of these capabilities to your software feature factory? Here’s how well-masked data impacts four key DevOps metrics:

Deployment Frequency

This isn’t how fast we deploy data. It’s the change in how often we can deploy/release code because we have better, more secure data faster. Deployment frequency is a function of stability and deploy time. With the ability to rapidly reproduce high-fidelity copies and with fresh versions of consistent masked data always at the ready, you create an island of data stability. Similarly, a significant portion of deploy time is taken up by test time, which in turn relies in large measure on the time it takes to get the right masked data.

Lead Time for Changes

Lead time is a function of delivery pipeline length, complexity, and volume. From a data perspective, this shows up as data scalability (can I get 3 masked 5TB test datasets?), agility (how fast can I move 3 datasets from box A to box B?), and data transformation challenges (can I automatically and quickly mask the data as change occurs?). Having collections of consistent, masked heterogeneous datasets ready at a moment’s notice creates enormous velocity. Changing a masking rule in one central location instead of at each possible end-point makes change management much faster. Having both of these superpowers creates opportunities to remove steps from the pipeline, reduce unnecessary controls on data, and standardize how environments are built. The result: reduced variability and greater velocity.

Time to Restore

Size, the need for fresh data, and the number of datasets to recover typically make this task bigger and more difficult as it scales up. But DataOps capabilities change that. First, data recovery is not a function of dataset size, with typical times being under 10 minutes for one or more datasets even as the datasets scale. Second, the recency of masked data has nothing to do with the speed of deployment. That is, the most recent masked data is always ready to deploy in that same 10 minutes, and a new set of masked data is repeatedly and frequently available.

Change Failure Rate

On average, defects occur at a rate of 1 to 25 per 1,000 lines of code, and data problems account for about 15 percent of those defects. It’s hard to make big datasets or collections of big datasets consistent and up to date. Thus, there is often a tradeoff between testing with the best data and finishing on time. But the rapid delivery of data at scale, with rapid reset, can make that tradeoff evaporate. With DataOps, everyone (yes, everyone) can test with the right data in the same timeframe as their build. With this approach, teams get the best data without sacrificing speed. For example, Fannie Mae reduced their data-related defect rate from an estimated 15-20 percent to less than 5 percent.
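To put the averages quoted above in perspective, here is a small back-of-the-envelope sketch. The function name and the 100,000-line codebase are hypothetical; only the 1-to-25 defect range and the 15 percent share come from the text.

```python
def data_defect_range(lines_of_code: int,
                      data_share: float = 0.15,
                      low_per_kloc: float = 1,
                      high_per_kloc: float = 25) -> tuple:
    """Estimate data-related defects, rounded to whole defects.

    Defaults are the averages quoted in the text: 1 to 25 defects per
    1,000 lines of code, about 15 percent of them caused by data problems.
    """
    kloc = lines_of_code / 1000
    return (
        round(kloc * low_per_kloc * data_share),
        round(kloc * high_per_kloc * data_share),
    )


# Hypothetical 100,000-line codebase.
low, high = data_defect_range(100_000)
print(f"{low} to {high} data-related defects")  # prints "15 to 375 data-related defects"
```

Even at the low end of the range, that is a steady stream of defects that better test data could help prevent.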

Charting a Pathway to Becoming an Elite DataOps Organization

The common denominator is this: Without a DataOps platform, you’re just going to be slow. Adopting DataOps enables these three key capabilities:

Provides a stable and significantly faster delivery pipeline with fewer data-related steps, with the necessary data controls and standardized data delivery that all drive reduced variability on masked and unmasked data.
Rapidly deploys freshly masked, referentially correct, mobile, and agile data from a library of datasets in a way that’s easily repeatable and solves the masked data delivery problem at scale.
Makes change management much faster since policies drive change. Data integrity is simple, and there are simply fewer places to make updates.

A DataOps platform brings enormous business value to data masking as velocity rises, variability falls, and the cost of change and error radically decreases. At the end of the day, your masked data can be readily used for the business insight you want, with the data protection you need, to create value and achieve success in today’s fast-changing world of technology.