By now most of us have grown accustomed to reading daily news about data breaches impacting organizations of all types and sizes. Usually we read with the intent of understanding the cause and effect of a breach, but in some cases we are personally affected. Collectively (and justifiably) we judge how swiftly an organization communicates about a breach, and how effectively it fixes and mitigates the damage.
Breaches come in all shapes and forms. Over the past 10 years, David McCandless at Information is Beautiful has done a fantastic job curating the occurrence and scope of data breaches affecting organizations.
Take a moment to explore this bubble chart. It is almost certain that your personal data was compromised as a result of one or more of these breaches.
And here is the data behind this visualization in spreadsheet form, which may give you yet another perspective on the scope of the problem.
Data never rests. In fact, about 2.5 quintillion bytes of it are created daily. Data has definition, taxonomy, a point of origin and one or more destinations.
Let’s illustrate a typical data lifecycle from the standpoint of an online banking application.
With the advent of Software as a Service (SaaS), traditional banking functions like accounting, money movement, fraud detection and wealth management are now delivered by specialized SaaS vendors whose core competencies they are. As a result, data created by the primary service proliferates across many vendors. These vendors in turn engage with other SaaS vendors, and so on.
In our example, the origination of data is a consumer interacting with the online banking service. When a consumer registers or signs in to the service, data objects are created that represent the customer persona. The lifetime of these objects is restricted to the scope of the customer session. A typical customer session triggers various functional flows to serve the customer's banking needs, leading to the creation of many communication paths, both within the core application and across its boundary to other SaaS applications.
Within the scope of these flows, data elements are initialized, referenced, copied, transformed, persisted, sent to other SaaS channels and eventually de-scoped. First and foremost, it is important to classify these data elements based on degrees of sensitivity. Thereafter, the data element in focus must be observed in context of its participation in flows, both within and outside the boundaries of the application.
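The two steps above, classifying data elements by sensitivity and then observing where they flow, can be sketched in a few lines of Python. The sensitivity tiers, channel names and helper functions here are illustrative assumptions, not ShiftLeft's actual model:

```python
from dataclasses import dataclass, field
from enum import Enum

# Hypothetical sensitivity tiers; a real classification would follow
# company policy and regulation (e.g. PII, PCI, GDPR categories).
class Sensitivity(Enum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3

@dataclass
class DataElement:
    name: str
    sensitivity: Sensitivity
    # Channels this element has reached during a session.
    flows: list = field(default_factory=list)

    def record_flow(self, channel: str) -> None:
        self.flows.append(channel)

def external_exposures(elements):
    """Return (element, channel) pairs where a sensitive element
    crossed the application boundary to a SaaS channel."""
    return [
        (e.name, ch)
        for e in elements
        if e.sensitivity.value >= Sensitivity.CONFIDENTIAL.value
        for ch in e.flows
        if ch.startswith("saas:")
    ]

# Example: an SSN touches an internal cache, then leaves the boundary.
ssn = DataElement("ssn", Sensitivity.RESTRICTED)
ssn.record_flow("internal:session-cache")
ssn.record_flow("saas:fraud-detection")
print(external_exposures([ssn]))  # [('ssn', 'saas:fraud-detection')]
```

The point of the sketch is the ordering: classification comes first, and only then does observing flows tell you which movements of data actually matter.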
This example highlights the fact that cloud-based service applications may expose user information in ways that consumers neither expect nor appreciate. The vast majority of breaches are the result of a misconfiguration in the application or unexpected consequences of broad interconnectivity. Unfortunately it is not feasible for engineers or operations staff to check if every configuration option and piece of data handled by the application meets all privacy concerns, company guidelines, and other policies we may have for handling sensitive information.
There is no single “sensitive data” switch to flip to alleviate such concerns; where such options do exist, finding and enabling the right one requires onerous research. Lacking accessible solutions, operations staff deploying cloud-based services can only harden the host surface (with trust-some or trust-none policies), examine values produced by actions in applications, and define escalation workflows.
Current approaches include:
1. Restricting ACLs and policies on storage buckets: While plugging holes in leaky storage buckets is all too common, it is not good practice. Every time you patch a hole, a new one forms. This reactive approach of patching old and new security threats is overwhelming and never-ending.
2. Redacting sensitive data from logs: This technique involves comparing every string in a log file against a series of regular expressions. It is, needless to say, process-heavy and compute-intensive. To mitigate false positives, we might consider the entropy (measure of randomness) of matched expressions. However, a SHA-1 digest of a base64-encoded value could represent either a sensitive data element (true positive) or a random GUID (false positive).
3. Identifying data definitions in structured data constructs (like database schemas or spreadsheet headers): Traditional approaches no longer work. We have rapidly entered a new phase of data breaches where unstructured or semi-structured data is the new attack vector. How do we identify sensitive data — historically stored in structured form in data destinations that we control — when it is spread across service providers in unstructured form?
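The second approach above, regex matching combined with an entropy check, can be sketched as follows. The pattern, the entropy threshold of 3.0 bits and the placeholder are all illustrative assumptions; and as noted, entropy alone cannot distinguish a secret from a random GUID:

```python
import math
import re

# Hypothetical pattern for 40-character hex strings (e.g. SHA-1 digests).
# A real redactor would carry many such patterns.
CANDIDATE = re.compile(r"\b[0-9a-f]{40}\b")

def shannon_entropy(s: str) -> float:
    """Bits of entropy per character of s."""
    if not s:
        return 0.0
    n = len(s)
    counts = {c: s.count(c) for c in set(s)}
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

def redact(line: str, threshold: float = 3.0) -> str:
    """Replace high-entropy matches with a placeholder.
    The threshold is illustrative; tuning it trades false positives
    against false negatives."""
    return CANDIDATE.sub(
        lambda m: "[REDACTED]" if shannon_entropy(m.group()) > threshold
        else m.group(),
        line,
    )

# A realistic digest has high entropy and is redacted; a repeated
# character has zero entropy and survives, even though it matches.
print(redact("token=2fd4e1c67a2d28fced849ee1bb76e7391b93eb12"))
print(redact("padding=" + "a" * 40))
```

Running every log line through a battery of patterns like this one is exactly what makes the approach compute-intensive at scale.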
Trying to fix symptoms instead of addressing the problem is a rat’s nest. You spend time, money and resources trying to make the problem go away. Yet, while such actions may make things seem better, they create a false sense of security because the core problem is still lurking: how to protect sensitive data.
At ShiftLeft we help customers visualize and understand their sensitive data exposure based on semantics in and around an application’s data elements and their participation in flows. We provide customers with a comprehensive view of the security posture of their applications, what we call Security DNA.
We provide simple and intuitive policy that informs the Security DNA and injects reason and decision making into the app’s security profile. Based on the information we know about an application, we can apply policy to detect and classify sensitive data, to observe its transformations in procedural flows, and to identify egress points through connected input and output channels.
To learn more about ShiftLeft and get started with a free trial, visit our website at https://www.shiftleft.io/.
This is a Security Bloggers Network syndicated blog post authored by Chetan Conikee. Read the original post at: ShiftLeft Blog - Medium