Don’t Lock Up Peanut Butter in Fort Knox: The Smart Approach to Data Classification
Data classification is a critical capability that ensures the most efficient use of resources by applying controls to data and systems in proportion to organizational risk.
One of the analogies I like to use is, “Would you lock up a peanut butter sandwich recipe in Fort Knox?” In security terms: would you apply the most restrictive and costly controls to data that has little impact if breached?
Data classification is how we enable a data-driven approach to managing risk and cost. Let’s explore how to design and implement this model effectively.
What is It?
Data classification lays out the rules on how users access, store and transmit data; it’s a critical function that must be in place for any mature IT security program to be effective and efficient. It is one of the core policies that I feel every user should read and be trained on. The policy identifies data types based on criteria such as regulation and sensitivity and attaches handling requirements to those data types.
Approach
A well-documented, well-thought-out plan for data classification with a crawl, walk, run mindset is critical to success. Start by defining a policy, then discover where the data resides, and finally select appropriate controls.
Policy
The policy should be light and have as few data classifications as possible to meet your business needs. The policy should have the following components (discussed in more detail below):
- Roles and Responsibilities
- Classifications/Data Types
- Handling Requirements
Discovery
The next step is to discover where the data is, who owns it and who is accessing it. Ideally, you can build data flow maps to help visualize the data and how it moves. (Several tools can be helpful in this process, with their respective strengths and weaknesses.) A key requirement is the ability to search structured and unstructured data sources to identify different data classifications. Another helpful feature is understanding and reporting on who can access classified data.
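As a rough sketch of what discovery can look like for unstructured data, the example below walks a directory tree and flags files matching a few common regulated- and sensitive-data patterns. The patterns, paths and labels are illustrative assumptions, not a substitute for a real discovery tool, which would also cover structured sources and report on who has access.

```python
import re
from pathlib import Path

# Illustrative patterns only; real discovery tools use far more robust detection.
PATTERNS = {
    "Regulated": [
        re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),   # US SSN-style identifier
        re.compile(r"\b\d{13,16}\b"),            # possible payment card / account number
    ],
    "Sensitive": [
        re.compile(r"(?i)\b(confidential|customer list|source code)\b"),
    ],
}

def classify_file(path: Path) -> str:
    """Return the highest-impact classification whose pattern appears in the file."""
    text = path.read_text(errors="ignore")
    for label, patterns in PATTERNS.items():     # ordered highest impact first
        if any(p.search(text) for p in patterns):
            return label
    return "Internal Use"                        # default bucket for unmatched content

def scan(root: str) -> dict[str, list[Path]]:
    """Walk a directory tree and group files by detected classification."""
    results: dict[str, list[Path]] = {}
    for path in Path(root).rglob("*"):
        if path.is_file():
            results.setdefault(classify_file(path), []).append(path)
    return results

if __name__ == "__main__":
    for label, files in scan("./shared-drive").items():
        print(f"{label}: {len(files)} files")
```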
Implement Controls
Process and technology will help control data creation, access, movement and destruction. Depending on your environment, you may need multiple tools to do the job. For instance, some tools are good with unstructured data but struggle with structured data. Some tools allow you to change behavior with warnings before completely blocking data, which can reduce the risk of disrupting the business.
Data Loss Prevention (DLP) is a technology commonly used in data classification programs to control data movement, focusing on mitigating the risk of unauthorized access. Remembering the crawl, walk, run approach here is critical, as DLP technologies can be very disruptive to business operations. Implementing DLP is highly risky if you don’t know where the data is and how it flows through your environment.
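To make the crawl, walk, run idea concrete, here is a minimal sketch of an egress check that warns on some classifications and blocks others; graduating a data type from warn to block as you learn its flows is one way to reduce business disruption. The policy structure and function names are assumptions for illustration, not any particular DLP product’s API.

```python
from enum import Enum

class Action(Enum):
    ALLOW = "allow"
    WARN = "warn"    # crawl/walk: notify the user but let the transfer proceed
    BLOCK = "block"  # run: stop the transfer outright

# Illustrative policy: how each classification is treated when leaving the organization.
EGRESS_POLICY = {
    "Regulated": Action.BLOCK,
    "Sensitive": Action.WARN,
    "Internal Use": Action.ALLOW,
    "Public": Action.ALLOW,
}

def evaluate_egress(classification: str, destination: str) -> Action:
    """Decide what to do when classified data is sent to an external destination."""
    action = EGRESS_POLICY.get(classification, Action.WARN)
    if action is Action.WARN:
        print(f"Warning: sending {classification} data to {destination}; "
              "confirm this is approved by the data owner.")
    elif action is Action.BLOCK:
        print(f"Blocked: {classification} data may not be sent to {destination}.")
    return action

# Example: during the 'walk' phase, Sensitive data triggers a warning rather than a block.
evaluate_egress("Sensitive", "personal-email.example.com")
```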
Awareness
As with most security initiatives, awareness is key. The policy and controls here can require a change in process, which will impact your end users. Communicating the ‘why’ first, with training on procedure changes second, is key to adoption and compliance. This policy should be acknowledged upon hire and then annually thereafter.
Roles and Responsibilities
As in most good policies, roles and responsibilities should be clearly defined, and there are some key concepts specific to data classification.
As with risk management, the information security manager should not carry all the responsibility here. Their role is to own the policy and process and to facilitate its execution. Engage and bring the business into this process. If you are the individual who gets the most value out of the data, then you likely own it, and you must drive the rules around it. For example, if you are a growing AI startup and your LLMs and training methods are your secret sauce, then your data science leader owns this data and should be very involved in classifying and protecting it.
Here are some of the key roles and responsibilities involved in a functioning data classification program:
| Role | Responsibilities |
| --- | --- |
| Data Owner | Understands the business value of the data and is ultimately responsible for its classification and for enforcement of the handling requirements. This does not mean they necessarily implement all the controls, but they are responsible for ensuring the controls are in place and functioning. |
| Data Custodian | Manages the technical implementation of data classification, including storage, encryption and access controls. |
| Data Steward | Ensures data is classified, stored, accessed and protected according to the handling requirements. |
| Data User | Accesses and uses classified data. |
| Security Analyst | Identifies risks and recommends remediations. |
| Chief Information Security Officer (CISO) | Owns the data classification strategy, policy implementation and compliance monitoring. |
Defining Data Types
The first step is to consider the buckets you will use in your policy. Keep them to a minimum; there are some very popular choices here. Data should be divided based on its value or the potential impact if it is exposed to unauthorized individuals. Typically there are three or four buckets: regulated (or confidential), sensitive, internal use and public data.
| Data Type | Description |
| --- | --- |
| Regulated | Protected Health Information (PHI) covered by HIPAA, or Non-Public Information (NPI), such as financial account numbers covered by GLBA. |
| Sensitive | Unregulated data such as strategic plans, source code, customer lists and intellectual property (IP); data that can harm your competitive advantage if exposed to competitors. |
| Internal Use | Project plans, organizational charts, procedures and meeting notes. This data may not have as big an impact if exposed as sensitive data, but it should still have some handling requirements to maintain confidentiality, albeit less resource-intensive ones. |
| Public | Data that can already be found through other sources and has no impact if exposed. There are no unauthorized individuals for this data, so there is no exposure to fines or brand damage if it is lost. |
Remember that your mileage may vary here, as the buckets should be relevant to your environment. For instance, if you don’t maintain regulated data, you won’t need a bucket for it. It is also important to note that each bucket generally has different handling requirements.
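If your buckets end up feeding tooling such as discovery scanners, labeling or DLP policies, it helps to define them once in a shared place. Here is a minimal sketch, assuming the four buckets above; swap in whatever set fits your environment.

```python
from enum import IntEnum

class DataClassification(IntEnum):
    """Classification buckets, ordered from lowest to highest impact if exposed."""
    PUBLIC = 1        # already publicly available; no impact on exposure
    INTERNAL_USE = 2  # project plans, org charts, procedures, meeting notes
    SENSITIVE = 3     # strategic plans, source code, customer lists, IP
    REGULATED = 4     # PHI under HIPAA, NPI under GLBA, etc.

# Ordering lets controls reason about "at least this sensitive".
assert DataClassification.REGULATED > DataClassification.INTERNAL_USE
```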
What are Handling Requirements?
Handling requirements are the rules for accessing, storing and disposing of data. They are attached to the data types to manage the risk of exposure and should be appropriate to that risk. Going back to the Fort Knox analogy, it makes no sense, from a time and money perspective, to apply the most stringent and often costliest handling requirements to low-risk data types. This is how data classification can drive efficiency and effectiveness in your program.
Here are some example handling requirements for each data type:
| Data Type | Handling Requirements |
| --- | --- |
| Regulated | Encrypt at rest and in transit; restrict access to a need-to-know basis; log and review access; retain and dispose of data per regulatory requirements. |
| Sensitive | Encrypt in transit; limit access to approved roles; do not share externally without data owner approval. |
| Internal Use | Limit access to employees and approved contractors; do not post or share publicly. |
| Public | No special handling requirements; may be shared freely. |
This is just a small subset of the handling requirements you may have. In the case of regulated data, you will need to map your regulatory requirements to the handling requirements to maintain compliance.
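One way to keep handling requirements actionable is to encode them as data that provisioning workflows, labeling tools or access reviews can consume. The fields and values below are illustrative assumptions only; regulated data would additionally need each requirement mapped back to the specific regulation that drives it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HandlingRequirements:
    encrypt_at_rest: bool
    encrypt_in_transit: bool
    external_sharing_allowed: bool
    access_review_days: int | None               # None = no periodic access review required
    applicable_regulations: tuple[str, ...] = ()

# Illustrative mapping of data types to handling requirements.
POLICY = {
    "Regulated": HandlingRequirements(True, True, False, 90, ("HIPAA", "GLBA")),
    "Sensitive": HandlingRequirements(True, True, False, 180),
    "Internal Use": HandlingRequirements(False, True, False, None),
    "Public": HandlingRequirements(False, False, True, None),
}

def requires_encryption(classification: str) -> bool:
    """Example check a storage-provisioning workflow might run."""
    reqs = POLICY[classification]
    return reqs.encrypt_at_rest or reqs.encrypt_in_transit
```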
In Summary
Data classification plays a crucial role in enabling a data-driven approach to managing risk and cost. By effectively implementing a well-documented policy, discovering where data resides and implementing appropriate controls, businesses can enhance their security posture and operational efficiency. Through clear roles and responsibilities, defining data types and establishing handling requirements, organizations can ensure that the right data is protected with the necessary measures. In the end, the strategic application of data classification will enhance security and improve resource allocation, leading to an improved IT security program.