Reading the Data Breach Tea Leaves: Preventing Data Exfiltration Before It Happens
Data breaches are commonplace, and data exfiltration has traditionally been the end goal for threat actors, whether for financial gain, political gain or simply to wreak havoc. Whatever the reason, cybersecurity starts with a key question: "How could cybercriminals gain access to our sensitive data?" To find the answer, we must look at what has changed in IT environments, starting with the introduction of the cloud. According to SentinelOne’s ‘The State of Cloud Ransomware in 2024’ report, threat actors are increasingly using cloud services to identify the data they intend to exfiltrate or ransom.
Vulnerable Cloud Environments Start with Misconfiguration
Today, corporations are focused on cloud-native development, using cloud storage and microservice architectures that rely on API communications. Simply put, cloud-native architecture and cloud storage have given threat actors a front door to the crown jewels, enabling them to take the data they want in record time.
The use of cloud storage allows data exfiltration to happen fast. In today’s cloud environment, virtualization is stacked on top of virtualization, which can create security risks. Cloud-native development, containers and microservices let development teams deploy new builds quickly: each team can focus on its part of the service and ship continuous updates in a fast-moving environment. However, that speed also raises the potential for misconfiguration. Misconfigurations breed vulnerability. And where there’s vulnerability, threat actors are on standby to take advantage.
Simple mistakes, such as misconfigured permission settings, can create serious security concerns. In 2024, The Register reported that Microsoft Power Pages misconfigurations were exposing sensitive data: a security researcher uncovered significant amounts of data left out in the open for anyone to view, thanks to misconfigured access controls in websites built with Power Pages. Security holes introduced while solving operational issues can multiply when developers and application owners do not realize that their actions create additional risk. SecurityWeek reported that researchers at Palo Alto Networks spotted a threat actor extorting organizations after compromising their cloud environments using inadvertently exposed environment variables.
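To make the misconfiguration risk concrete, here is a minimal sketch of an access-control audit, assuming AWS S3 storage, the boto3 library and already-configured credentials (none of which are named in the reporting above). It flags buckets whose ACLs grant access to anonymous users; a real audit would also check bucket policies and account-level public access blocks.

```python
import boto3
from botocore.exceptions import ClientError

# Grantee URI that marks an S3 bucket ACL as world-readable/writable
ALL_USERS = "http://acs.amazonaws.com/groups/global/AllUsers"

def audit_buckets():
    """Flag S3 buckets whose ACLs grant permissions to anonymous users."""
    s3 = boto3.client("s3")
    for bucket in s3.list_buckets()["Buckets"]:
        name = bucket["Name"]
        try:
            acl = s3.get_bucket_acl(Bucket=name)
        except ClientError as err:
            print(f"{name}: could not read ACL ({err.response['Error']['Code']})")
            continue
        public_grants = [
            grant["Permission"]
            for grant in acl["Grants"]
            if grant["Grantee"].get("URI") == ALL_USERS
        ]
        if public_grants:
            print(f"{name}: PUBLIC via ACL -> {public_grants}")

if __name__ == "__main__":
    audit_buckets()
```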
Traditional Network Monitoring vs. Modern Monitoring: Behavior Analytics
The “traditional” ways of attacking network environments haven’t gone away. Security teams still rely on intrusion prevention systems and firewalls to analyze network traffic and detect malicious payloads.
API inspection follows the same type of approach: security teams should look for indicators that flag potential intrusion behavior in their APIs’ activity. Within cloud computing, there is intrinsic value in monitoring APIs and the queries made to API services and storage, such as Amazon S3. As with file and folder permissions, relaxed access controls on cloud storage let threat actors reach sensitive data while skipping the steps traditionally necessary for reconnaissance, privilege escalation and lateral movement. Security still requires visibility into ‘who’ is accessing ‘what’ and must be able to perform behavioral analytics, even for service-to-service communication. By monitoring the behavior of traffic, security teams can tell the difference between service accounts and user accounts, and therefore spot when a user is accessing a service ordinarily accessed only by a service; a service transaction looks quite different from a user transaction. However, teams cannot rely on behavioral analysis alone.
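As an illustration of that service-versus-user distinction, here is a minimal sketch under an assumed log schema of (principal, principal type, endpoint) records; the principals, endpoints and schema are hypothetical, not drawn from any particular product. It profiles which principal types normally touch each endpoint, then flags user traffic on endpoints historically seen only from services.

```python
from collections import defaultdict

# Hypothetical access-log records: (principal, principal_type, endpoint).
# principal_type is "service" or "user" in this illustrative schema.
baseline_events = [
    ("svc-billing", "service", "/internal/invoices"),
    ("svc-billing", "service", "/internal/invoices"),
    ("alice", "user", "/reports/summary"),
]

live_events = [
    ("bob", "user", "/internal/invoices"),  # a user on a service-only path
    ("svc-billing", "service", "/internal/invoices"),
]

def build_endpoint_profile(events):
    """Record which principal types have historically touched each endpoint."""
    profile = defaultdict(set)
    for _, ptype, endpoint in events:
        profile[endpoint].add(ptype)
    return profile

def flag_anomalies(events, profile):
    """Flag user traffic on endpoints previously seen only from services."""
    for principal, ptype, endpoint in events:
        if ptype == "user" and profile.get(endpoint, set()) == {"service"}:
            yield f"ALERT: user '{principal}' accessed service-only endpoint {endpoint}"

profile = build_endpoint_profile(baseline_events)
for alert in flag_anomalies(live_events, profile):
    print(alert)
```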
Security needs to go beyond the behavior of the API transaction to the content of the API request and response. Typically, API behavior analysis only identifies the HTTP response codes (e.g., 200 OK, 500 Internal Server Error). Analyzing the response code alone misses that the 200 OK included other customers’ data due to misconfigured access controls, or that the 500 error included confidential server information.
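A minimal sketch of that kind of content inspection follows, assuming a hypothetical API whose customer IDs look like cust-<digits>; the ID format, sample bodies and patterns are illustrative only. It flags a 200 OK that carries another tenant’s identifiers and a 5xx body that leaks a stack trace.

```python
import re

# Hypothetical convention: customer IDs look like "cust-<digits>" in this API.
CUSTOMER_ID = re.compile(r"cust-\d+")

def inspect_response(requesting_customer: str, status: int, body: str):
    """Look past the status code: flag cross-tenant data and leaky errors."""
    findings = []
    if status == 200:
        ids = set(CUSTOMER_ID.findall(body))
        foreign = ids - {requesting_customer}
        if foreign:
            findings.append(f"200 OK contains other customers' IDs: {sorted(foreign)}")
    elif status >= 500:
        # Stack traces and server banners in error bodies leak internals.
        if re.search(r"Traceback|Exception|at .+\.java:\d+", body):
            findings.append("5xx response leaks stack trace / server internals")
    return findings

print(inspect_response("cust-1001", 200, '{"owner": "cust-1001", "peer": "cust-2042"}'))
print(inspect_response("cust-1001", 500, "Traceback (most recent call last): ..."))
```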
Today’s Security Information and Event Management (SIEM) systems are where modern monitoring kicks into high gear. They can see the bigger picture by piecing together information from multiple log and audit sources, including APIs: who is authenticating and how often, as well as what data they are grabbing and how much of it. Consider what happens when someone queries an application’s public API. The API is meant to be public so that partners can build applications on top of a SaaS product. While a public API has benefits, it can also be misused and misconfigured, enabling access to customer data. By shining a spotlight on what’s inside API traffic, you can look for attack signatures or behaviors and recognize when something doesn’t make sense. And that’s where we can catch these fast-flight data exfiltration attempts.
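As a minimal sketch of that correlation, the snippet below joins two hypothetical, already-normalized log feeds (an authentication log and an API access log) by identity and flags identities pulling unusually large volumes; the identities, fields and threshold are assumptions for illustration, not any SIEM vendor’s schema.

```python
from collections import defaultdict

# Hypothetical normalized events from two log sources, keyed by identity.
auth_log = [
    {"identity": "partner-app-7", "event": "token_issued"},
    {"identity": "partner-app-7", "event": "token_issued"},
    {"identity": "partner-app-9", "event": "token_issued"},
]
api_log = [
    {"identity": "partner-app-7", "endpoint": "/v1/customers", "bytes": 48_000_000},
    {"identity": "partner-app-7", "endpoint": "/v1/customers", "bytes": 52_000_000},
    {"identity": "partner-app-9", "endpoint": "/v1/status", "bytes": 2_000},
]

BYTES_THRESHOLD = 50_000_000  # illustrative per-window ceiling

def correlate(auth_log, api_log):
    """Piece together who authenticated, how often, and how much data they pulled."""
    picture = defaultdict(lambda: {"auths": 0, "bytes": 0, "endpoints": set()})
    for event in auth_log:
        picture[event["identity"]]["auths"] += 1
    for event in api_log:
        record = picture[event["identity"]]
        record["bytes"] += event["bytes"]
        record["endpoints"].add(event["endpoint"])
    for identity, record in picture.items():
        if record["bytes"] > BYTES_THRESHOLD:
            print(f"ALERT: {identity} pulled {record['bytes']:,} bytes "
                  f"across {sorted(record['endpoints'])} after {record['auths']} auths")

correlate(auth_log, api_log)
```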
Corroborating the Evidence: Asset Risk Models
Once we know how cybercriminals are accessing sensitive data, we must also ask, “How do we stop them from gaining access to this data?”
Today’s security strategies need to evaluate risk using risk models and automation built on a combination of risk inputs and risk amplifiers. This means evaluating the runtime activity from an asset using a variety of analytical techniques and combining the results to form the risk input. Risk amplifiers then add context that raises the weight of that input.
While it can sound complicated, let’s break it down. If we see one hundred instances of the same alert from an asset versus one hundred different alerts from an asset, the hundred different alerts are clearly the more concerning situation. So we use the variance of the activities as one method to amplify the risk on that asset. This way, we aren’t just counting alerts; we are calculating the probability that the asset is compromised by evaluating the severity of the activities, the variance of the activities and other factors that all target the same asset. An asset is not just a system; it can also be a user, cloud storage, a cloud service and so on. If we relied on a single analytical technique, we would be bombarded with false positives and wouldn’t know what to look for. So we take advantage of the findings from multiple analytical techniques for our risk inputs. We are not triaging each individual finding, but rather the assets with high risk scores. False positives are automatically suppressed, since there will be little corroborating evidence to increase the risk score of the asset, while multiple analytical findings corroborate a true positive. With enough smoke, you can assume there is fire.
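Here is a minimal sketch of that scoring idea, with made-up assets, alert types and severities; the formula (average severity amplified by the number of distinct alert types) is one illustrative choice, not a production risk model. Note how one hundred copies of the same low-severity alert score far below three varied findings against the same asset.

```python
from collections import defaultdict

# Hypothetical findings: (asset, alert_type, severity 1-10) from several analytics.
findings = [
    ("s3://customer-data", "anomalous_query_volume", 6),
    ("s3://customer-data", "user_on_service_endpoint", 7),
    ("s3://customer-data", "cross_tenant_response", 9),
    ("vm-web-01", "port_scan", 3),
] + [("vm-web-01", "port_scan", 3)] * 99  # 100 copies of the same alert

def asset_risk(findings):
    """Combine severity (risk input) with alert variance (risk amplifier)."""
    by_asset = defaultdict(list)
    for asset, alert_type, severity in findings:
        by_asset[asset].append((alert_type, severity))
    scores = {}
    for asset, alerts in by_asset.items():
        base = sum(sev for _, sev in alerts) / len(alerts)  # average severity
        distinct = len({atype for atype, _ in alerts})      # variance amplifier
        scores[asset] = base * distinct
    return scores

for asset, score in sorted(asset_risk(findings).items(), key=lambda kv: -kv[1]):
    print(f"{asset}: risk {score:.1f}")
```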
The good news is that while cloud services and storage have given threat actors a more direct path to data exfiltration, they still have to take multiple steps to fully realize their objectives: they need to learn your APIs, map the data boundaries and find the exploits. Organizations must read the tea leaves to recognize what these analytical findings indicate. If a true threat exists, the next step is to shut down the user and account and update the API before data exfiltration occurs. Rather than plugging holes after the fact or cleaning up after data theft, risk analytics can help prevent data exfiltration because it gathers enough evidence to suggest what is about to happen.
Predicting Risk to Close the Vulnerability Gap
Collapsing behavior changes and attack indicators onto a common asset is a modern way to predict risk. Security teams can recognize when multiple activities are associated with the same campaign in the environment and connect the dots; because an event is tied to a campaign already seen in that environment, its risk is amplified and the asset’s risk score goes up. Organizations then have full visibility into all the activities that happened on that asset and can recognize the true positives. Security analysts have automatically compiled information at their fingertips and begin investigations with a head start.
If organizations are focused on detecting data exfiltration itself, they are already too late. Critical security steps should happen before exfiltration does. Identifying corroborating evidence to recognize that a breach is underway is the most important step to protecting an organization and its customers.