Endpoint Advanced Protection Buyer’s Guide: Key Capabilities for Detection

Posted under: Research and Analysis

As we resume our posting of the Endpoint Detection and Response (D/R) selection criteria, let’s start by focusing specifically on the detection use case.

Before we get too far into the capabilities, we should clear up some semantics about the word detection. If we refer back to the timeline presented in the Prevention Selection Criteria, the detection function happens in the during-execution period. In fact, you can make the case that “detection” of malicious activity is what triggers blocking and thus is the prerequisite to preventing an attack (otherwise how would you know it needs to be prevented?). But that’s too confusing. Let’s just say that prevention is when you block the attack before it compromises the machine, and it can happen both prior to and during execution. Detection happens during execution and post-execution, and typically involves a device that has been compromised because the attack was not prevented.

Data Collection

Modern detection requires significant analysis across a wide variety of telemetry sources captured from the endpoint. Once the telemetry is captured, a baseline of normal activity on the endpoint is established, which is then used to look for anomalous behavior.
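
To make the baseline idea concrete, here is a minimal sketch in Python. It assumes a single hypothetical metric (process launches per hour) and a simple standard-deviation threshold; real products use far richer models, and the function names and the threshold of 3.0 are our own illustrative choices.

```python
from statistics import mean, stdev

def build_baseline(samples):
    """Summarize historical counts as (mean, standard deviation)."""
    return mean(samples), stdev(samples)

def is_anomalous(value, baseline, threshold=3.0):
    """Flag a value more than `threshold` standard deviations from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        # A perfectly flat history: any deviation at all is unusual.
        return value != mu
    return abs(value - mu) / sigma > threshold

# Hypothetical telemetry: process launches per hour on one endpoint.
history = [12, 15, 11, 14, 13, 12, 16, 14]
baseline = build_baseline(history)

print(is_anomalous(13, baseline))  # a typical hour
print(is_anomalous(90, baseline))  # a sudden burst of launches
```

A single metric like this would be noisy on its own, which is one reason the data types below matter: the more sources feed the baseline, the more context there is for deciding what is really abnormal.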

Given the data-centric nature of endpoint detection, an advanced endpoint detection offering should be aggregating and analyzing the following types of data:

  • Endpoint logs: Endpoints can generate a huge number of log records, and the inclination (given the amount of data) is to restrict the number of logs captured, but we recommend collecting as many logs off the endpoint as possible. And the more granular the better, given the sophistication of the attacker and the fact that attackers may target anything on the device. If you do not collect the data on the endpoint, there is no getting it back once you are trying to investigate the attack. Thus optimally the endpoint agent collects not just operating system activity records, but also any logging that happens within applications (where possible). This includes identity activity (like new account creation and privilege escalation), process launches, and file system activity (key when detecting ransomware). There is some nuance in how long you retain the collected data, given that the data can be voluminous and compute-intensive to process and analyze on the device.

  • Processes: One of the more reliable ways to detect malicious behavior is to monitor which O/S processes are started and where they are launched from. This is especially critical when detecting scripting attacks, as the attackers use legitimate system processes to launch malicious child processes.

  • Network traffic: A compromised endpoint will inevitably connect to a command and control network for instructions and to download additional attack code. These actions can be detected by monitoring the endpoint’s network stack. The agent can also look for connections to known malicious sites and for recon activity on the local network.

  • Memory: Given the prevalence of file-less attacks, which don’t store any malicious code in the file system, modern advanced detection requires monitoring and analyzing activity within the memory of the endpoint.

  • Registry: As with memory-based attacks, attackers frequently store malicious code within the registry of the device to evade file system detection. Thus the advanced detection agent needs to monitor and analyze registry activity for signs of misuse.

  • Configuration changes: It’s hard for attackers to totally obfuscate what is happening on the endpoint, and that’s why gathering and analyzing data on configuration changes to the device can be instrumental in attack detection.

  • File integrity: Another long standing method of detecting attackers is to monitor for changes to system files, since if those files are changed outside of a patching activity, it’s usually for something malicious. The advanced endpoint agent should collect data and look for situations where system files are changed.
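
As one illustration of how collected telemetry gets used, the process item above can be sketched as a simple parent/child rule in Python. The rule table and the event field names (`parent`, `child`) are hypothetical, and a real agent would weigh command lines, paths, and users as well; this only shows the shape of the check.

```python
# Hypothetical rules: applications that rarely have a legitimate reason
# to spawn a shell or script host as a child process.
SUSPICIOUS_CHILDREN = {
    "winword.exe": {"powershell.exe", "cmd.exe", "wscript.exe"},
    "excel.exe": {"powershell.exe", "cmd.exe", "wscript.exe"},
    "outlook.exe": {"powershell.exe", "cmd.exe"},
}

def flag_process_launches(events):
    """Return the launch events whose parent/child pair matches a rule."""
    flagged = []
    for event in events:
        parent = event["parent"].lower()
        child = event["child"].lower()
        if child in SUSPICIOUS_CHILDREN.get(parent, set()):
            flagged.append(event)
    return flagged

# Hypothetical process-launch telemetry from one endpoint.
events = [
    {"parent": "explorer.exe", "child": "WINWORD.EXE"},
    {"parent": "WINWORD.EXE", "child": "powershell.exe"},
    {"parent": "services.exe", "child": "svchost.exe"},
]

for hit in flag_process_launches(events):
    print(f"suspicious: {hit['parent']} -> {hit['child']}")
```

Static rules like this are easy to evade, which is why the analytics discussed next lean on learned profiles rather than fixed lists.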

Analytics

As mentioned above, traditional endpoint detection relied significantly on simple file hashes and behavioral indicators to detect attacks. Given the sophistication of the attacks in the wild, a more robust and scientific approach is required to determine what is legitimate activity versus malicious intent. This more scientific approach is centered around machine learning (advanced math) techniques to understand the patterns of activity adversaries use before and during the attack. The detection products use huge amounts (think terabytes) of endpoint telemetry to train mathematical models to detect anomalous activity and find commonalities to how attackers behave. These models then generate an attack score to help prioritize the alerts.

  • Profiling applications: Detecting application misuse is predicated on understanding legitimate usage of the application, thus the mathematical models analyze both legitimate and malicious usage of frequently targeted applications (browsers, office productivity suites, email clients, etc.). This is similar to the approach taken to prevent attacks (discussed in the Prevention Selection Criteria guide).

  • Anomaly detection: With the profiles in hand and a consistent stream of endpoint telemetry to analyze, the mathematical models look for activity that isn’t normal for the device. At that point, an alert triggers and the device should be marked as suspicious, prompting an analyst to validate the alert and ensure it is not a false positive.

  • Tuning: Speaking of false positives, the detection function needs to be constantly learning from what is really an attack and what isn’t, based upon the results of the detection in your environment. From a process standpoint, you’ll want to ensure the feedback is captured by the detection offering and used to constantly improve the models to keep your detection current.

  • Risk scoring: We aren’t really big fans of arbitrary risk scoring approaches, given that the underlying math can be suspect. That being said, there is a role for risk scoring in an endpoint attack detection context, and that’s for prioritization. With dozens of alerts hitting daily (maybe significantly more), it’s important to be able to weigh which alerts warrant immediate investigation, and a risk score should be able to tell you. Just be sure to inquire about the underlying scoring methodology, track the accuracy of the score, and tune accordingly based on your environment.

  • Data Management: Given the analytics-centric nature of EDR, being able to handle and analyze large amounts of telemetry collected from your endpoints is critical to the functioning of the system. Inevitably you’ll run into the question of where to store all the data, how to scale the analytics to tens or hundreds of thousands of endpoints, and how to economically analyze the security data. But ultimately the technology decision comes down to a few factors:
    Cost: Regardless of whether the cost of storage and analytics is included in the service (if the vendor stores all of your telemetry in a cloud instance) or whether you have to provision a cluster of compute in your data center to do the math, there is a cost to crunch all of the numbers. Make sure that internal hardware, storage, and networking costs (including management) are included in your analysis. You want to be able to get an apples to apples comparison between what it costs to build an analytics capability versus buying one. And also think about the scale model for both an on-prem and cloud-based solution, since you may decide to collect a bunch more endpoint data at some point and don’t want to get hit with a huge up-charge.
    Performance: Based on your current and projected data volumes, how will the system perform? Different analytical techniques scale differently and you’ll want to dig in a bit with the vendor to understand how the performance of the system will be impacted if you significantly add endpoints or start analyzing more sources of endpoint data.
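
To show what we mean by using a score for prioritization, here is a rough Python sketch. The signal names and weights are entirely made up, and they are exactly the sort of thing you should press a vendor to explain; the point is only that a transparent score lets you sort the queue and question the inputs.

```python
from dataclasses import dataclass

# Hypothetical weights; a real product should be able to justify its own.
WEIGHTS = {"anomaly": 0.5, "intel_match": 0.3, "asset_criticality": 0.2}

@dataclass
class Alert:
    device: str
    signals: dict  # signal name -> value between 0.0 and 1.0

def risk_score(alert):
    """Weighted sum of the alert's signals, scaled to 0-100."""
    total = sum(w * alert.signals.get(name, 0.0) for name, w in WEIGHTS.items())
    return round(100 * total, 1)

def prioritize(alerts):
    """Highest-scoring alerts first."""
    return sorted(alerts, key=risk_score, reverse=True)

alerts = [
    Alert("server-02", {"anomaly": 0.4, "asset_criticality": 1.0}),
    Alert("laptop-17", {"anomaly": 0.9, "intel_match": 1.0, "asset_criticality": 0.2}),
]

for alert in prioritize(alerts):
    print(alert.device, risk_score(alert))
```

Tuning, in this simplified picture, amounts to adjusting the weights as analysts confirm or dismiss alerts, and tracking whether the ordering actually matches what turned out to matter.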

Threat Intel

With today’s advanced attackers, it’s imperative to be able to learn from other attacks in the wild. That’s where Threat Intelligence comes into play, so you’ll want any endpoint detection solution to have access to timely and robust threat intel. That can come directly from the endpoint detection vendor or a third party (or both), but being able to look for signs of attacks you haven’t seen yet is a must.

  • Broader indicators: Traditional endpoint protection relied mostly on file hashes to detect malware. When file hashes ceased to be effective, behavioral indicators were added to look for patterns of typically malicious activity. Advanced detection continues to broaden the types of indicators required, including patterns inherent to memory, registry and scripting attacks.

  • Campaign visibility: It’s not enough to detect attacks on a single endpoint, since today’s adversaries typically use many devices to achieve their mission. You want to make sure that the threat intel isn’t just about indicators to look for on a specific endpoint, but also reflects patterns of activity across many devices that indicate a more sophisticated campaign.

  • Network activity: Another aspect of modern day detection involves the usage of legitimate applications and authorized system functions for malicious intent. At some point during the attack campaign a compromised device will need to communicate with either the command and control network or other devices on the network (or typically both). That means you’ll need to monitor the network activity on the endpoint and look for patterns of suspicious activity and for connections to known bad networks.

  • Shared intelligence: Given the additional context threat intelligence can provide for endpoint detection, leveraging intelligence from a number of organizations can really enhance detection. Sharing intel bi-directionally amongst a community of like organizations, where appropriate and secure, is another way to magnify the benefit of external threat data.
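
The simplest use of threat intel is matching indicators against endpoint telemetry, which can be sketched in a few lines of Python. The indicator types, field names, and values here are hypothetical (the IP addresses come from reserved documentation ranges), and real feeds carry far more context than a set of bad values.

```python
def match_indicators(event, indicators):
    """Return the (type, value) indicator pairs present in one telemetry event."""
    hits = []
    for ioc_type, bad_values in indicators.items():
        value = event.get(ioc_type)
        if value is not None and value in bad_values:
            hits.append((ioc_type, value))
    return hits

# Hypothetical intel feed covering more than file hashes.
indicators = {
    "file_sha256": {"aa" * 32},
    "remote_ip": {"203.0.113.7", "198.51.100.9"},
    "registry_key": {r"HKCU\Software\Example\Payload"},
}

# Hypothetical telemetry event from an endpoint agent.
event = {
    "device": "laptop-17",
    "remote_ip": "203.0.113.7",
    "registry_key": r"HKCU\Software\Example\Payload",
}

print(match_indicators(event, indicators))
```

Exact matching like this only catches what someone else has already seen, which is why the behavioral and campaign-level patterns described above matter alongside it.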

Detecting Campaigns

As we’ve mentioned, advanced attackers rarely begin and end an attack using only one device. They typically orchestrate a multi-faceted attack involving many tactics on many devices to achieve their mission. That means you cannot understand the adversary’s objective or tactics if your detection and view is limited to only a single device. Thus, aggregating telemetry across devices and looking for signs of a coordinated attack (or campaign) is another key aspect of advanced detection. To be clear, a campaign always starts with an attack, so looking for malicious activity on a single device is where you start. But it’s not sufficient to truly detect an advanced adversary’s activities.

  • Timeline visualization: Given the complexity of both your environment and the attacker’s tactics, many security analysts find visualizing the attack to be helpful in understanding the campaign. Being able to see all of the devices and view the attacker’s activity across the environment, while drilling down on a specific device for deeper analysis and validation, streamlines the effort to understand the depth of the attack and plan a response.
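
A toy version of campaign correlation, in Python, looks something like the following. It assumes alerts have already been reduced to a shared indicator per alert, which glosses over most of the hard work; the field names and sample data are hypothetical.

```python
from collections import defaultdict

def find_campaigns(alerts, min_devices=2):
    """Map each indicator to the devices it hit, keeping multi-device ones."""
    devices = defaultdict(set)
    for alert in alerts:
        devices[alert["indicator"]].add(alert["device"])
    return {ioc: sorted(hit) for ioc, hit in devices.items() if len(hit) >= min_devices}

def timeline(alerts, indicator):
    """Order one indicator's alerts by time for a simple cross-device view."""
    related = [a for a in alerts if a["indicator"] == indicator]
    # ISO 8601 timestamps in the same format sort correctly as strings.
    return [(a["time"], a["device"]) for a in sorted(related, key=lambda a: a["time"])]

# Hypothetical alerts aggregated from several endpoints.
alerts = [
    {"time": "2017-06-01T09:15:00", "device": "laptop-17", "indicator": "203.0.113.7"},
    {"time": "2017-06-01T09:02:00", "device": "desktop-04", "indicator": "203.0.113.7"},
    {"time": "2017-06-01T10:30:00", "device": "server-02", "indicator": "198.51.100.9"},
]

for indicator, hit_devices in find_campaigns(alerts).items():
    print(indicator, hit_devices)
    for when, device in timeline(alerts, indicator):
        print("  ", when, device)
```

The ordered list is the raw material for a timeline view: it tells the analyst which device was touched first and where to drill down.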

Enriching Alerts

As we discussed in our Threat Operations research, it’s critical to make security analysts as effective and efficient as possible. That means eliminating a lot of the busy work traditionally required by providing the information for validation and triage. This includes:

  • Adversary information: Depending on the malware, tactics or networks detected during the alert, information about potential adversaries can be gathered from threat intelligence sources and presented to the analyst, so they have additional context about what the attacker tends to do and what they are trying to achieve.

  • Artifacts: Assembling data related to the attack and the endpoint in question (like memory dumps, file samples, network packets, etc.) as part of the detection process saves analysts a bunch of time and provides the information they need to immediately drill down into the device once they get the alert.

  • Organizational history: Attackers don’t use new attacks unless they have to, thus being able to see if a specific attack/tactic has been used before on the organization (or is being used currently) also provides additional context for the analyst to figure out the intent of the attacker and the depth of the compromise.

  • Automating enrichment: A lot of the enrichment information can be gathered automatically, so a key capability of the detection platform will be to look for attributes (like IP address, malware hashes, botnet address) and populate the case file automatically with this supplemental information before it’s sent to the analyst.
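
Here is a minimal Python sketch of automated enrichment, assuming a hypothetical intel lookup table and a list of past alerts. The adversary name, field names, and data are invented for illustration; a real platform would pull this from its intel sources and case history.

```python
def enrich_alert(alert, intel, history):
    """Build a case file: the alert plus adversary context and prior sightings."""
    case = dict(alert)  # copy so the original alert is left untouched
    context = intel.get(alert["indicator"], {})
    case["adversary"] = context.get("adversary", "unknown")
    case["known_tactics"] = context.get("tactics", [])
    case["seen_before_on"] = sorted(
        {past["device"] for past in history if past["indicator"] == alert["indicator"]}
    )
    return case

# Hypothetical threat intel keyed by indicator.
intel = {
    "203.0.113.7": {"adversary": "ExampleGroup", "tactics": ["phishing", "lateral movement"]},
}

# Hypothetical prior alerts within the organization.
history = [
    {"device": "desktop-04", "indicator": "203.0.113.7"},
    {"device": "server-02", "indicator": "198.51.100.9"},
]

alert = {"device": "laptop-17", "indicator": "203.0.113.7"}
case = enrich_alert(alert, intel, history)
print(case)
```

The analyst receives the case already populated, rather than starting from a bare IP address and looking everything up by hand.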

Leveraging the Cloud

Given the cloudification of pretty much everything in technology, advanced endpoint detection also receives significant benefits from the cloud. A lot of the leverage comes from more effective analysis both within an organization and across organizations (sharing threat data). There are also advantages in managing thousands of endpoints across many locations and geographies via the cloud, but we’ll discuss that later in the key technologies section. Some considerations (both positive and not) include:

  • Cloud scale: Depending on the size of the organization, endpoints can generate a tremendous amount of telemetry. Analyzing that telemetry on an ongoing basis consumes a lot of storage and compute. The cloud is pretty good at scaling up storage and compute, so it makes sense to shift processing to the cloud where possible.

  • Local pre-processing: As good as the cloud is at scaling, there is some pre-processing that can be done on each device to only send pertinent telemetry up to the cloud. Some vendors send all telemetry to the cloud, and that can work, but there are trade-offs in terms of performance, latency, and cost. Having some local analysis on the device allows attacks to be detected earlier in the process, which is always preferable.

  • Data movement: The next consideration relative to leveraging the cloud is how to most efficiently move all of the data destined to be analyzed in the cloud. Each endpoint can connect to the cloud service and send its telemetry directly, but that may consume a bunch of network bandwidth (depending on what is collected) and may not be the most efficient means of moving data. Alternatively, there may be an aggregation point on-prem to do some additional processing (normalization, reduction, compression) before the information is sent to the cloud. The approaches are not mutually exclusive, since laptops won’t always be on the corporate network to send the data to the aggregation point. The point is to be aware of network consumption when designing the architecture of the endpoint detection system.

  • Data security/Privacy: Doing endpoint security analysis in the cloud necessarily involves sending telemetry (even if it’s metadata) to networks and systems not under your control. From a diligence standpoint, you need to understand how the vendor’s analytics infrastructure protects your data. You’ll want to dig into multi-tenancy and data protection to understand if/how other organizations could access your data (even inadvertently). Also be sure to probe about if/how your data is anonymized when used for shared threat intelligence purposes. Finally, if you stop working with the vendor, make sure you understand how you can access your data, whether you can port it to another system, and ensure your data is destroyed.
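
The pre-processing and data movement points can be illustrated with a small Python sketch that filters events and compresses the batch before upload. The event types and the choice of which ones count as pertinent are assumptions of ours, and the actual upload is omitted since every vendor handles that differently.

```python
import gzip
import json

def prepare_batch(events, keep_types):
    """Keep only pertinent event types and return a gzip-compressed JSON payload."""
    pertinent = [e for e in events if e["type"] in keep_types]
    payload = json.dumps(pertinent, separators=(",", ":")).encode("utf-8")
    return gzip.compress(payload)

# Hypothetical telemetry: mostly routine heartbeats with one interesting event.
events = [{"type": "heartbeat", "seq": i} for i in range(200)]
events.append({"type": "process_launch", "parent": "winword.exe", "child": "powershell.exe"})

raw_size = len(json.dumps(events).encode("utf-8"))
batch = prepare_batch(events, keep_types={"process_launch", "registry_write"})

print(f"raw: {raw_size} bytes, sent: {len(batch)} bytes")
```

The trade-off is the one described above: anything filtered out locally is not available for analysis in the cloud later, so the filter needs to be chosen with investigation needs in mind, not just bandwidth.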

In the next post, we’ll dig into the response and hunting use cases.

– Mike Rothman

This is a Security Bloggers Network syndicated blog post authored by info@securosis.com (Securosis). Read the original post at: Securosis Blog