The Future of SecOps: Regaining Balance

Posted under: Research and Analysis

The first post in this series, Behind the 8 Ball, brought up a number of key issues regarding the challenge of practicing security in the current environment. These include the continual innovation by attackers to find new ways to compromise devices and exfiltrate data, the increasing complexity of technology infrastructure and the frequency of changes to it, and the systemic skills shortage that limits the resources you have to handle the problems the first two create. So basically, practitioners are behind the 8 ball in getting their job done and protecting corporate data.

As we discussed in that earlier post, thinking differently about security requires you to change things up and take a (dare we say it) more enlightened approach, basically focusing the right resources on the right functions. We know, it seems obvious that having expensive staffers focus on rote and tedious functions is probably not the best way to deploy your resources. But most organizations do it anyway. Maybe it makes some sense to have our valuable, constrained, and (in most cases) highly skilled humans doing what humans are good at, which is:

  • identifying triggers that potentially indicate malicious activity;
  • drilling into the activity to understand the depth of the attack and assess potential damage; and
  • figuring out workarounds to address the attack.

Most humans know what to look for, but aren’t very good at looking at huge amounts of data and finding those patterns. Many don’t like doing the same things over and over again since they get bored. They don’t like to work graveyard shifts and they want to be doing things that teach them new things and stretch their capabilities. Basically they want to work in an environment where they do cool stuff and can grow their skills. And they (especially in security) can choose where they work. If they don’t get the right opportunity with your organization, they will find one that better suits their capabilities and work style.

On the other hand, machines don’t really have issues working at all times and don’t complain about having to do the same tasks over and over again, at least not yet. They don’t have the ability to find another place to work, nor do they agitate for broader job responsibilities or better refreshments in the break room. We’re being a bit facetious here, and certainly aren’t advocating replacing your security team with the robots. Rather, in an asymmetric environment where you can’t keep up with the amount of stuff to do, the robots may be your only chance to regain balance and keep pace.

So if you are open to this line of thinking, let’s expand a bit on two of the concepts we brought up in the Intro to Threat Operations paper, since we believe threat operations, as we envision it, will become a subset of SecOps over time.

  • Enriching Alerts: The idea is to take an alert and, before sending it to the analyst, add the common information you know the analyst is going to want to see. Thus, the analyst doesn’t need to spend time gathering information from a number of systems and information sources, and can get right to work validating the alert and determining the potential impact.
  • Incident Response: Once an alert has been validated, there is a standard set of activities that tend to be involved in the response. A portion of these activities can be automated via integration with the affected systems (networks, endpoint management, SaaS, etc.), and the time saved allows the responders to focus more on the higher-level work of determining proliferation and assessing potential data loss.

Enriching Alerts

Let’s dig into examples of how enriching alerts coming from your security monitoring systems would look and detail how this can be done without human intervention. We start by looking at a couple of different alerts, and making some educated guesses as to what would be useful for an analyst.

  • Alert: Connection to a known bad IP: Let’s say an alert fires for connectivity to a known bad IP address (thanks, threat intel!). With the source and destination addresses, the analyst would typically start gathering some basic information.
    1. Identity: Who uses the device? With the source IP, it’s pretty straightforward to see who the IP address is allocated to, and then what devices that person tends to use.
    2. Target: Using the destination IP, the external site comes more into focus. The analyst would probably do a geolocation search to figure out where the IP is and query WHOIS to figure out who owns it. They could also identify the hosting provider and search their threat intel service to see if that IP belongs to a known botnet and what tactics that specific adversary tends to use.
    3. Network traffic: The analyst may also check out the network traffic coming from the device to see if there were strange patterns (possibly C&C or recon) or uncharacteristically large volumes sent to/from that device over the past few days.
    4. Device hygiene: The analyst would also need to know specifics about the device. When was it last patched? Does it have a non-standard configuration?
    5. Recent changes: The analyst would also probably be interested in the software running on the device, and whether any programs have been installed or configurations changed within the past few days.
  • Alert: Strange registry activity: In this scenario, an alert is triggered because the device’s registry has been changed in a way unrelated to patches or authorized software installs. The analyst could use similar information as in the first example, but device hygiene and recent changes to the device would be of particular interest. The general flow of network traffic would also be of interest, given that the device may have been accepting connections from external devices issuing instructions and making those configuration changes. Standing alone, registry changes may not be a concern, but when they occur close in time to inbound connections and a larger inbound data transfer, there may be something there. Additionally, checking the web traffic logs from that device could provide some clues about what the user was doing that could have led to the compromise.

  • Alert: Large USB file transfer: We could also look at an insider threat scenario to show the impact of enrichment. Maybe the insider uses their USB port for the first time and transfers 1GB of data within a 3-hour period. That would generate an alert from the DLP system. At that point, it would be good to know what internal data sources the device has been communicating with, and whether there have been anomalous data volumes over the past few days, which could indicate information mining in preparation for taking the data. It would also be helpful to look at the inbound connections and recent changes on the device, since perhaps the device had been compromised by an external actor using a remote access Trojan to misbehave on the device.
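As one illustration, the "anomalous data volumes over the past few days" check could be approximated with a simple statistical baseline. This is a minimal sketch, not how any particular DLP or analytics product works: the function name, the sample numbers, and the three-standard-deviation threshold are all assumptions made for the example.

```python
from statistics import mean, pstdev
from typing import List


def anomalous_days(baseline: List[float], recent: List[float],
                   threshold: float = 3.0) -> List[int]:
    """Return indexes of recent days whose volume exceeds the baseline limit.

    The limit is mean(baseline) + threshold * standard deviation(baseline).
    Both lists hold daily transfer volumes in the same unit (for example MB).
    """
    limit = mean(baseline) + threshold * pstdev(baseline)
    return [day for day, volume in enumerate(recent) if volume > limit]


# Made-up daily volumes (MB) pulled from internal data sources by one device.
normal_week = [100, 110, 90, 105, 95]
past_few_days = [102, 980, 115]

print(anomalous_days(normal_week, past_few_days))
```

A fixed threshold like this is crude, but it is enough to attach a "volume spike on day N" note to the alert before an analyst opens it.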

In these scenarios, and another 1,000 that we could concoct, all of the information the analyst would probably like to have is readily available within existing systems and security data/intel sources. Whatever tool the analyst uses to manage triage can be pre-populated with this information.
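To make the pre-population idea concrete, here is a minimal sketch of an enrichment step that runs a different set of lookups depending on the alert type. Everything in it is hypothetical: the `Alert` class, the alert type names, and the in-memory dictionaries standing in for identity records, threat intel, and patch management are assumptions for illustration, not the interface of any real product.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-ins for real data sources. All values are made up.
IP_OWNERS = {"10.1.2.3": "jsmith"}
THREAT_INTEL = {"203.0.113.7": {"botnet": "example-botnet", "country": "ZZ"}}
PATCH_STATUS = {"10.1.2.3": {"last_patched": "2017-01-15", "standard_config": True}}


@dataclass
class Alert:
    kind: str
    source_ip: str
    dest_ip: str = ""
    context: Dict[str, object] = field(default_factory=dict)


def add_identity(alert: Alert) -> None:
    # Who is the source IP allocated to?
    alert.context["identity"] = IP_OWNERS.get(alert.source_ip, "unknown")


def add_target(alert: Alert) -> None:
    # What does threat intel say about the destination?
    alert.context["target"] = THREAT_INTEL.get(alert.dest_ip, {})


def add_hygiene(alert: Alert) -> None:
    # When was the device last patched, and is its configuration standard?
    alert.context["hygiene"] = PATCH_STATUS.get(alert.source_ip, {})


# Which lookups run for which alert type (assumed names).
ENRICHERS: Dict[str, List[Callable[[Alert], None]]] = {
    "bad_ip_connection": [add_identity, add_target, add_hygiene],
    "registry_change": [add_identity, add_hygiene],
}


def enrich(alert: Alert) -> Alert:
    """Attach context to the alert before it reaches the analyst."""
    for enricher in ENRICHERS.get(alert.kind, []):
        enricher(alert)
    return alert


enriched = enrich(Alert("bad_ip_connection", "10.1.2.3", "203.0.113.7"))
print(enriched.context)
```

Keeping the lookups as small independent functions means a new data source can be added to one alert type without touching the others.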

The ability to enrich the alert doesn’t end there. If there are files involved in the connection, the system could automatically poll an external file reputation service to see whether the file is known to be malicious. The file sample could be sent to a sandbox that generates a report of what the file actually does, and whether that file tends to be part of a known attack pattern. Additionally, if the file does turn out to be part of a malware kit, the system could then search for other files known to be related to the first file on the device, as well as possibly on other devices within the organization.

All of this can be done before the analyst ever starts processing the alert. Obviously these are pretty simplistic examples, but they should illuminate the possibilities of automated enrichment to give the analyst a large portion of what they need to figure out if the alert is legit and, if so, how significant the risk is.
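The file reputation step described above might look something like the following sketch. A real deployment would call out to an external reputation service and a sandbox; here a local dictionary keyed by SHA-256 digest stands in for the reputation service, and the verdict and next-step labels are invented for the example.

```python
import hashlib
from typing import Dict, Optional

# Hypothetical reputation data keyed by SHA-256 digest. In practice this
# would be a query to an external service rather than a local dictionary.
KNOWN_BAD: Dict[str, str] = {
    hashlib.sha256(b"malicious sample").hexdigest(): "example-malware-kit",
}


def triage_file(data: bytes) -> Dict[str, Optional[str]]:
    """Hash a file sample and decide what the automation should do next."""
    digest = hashlib.sha256(data).hexdigest()
    family = KNOWN_BAD.get(digest)
    if family is not None:
        # Known malicious: look for related files on this and other devices.
        return {"sha256": digest, "verdict": "malicious",
                "family": family, "next_step": "search_related_files"}
    # No reputation data: hand the sample to a sandbox for behavioral analysis.
    return {"sha256": digest, "verdict": "unknown",
            "family": None, "next_step": "send_to_sandbox"}


print(triage_file(b"malicious sample")["next_step"])
print(triage_file(b"quarterly report")["next_step"])
```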

Incident Response

Once the analyst validates the alert and does an initial damage assessment, the incident would be sent along to the response team. At this point, there are a number of activities that can be done without the responder’s direct involvement to accelerate the response. If we look at potential response activities associated with the alerts above, you can see how orchestration and automation can make the responder far more efficient and reduce the risk posed by the attack.

  • Connection to known bad IP: Let’s say the analyst determined that the device connected to a known bad IP because it was compromised and added to a botnet. What would the responder then want to do?
    1. Isolate the device: First the device should be isolated from the network and put on a quarantine network, both to enable much deeper monitoring (the quarantine network can capture full packets) and to prevent any further exfiltration of data.
    2. Forensic images: The responder will need to take an image of the device for further analysis and to maintain chain of custody.
    3. Load tools onto the imaged device: The standard set of forensic tools is then loaded up and the images connected for both disk and memory forensics.

All of these functions can happen automatically, once the alert is validated and the incident is escalated to the response team. Then the responder has the images from the compromised device, the forensic tools ready to go, and the case file with all of the enriched information about the attack and potential adversary at their fingertips when they start the response.
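The three steps above can be sketched as an ordered run book that an orchestration layer executes once the incident is escalated. This is only an outline under stated assumptions: the step functions just record what they would do in a case dictionary, where a real implementation would call network, endpoint, and forensic tooling, and all names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Case = Dict[str, object]
Step = Tuple[str, Callable[[Case], None]]


def isolate_device(case: Case) -> None:
    # Placeholder: would move the device onto the quarantine network.
    case["network"] = "quarantine"


def capture_image(case: Case) -> None:
    # Placeholder: would take disk and memory images for analysis.
    case["images"] = ["disk", "memory"]


def load_forensic_tools(case: Case) -> None:
    # Placeholder: would stage the standard forensic toolset.
    case["tools_ready"] = True


BOTNET_RUN_BOOK: List[Step] = [
    ("isolate_device", isolate_device),
    ("capture_image", capture_image),
    ("load_forensic_tools", load_forensic_tools),
]


def run(run_book: List[Step], case: Case) -> List[str]:
    """Execute steps in order and stop at the first failure, keeping a log."""
    log: List[str] = []
    for name, action in run_book:
        try:
            action(case)
            log.append(f"{name}: ok")
        except Exception as exc:
            log.append(f"{name}: failed ({exc})")
            break
    return log


case: Case = {"device": "laptop-42"}
for entry in run(BOTNET_RUN_BOOK, case):
    print(entry)
```

Stopping at the first failed step and logging every outcome gives the responder a record of what the automation did and where a human needs to take over.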

But the opportunities to work faster and better don’t end there. If the responder discovers a system file that has been changed on the compromised device, they can then further automate their process. They can search through the security analytics system to see if that file (or something like it) has been downloaded to any other devices, they can run the file through a sandbox to determine its behaviors and then search for those behaviors, and if they get a hit on other potentially compromised devices, they can add those to the response by isolating and imaging the devices, all automatically.
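A proliferation search of this kind can be sketched as a lookup over file sightings followed by queuing the same containment actions for each newly found device. The inventory below is a made-up dictionary standing in for a security analytics system, and the digests, device names, and action labels are placeholders.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical sightings: device name -> digests of files seen on it.
FILE_SIGHTINGS: Dict[str, Set[str]] = {
    "laptop-42": {"aaa111", "bbb222"},
    "laptop-77": {"bbb222"},
    "server-03": {"ccc333"},
}


def devices_with_file(digest: str, exclude: Iterable[str] = ()) -> List[str]:
    """List devices where the file was seen, skipping ones already handled."""
    skipped = set(exclude)
    return sorted(
        device
        for device, digests in FILE_SIGHTINGS.items()
        if digest in digests and device not in skipped
    )


def expand_response(digest: str, already_handled: Iterable[str]) -> List[dict]:
    """Queue isolation and imaging for every additional device with the file."""
    return [
        {"device": device, "actions": ["isolate", "image"]}
        for device in devices_with_file(digest, already_handled)
    ]


# laptop-42 is the original compromised device and is already in the response.
print(expand_response("bbb222", already_handled={"laptop-42"}))
```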

These same constructs apply to pretty much any kind of alert/case that would come across a responder’s desk. The registry activity alert mentioned above would likely focus more on memory forensics, but the same general processes apply.

Ditto for the large USB file transfer indicative of an insider attack. Though in this case, it’s likely more prudent not to isolate the device, because you don’t want to alert the insider that they’ve been discovered. So that kind of alert would trigger a different automated run book, likely involving full packet capture of the device, analysis of their file usage over the past 60-90 days, and notifying Human Resources and Legal of the potential malicious insider.

What is the common thread running amongst all of these scenarios? The ability to accelerate SecOps by planning out the activities (in the form of run books), and then orchestrating and automating the execution of those run books to the greatest degree possible.
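One way to picture this planning is a table that maps each alert type to its run book, so the insider case gets different steps from the botnet case. The sketch below is illustrative only: the alert type names, step names, and the fallback run book are assumptions, and each step name would correspond to an automated action in a real orchestration tool.

```python
from typing import Dict, List

# Hypothetical run books keyed by alert type. Note that the insider (USB)
# run book deliberately omits isolation so the insider is not tipped off.
RUN_BOOKS: Dict[str, List[str]] = {
    "bad_ip_connection": ["isolate_device", "capture_image", "load_forensic_tools"],
    "registry_change": ["isolate_device", "capture_image", "run_memory_forensics"],
    "usb_transfer": ["start_packet_capture", "review_recent_file_usage",
                     "notify_hr_and_legal"],
}

# Assumed fallback when no run book has been planned for an alert type.
DEFAULT_RUN_BOOK: List[str] = ["escalate_to_responder"]


def select_run_book(alert_kind: str) -> List[str]:
    """Return a copy of the planned steps for this alert type."""
    return list(RUN_BOOKS.get(alert_kind, DEFAULT_RUN_BOOK))


for kind in ("bad_ip_connection", "usb_transfer", "unplanned_alert"):
    print(kind, "->", select_run_book(kind))
```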

Benefits

These seem to be self-evident, but let’s be masters of the obvious and state them anyway. This potential future of security operations allows you to:

  • React faster and better: Your analysts have better information because the alerts they get already include the information they would otherwise have to spend time gathering. Your responders are more on point because they already have the potentially compromised devices isolated and imaged, and a wealth of threat intel about what the attack could be, who is behind it, and what their likely next move is.
  • Operationalize process: Your best folks just know what to do. Your other folks typically have no idea, so they stumble and meander through each incident, with a portion figuring it out and another portion looking for another gig. If you could have your best folks build the run books that define proper processes for the most common situations, you minimize the variance in performance and make everyone a lot more productive.
  • Improve employee retention: Employees who work in an environment where they can be successful and have the right tools to achieve their objectives tend to stay. For most security folks it’s not about the money; it’s about being able to do their job. If you have the right systems in place to keep the humans doing what humans are good at, and your competition (for staff) doesn’t, then it becomes increasingly hard for employees to leave. Some will choose to build a similar environment somewhere else, and that’s great and the way the industry improves. But many realize how hard it is and what a step backwards it would be to have to manually do a lot of what you’ve already automated.

So what are you waiting for? We never like to sell past the close, but we’ll do it anyway. Enriching alerts and incident response are really only the tip of the iceberg relative to the SecOps processes that can be accelerated and improved with a dose of orchestration and automation. So we’ll wrap up the series with the next post, which details a few more use cases that should provide overwhelming evidence of the need to embrace the future.

– Mike Rothman

This is a Security Bloggers Network syndicated blog post authored by info@securosis.com (Securosis). Read the original post at: Securosis Blog