
Reducing Alert Fatigue with Automation

While the numbers vary from study to study based on research methodology, organizational profile, and differences in survey questions, there is nearly universal agreement that alert fatigue is a significant issue for most security teams. Worse, the high volume of alerts consists largely of false positives: some security analysts report spending as much as 75% of their time investigating them. The result is analyst burnout and a lot of time and money wasted while real threats get lost in the mix.

That’s why automation is so critical for any security operations team, and LogicHub’s MDR is no exception. We use our SOAR+ platform to make our analysts 20-30X more efficient by automating alert triage and threat detection, ensuring that they spend the overwhelming majority of their time investigating and responding to real attacks.

What we mean by “Alert Fatigue”
But let’s back up a bit and start with what we actually mean by alert fatigue, and why it’s such a problem. What we’re specifically talking about is an overwhelming volume of alerts, or the repeated presentation of similar alerts, desensitizing the people tasked with responding to them. That’s a problem because it ultimately leads to missed or ignored alerts and delayed responses, and the consequences can be devastating. Look no further than the Target breach, which was in no small part the end result of an alert volume of 40,000 per day.

The typical event funnel

Causes of “Alert Fatigue”
Alert fatigue isn’t just the result of tools generating false positives. It’s the end product of many variables, including user behavior, poorly defined policies and processes, and a security stack that hasn’t been adequately integrated and configured to analyze and aggregate data into accurate, manageable output. Common contributors include:

  • Poor content design – Skipping alert aggregation
    • Builds cases prematurely
      • Companies identify potential indicators of compromise and develop cases directly from these IOCs
      • Instead, companies should identify and aggregate IOCs into an intermediate repository and then aggregate the alerts into cases
      • Misses the opportunity to aggregate multiple alerts and develop a comprehensive picture of attack activity
    • Creates new cases for repeat alert activity
  • Excessive alert volume
    • e.g., 1,500 alerts/day for PDF attachments received via email
  • False positives – Poorly written alerts (the alert fires when it shouldn’t)
    • Example: an alert for “User downloaded executable file from web”
      • The user actually went to https://www.google.com/search?q=podcast.exe&sourceid=chrome&ie=UTF-8
  • Permitted activity (but still potentially malicious)
    • Users are allowed to download executable files from the internet
  • Non-actionable activity
    • Recon scan (port scan/network sweep) by an external address
  • Excessive time/complexity to investigate
    • Looking up DNS information and IP reputation, checking threat intel lists, checking for other activity, etc.
    • Required manual follow-up notifications to users and external parties
  • Extreme sensitivity
    • Setting alert thresholds too low for fear of missing an attack (e.g., failed logins > 5)

Bottom line – all of these problems can be traced to a single root cause: presenting information to human analysts before it is ready for a human decision.
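To make that concrete, here’s a minimal sketch, in Python, of what getting alerts “ready for a human decision” can look like: raw alerts are grouped by actor and asset into one summary per entity before anything reaches an analyst. This is a generic illustration, not LogicHub’s implementation, and the alert fields (actor, asset, type, time) are assumptions.

```python
from collections import defaultdict

def aggregate_alerts(raw_alerts):
    """Collapse raw alerts into one summary per (actor, asset) pair, so that
    5 failed logins or 1,500 PDF-attachment alerts become a single line item
    for review rather than thousands of individual notifications.

    Assumes each alert is a dict with hypothetical keys:
    actor, asset, type, time (sortable).
    """
    buckets = defaultdict(list)
    for alert in raw_alerts:
        buckets[(alert["actor"], alert["asset"])].append(alert)

    summaries = []
    for (actor, asset), alerts in buckets.items():
        alerts.sort(key=lambda a: a["time"])
        summaries.append({
            "actor": actor,
            "asset": asset,
            "alert_count": len(alerts),
            "alert_types": sorted({a["type"] for a in alerts}),
            "first_seen": alerts[0]["time"],
            "last_seen": alerts[-1]["time"],
        })
    # Only these entity-level summaries -- not the raw alerts -- are
    # presented for a human (or downstream automated) decision.
    return summaries
```

In practice the grouping keys, time windows, and scoring are far richer, but the principle is the same: aggregate first, present second.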

How LogicHub avoids or solves these issues in our own SOC
LogicHub’s SOC is no different from any other in the sense that we have a security stack generating a large number of events and alerts that our analysts have to work through on behalf of our customers. In fact, we let individual customers bring their preferred tools to the table, and we have to monitor and analyze alerts from all of them. To deliver detection and response services, it’s critical for our SOC to leverage automation so analysts can operate with the efficiency, accuracy, and speed that our customers demand.

So how do we do that?

  1. We have a library of more than 800 automated detections for indicators of compromise, currently spanning more than 35 types of products and log sources (with existing integrations for hundreds more)
  2. These detections automatically identify IOCs, map them to a MITRE ATT&CK Tactic/Technique, assign a risk score for the IOC, and then write to an intermediate alert repository
  3. The alert repository is automatically reviewed by another process (we call it the metaflow) that searches across all IOCs (alerts), develops a risk score for the aggregated alerts, and finally delivers the aggregated results to our Smart Case Creator
  4. Smart Case Creator looks at existing cases and (a simplified sketch of this logic follows this list):
    1. If a case is already open for the actor/asset, the platform appends the new information to the existing case.
    2. If a matching case is not found, the platform opens a new case.
  5. Created cases are enriched and verified by automated triage playbooks that handle many common level 1 analyst tasks.
  6. Whenever the process allows, the case is automatically resolved and closed.
  7. Those cases that remain are then analyzed by LogicHub’s security analysts, who can invoke other automated commands to speed investigation and triage. These commands and processes may automate tasks like email delivery, or request responses through pre-built forms, to standardize communications, shorten communication times, and reduce required analyst actions.
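For readers who prefer code to prose, here is a simplified sketch of the append-or-create decision from step 4 above. The class and field names are hypothetical and meant only to illustrate the idea of one open case per actor/asset, not LogicHub’s internal implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Case:
    """Hypothetical case record; field names are illustrative."""
    actor: str
    asset: str
    alerts: List[dict] = field(default_factory=list)
    risk_score: int = 0
    status: str = "open"

class SmartCaseRouter:
    """Append-or-create logic from step 4: one open case per actor/asset."""

    def __init__(self) -> None:
        self.open_cases: Dict[Tuple[str, str], Case] = {}

    def route(self, actor: str, asset: str, alerts: List[dict], risk_score: int) -> Case:
        key = (actor, asset)
        case = self.open_cases.get(key)
        if case is None:
            # Step 4.2: no matching open case, so open a new one
            case = Case(actor=actor, asset=asset)
            self.open_cases[key] = case
        # Step 4.1: append the new information and keep the highest
        # aggregated risk score seen so far for prioritization
        case.alerts.extend(alerts)
        case.risk_score = max(case.risk_score, risk_score)
        return case
```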

What does that actually look like in practice?
The bottom line is that by storing events in an intermediate data store, LogicHub avoids the efficiency problems caused when alerts have exceedingly low thresholds or trigger false positives. The platform’s ability, either within the initial detection or the subsequent triage, to automatically perform additional enrichment and verification of activity ensures that cases are verified and ready for review by the time they are created. And by scoring the aggregated alerts by the user and asset involved, we are able to prioritize cases by risk, allowing analysts to identify the threats of greatest concern first.
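As a rough illustration of that prioritization (again, a sketch rather than our platform’s actual code), the snippet below auto-closes cases that triage has verified as benign and hands the rest to analysts ordered by aggregated risk score. The field names and threshold are assumptions.

```python
def build_analyst_queue(cases, auto_close_threshold=20):
    """Auto-resolve verified-benign, low-risk cases (step 6) and return the
    remainder sorted by aggregated risk so the highest-risk threats are
    investigated first. Case keys and the threshold value are illustrative."""
    queue = []
    for case in cases:
        if case["verified_benign"] and case["risk_score"] < auto_close_threshold:
            case["status"] = "closed"  # resolved without analyst involvement
        else:
            queue.append(case)
    return sorted(queue, key=lambda c: c["risk_score"], reverse=True)
```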

A sample attack scenario to show this in action:

In this scenario, the event became interesting only when all of the user’s actions were aggregated into a single story. Traditional SIEM detection solutions might create 6 or more tickets for this one event. Those tickets may or may not be worked by the same security analyst, and even if they are, it becomes difficult for the analyst to manually recall and join all the activities into a single story. That also assumes the analyst has the capacity to investigate the alerts at all, given the 12,000 other alerts for attachments, PowerShell, scheduled tasks, and suspect network communications that same day.

In contrast, LogicHub creates one case for the analysts and can support automated triage playbooks to notify the user, look for other related attack activity, and perform remediation.

*** This is a Security Bloggers Network syndicated blog from Blog | LogicHub® authored by Anthony Morris. Read the original post at: https://www.logichub.com/blog/reducing-alert-fatigue-automation
