Detection Coverage and Detection-in-Depth

For some time, I’ve also been fascinated with the concept of detection-in-depth and the somewhat related concept of optimal detection coverage.

This fascination was born out of a particular type of analyst inquiry I used to get: if I have a SIEM, do I also need an EDR? If I have a SIEM with Sysmon and Zeek data, do I also need an NTA tool? If I have an anti-malware tool with an EDR module and a separate NTA tool, do I also need a SIEM?

These all map to a problem of detection layers and detection coverage, hence “detection in depth.” While many people ramble about “defense in depth,” I feel that far fewer actually implement and practice layers of detection controls in their environments. As often happens in security, talkers talk … while some doers don’t do; they just buy more tools and then have them sit unused 🙂

You can also look at this as a logical evolution of defense in depth. The whole premise there was that you were preventing things with layers of controls. Now, the premise is that you should be able to see things from various angles and layers. Things missed by some layers will be caught by other layers. This should be true even if (especially if) the attacker actively countered some layers of visibility (e.g. turned off logging, avoided writing to disk, provided their own network link, etc.). Apart from this, observing the same activity from different layers may lead to additional security insights (security fusion FTW!), even if no layer is compromised.

As an analyst, I was involved in several attempts to bring clarity to this question, starting from the 2014 “attack chain” paper (still a decent read, BTW) and then in our famous paper on starting your detection and response efforts (and perhaps here too). Even my SOC nuclear triad idea (which recently resurfaced as Anton’s security visibility triad) is pointing in the same direction, if vaguely. It basically says that you should collect and analyze logs, network data and endpoint data — but this advice is way too generic for this discussion.

So, what about more operational guidance focused on specific systems and environments? This is still not easily available anywhere …

So, basically this is about TWO closely-related things:

  1. What telemetry do you need for good detection in general?
  2. What telemetry do you need for good detection on each asset, given the telemetry you already collect elsewhere?

The above-linked documents (some are behind the paywall — sorry, the analysts need to eat) seek to answer 1. (and IMHO do a decent job there), but none really attempt to answer 2. with any depth (note that I am casually avoiding the question “detection of what?” by punting this to an old threat assessment discussion).
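To make question 2 a bit more concrete, here is a minimal sketch of how one might reason about the marginal value of one more telemetry source. Everything in it is an assumption made for illustration: the source names, the activity types, and the `VISIBILITY` mapping are hypothetical, not taken from any product or framework.

```python
# Hypothetical mapping: which activity types each telemetry source can observe.
# These names and sets are illustrative assumptions, not a real taxonomy.
VISIBILITY = {
    "siem_logs": {"authentication", "process_start", "config_change"},
    "edr": {"process_start", "file_write", "memory_injection"},
    "nta": {"lateral_movement", "c2_beacon", "data_exfiltration"},
}


def marginal_gain(existing, candidate, visibility=VISIBILITY):
    """Return the activity types that `candidate` adds beyond `existing` sources."""
    already_seen = set().union(*(visibility[source] for source in existing))
    return visibility[candidate] - already_seen


if __name__ == "__main__":
    # What would an EDR add on an asset that already feeds logs to a SIEM?
    print(sorted(marginal_gain(["siem_logs"], "edr")))
```

In this toy model, overlap between sources (here, `process_start`) is treated as adding nothing, which undersells it: seeing the same activity from two layers is also the whole point of detection in depth.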

As extra credit, what other useful question did I just punt? Data quality / fidelity / consistency. If you have a chance to, say, get firewall logs, NetFlow or full pcap of the same network link, which would you choose? Windows authentication event logs or EDR agent captures of authentication activity? This would also be a topic for another day.

Before I continue, I want to point out that most organizations don’t start with “detection in depth” (they likely didn’t start with “defense in depth” either). They usually start with one approach, picked based on all sorts of non-security considerations.

So, let’s unroll the main question:

1. For many organizations, the starting state is often about choosing ONE detection technology (it may even be whatever your managed provider has) and deploying it to cover a subset of assets.

2. Next comes the expected “???” — many expect AI magic here, naturally. But I’d prefer some useful answers. Or, at least useful questions.

3. And then some organizations arrive at a glamorous end-state where you have comprehensive visibility over systems, networks, applications, data, etc; this likely involves all of the above (logs + network + endpoint), but also coverage over mobile devices, cloud, serverless, IoT, <insert the list of cool new technologies then remove blockchain from the list :-)>

And now here is my next question: how do we plan for a reasonably effective state 2. marked by a mysterious “???” above? Or, how do we do better than a random mix of detection technologies BUT without reaching a security detection panopticon nirvana of “all security telemetry, all the time”? Similarly, how do we avoid detection gaps that really matter?

Next, let’s dispense with some popular wrong solutions: a popular EDR vendor once tried to convince me that “because their EDR is oh-so-sooooooo good, you really just need an EDR, and you are ‘all covered’.” Essentially, their view was that EDR is the final answer to our detection and response challenges, an answer for states 1, 2 and 3 above. Now, I love the endpoint visibility provided by a good EDR tool, but routers? IoT devices? Personal PCs and mobile devices? EDR and SIEM / logs work well together, representing two (of the three) sides of visibility.

Similarly, “just capture traffic” because “traffic is the truth” is equally silly. Again, I love network visibility, but SSL / TLS? Highly distributed networks? Attacker-provided “rogue” GSM devices? Cloud environments with no traffic capture? The pattern is obvious: ONE LAYER is not the answer. Because it won’t be detection in depth then, will it?

Now, let’s get back on track: what are some of the ideas for solving this?

  1. Good old killchain teaches us to try to capture telemetry down the likely attacker path (e.g. from client PCs to DMZ to cloud environments) and wherever a stage of a killchain may happen. Killchain makes logical sense when considering operations from an attacker point of view. But mapping it to a defender’s capabilities and visibility layers is not that straightforward.
  2. The ever-more-popular ATT&CK framework further gives you a list of “stuff” that you then need to “cover” by one or more sources of telemetry (admittedly, very skewed to endpoint data sources, i.e. a place where the attacker ends up). However, you then need to work out on your own which layers to deploy.
  3. Ad hoc: perhaps I want to get 2–3 chances to detect every threat in my environment (the specific threat list would come from some threat assessment thinking). What layers do I need for that?
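The ad hoc idea of wanting 2–3 chances to detect every threat can be sketched as a simple depth count. This is a rough illustration under stated assumptions: the threat list, the layer names, and the `THREAT_LAYERS` mapping are all hypothetical, and a real list would come from your own threat assessment.

```python
# Hypothetical mapping: threat -> visibility layers that could plausibly observe it.
# Both the threats and the layer assignments are illustrative assumptions.
THREAT_LAYERS = {
    "credential_theft": {"logs", "endpoint"},
    "lateral_movement": {"logs", "network", "endpoint"},
    "fileless_malware": {"endpoint"},
    "data_exfiltration": {"network"},
}


def detection_depth(threat_layers, deployed):
    """Count how many deployed layers give a chance to see each threat."""
    return {
        threat: len(layers & deployed)
        for threat, layers in threat_layers.items()
    }


def gaps(threat_layers, deployed, min_chances=2):
    """List threats with fewer than `min_chances` deployed layers watching them."""
    depth = detection_depth(threat_layers, deployed)
    return sorted(threat for threat, count in depth.items() if count < min_chances)


if __name__ == "__main__":
    # An organization with logs and endpoint telemetry but no network layer.
    print(gaps(THREAT_LAYERS, {"logs", "endpoint"}))
```

Note that in this made-up mapping, two threats stay below the two-chance bar even with all three layers deployed, which hints that counting product categories is not the same as counting chances to detect.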

At this point, I feel that this post will be more of “an incomplete thought” post, and just like an elephant in an old Russian joke, it will end suddenly… But yes, this is what I have so far. Thoughts?

P.S. My next couple of blog posts will be about some new stuff we are releasing, BTW…

Detection Coverage and Detection-in-Depth was originally published in Anton on Security on Medium, where people are continuing the conversation by highlighting and responding to this story.

*** This is a Security Bloggers Network syndicated blog from Stories by Anton Chuvakin on Medium authored by Anton Chuvakin. Read the original post at:
