Detect Threats by Modeling Application Protocol Behaviors

by Fidelis Cybersecurity Blogs on September 22, 2020

This is the second blog in our series on how Fidelis Network Detection and Response (NDR) uses a combination of machine learning and advanced analytics to detect threats on an enterprise network. In our first blog, I introduced our 5-by-5 Anomaly Matrix consisting of five different contexts. These included External, Internal, Application Protocols, Data Movement, and Events. Along with those, I also discussed five different model families for each context for detecting different types of threats. I provided details for the anomaly models for External and Internal traffic contexts.

In this blog, I focus on anomaly detection for Application Protocols. The five model families detect suspicious activities in encrypted traffic (Transport Layer Security, or TLS), Domain Name System (DNS) traffic, Web traffic (HTTP), and Email (SMTP), with the remaining protocols grouped together as “Other.” Table 1 at the end of this blog lists these models.

Anomalies in Encrypted Traffic

The “handshake” between a client and a server that precedes encrypted communication contains rich metadata that can be used to detect an infected host on a network.


Figure 1: The TLS handshake

Figure 1 shows the different steps during the TLS handshake. The information in “client hello” and “server hello” messages can be used to create client and server fingerprints using JA3 [1]. Fidelis NDR creates and tracks the client (JA3/JA3 Digest) and server (JA3S/JA3S Digest) fingerprints seen in enterprise traffic. The anomaly models detect New and Rare Client and Server fingerprints seen on assets owned by an enterprise.

A new or rarely used TLS fingerprint indicates the presence of a new or rarely used tool on an asset. Such a tool might have been installed by malware. Hence, detecting unusual or suspicious TLS fingerprints using anomaly models can be an effective signatureless way to identify infected assets.
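To make the fingerprinting idea concrete, here is a minimal sketch of how a JA3 digest is built from Client Hello fields and how a digest might be labeled new or rare. The JA3 string layout (five comma-separated fields, values joined by hyphens, then MD5) follows the public JA3 description. The function names, the `assets_by_digest` structure, and the 1% rarity cutoff are assumptions for illustration, not Fidelis NDR internals, and GREASE filtering is omitted for brevity.

```python
import hashlib


def ja3_digest(version, ciphers, extensions, curves, point_formats):
    """Return (JA3 string, MD5 digest) for decimal Client Hello field values."""
    fields = [
        str(version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, curves)),
        "-".join(map(str, point_formats)),
    ]
    ja3_string = ",".join(fields)
    return ja3_string, hashlib.md5(ja3_string.encode("ascii")).hexdigest()


def classify_fingerprint(digest, assets_by_digest, total_assets, rare_fraction=0.01):
    """Label a digest 'new', 'rare', or 'common' against a learned baseline.

    assets_by_digest maps digest -> set of asset IDs seen using it
    (hypothetical baseline structure); rare_fraction is an assumed cutoff.
    """
    seen = len(assets_by_digest.get(digest, ()))
    if seen == 0:
        return "new"
    if seen / total_assets <= rare_fraction:
        return "rare"
    return "common"
```

In practice the baseline would be kept per asset type, so that a fingerprint common on developer laptops can still be flagged as new on a database server.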

Robust Detections

In a large enterprise with thousands of assets, simply tracking new and rare TLS fingerprints can produce many false positives. To reduce them, Fidelis NDR analyzes the behavior of assets with suspicious TLS fingerprints along other dimensions. It uses a Supervised Machine Learning based algorithm to detect whether the SSL Server Certificate used in a TLS handshake is suspicious, for example because its attributes resemble those of SSL Certificates used by servers controlled by malicious actors. A new or rare TLS tool participating in a handshake involving a suspicious SSL Server Certificate is an anomaly that warrants further investigation, because it might indicate Command and Control (C2) threat activity.

Fidelis’ multiple context approach for NDR also enables analysts to correlate TLS protocol anomalies with anomalies in a different context. For example, a new or rare TLS fingerprint used to communicate with a new destination country or rare web domain (anomalies in our External traffic context) is a high-confidence detection indicating the presence of Command and Control (C2) threat activity.

Anomalies in Domain Name System (DNS) Traffic

There are two prominent threat activities hiding in DNS traffic. The first is the use of Domain Generation Algorithms (DGA) [2]. The second is DNS tunneling by malware to bypass firewalls and communicate with external servers controlled by an adversary.

1) Domain Generation Algorithms

Fidelis NDR uses two approaches for detecting DGA. Our Supervised Machine Learning based Classifier analyzes the domain names observed with DNS and Web protocols to flag ones that are likely to be generated by an algorithm. However, algorithmically generated domain names are also used by legitimate services like Content Delivery Networks (CDNs), online ad networks, and Cloud Service providers. To distinguish malicious DGA activity from these legitimate services, Fidelis NDR models the prevalence of DNS request failures, i.e. NXDOMAIN responses, for different types of assets in an enterprise. An increase in the NXDOMAIN response rate for an asset, particularly for algorithmically generated domain names, is a strong indicator of DGA activity due to malware.
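The NXDOMAIN baseline idea can be illustrated with a simple z-score check. This is only a sketch under stated assumptions: response codes arrive as strings, `history` holds past per-window NXDOMAIN rates for one asset type, and the threshold of 3 standard deviations and the `min_std` floor are hypothetical values, not the product's actual model.

```python
from statistics import mean, stdev


def nxdomain_rate(response_codes):
    """Fraction of DNS responses in a time window that are NXDOMAIN."""
    codes = list(response_codes)
    if not codes:
        return 0.0
    return sum(1 for c in codes if c == "NXDOMAIN") / len(codes)


def is_rate_anomalous(history, current, z_threshold=3.0, min_std=0.01):
    """Flag `current` if it sits far above the baseline rates in `history`.

    history needs at least two past rates; min_std is an assumed floor that
    keeps a very quiet baseline from making every small change look extreme.
    """
    mu = mean(history)
    sigma = max(stdev(history), min_std)
    return (current - mu) / sigma > z_threshold
```

Only increases are flagged, since a drop in failed lookups is not a DGA signal.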

2) DNS Tunneling

DNS tunneling is a non-standard use of DNS protocol to exchange data via the domain name itself. The DNS protocol allows a full domain name to contain up to 253 characters in textual representation. Attackers exploit this specification to exfiltrate data about an asset (e.g. hostname, IP address, username, etc.) by encoding it as long subdomain names. The MITRE ATT&CK framework identifies DNS Tunneling as a sub-technique used for Command and Control [3].

Fidelis NDR learns baseline models for the prevalence of long subdomain names and the number of unique subdomains for a domain. It also uses statistical anomaly detection algorithms to flag significant deviations from these baseline models. To achieve high-confidence detections, it learns different baselines for each asset type. It then assigns a higher score to anomalies associated with rare domains, i.e. domains that are used by only a few assets.
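The two tunneling features described above (unique subdomain count and subdomain length per domain) can be extracted from query names with a few lines of code. The sketch below is illustrative only: it naively treats the last two labels as the registered domain (a real system would likely consult the Public Suffix List), and the fixed thresholds in `flag_domains` are hypothetical stand-ins for learned baselines.

```python
from collections import defaultdict


def tunneling_features(query_names, base_labels=2):
    """Map domain -> (unique subdomain count, longest subdomain length)."""
    subdomains = defaultdict(set)
    for name in query_names:
        labels = name.rstrip(".").lower().split(".")
        if len(labels) <= base_labels:
            continue  # no subdomain part to inspect
        domain = ".".join(labels[-base_labels:])
        subdomains[domain].add(".".join(labels[:-base_labels]))
    return {d: (len(s), max(len(x) for x in s)) for d, s in subdomains.items()}


def flag_domains(features, max_unique=100, max_length=50):
    """Return domains exceeding either (assumed) threshold."""
    return sorted(
        d for d, (unique, longest) in features.items()
        if unique > max_unique or longest > max_length
    )
```

A rarity weight, such as the number of assets querying each flagged domain, could then be applied to rank the results.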

Anomalies in Web Traffic

Enterprises often host public-facing web servers that are vulnerable to exploitation due to software bugs or unpatched vulnerabilities (MITRE ATT&CK Technique T1190) [4]. Fidelis NDR flags new invalid URL errors observed on enterprise web servers open to the Internet as anomalies.

Web servers (internal as well as public-facing) are vulnerable to file and directory discovery attacks (MITRE ATT&CK Technique T1083) [5]. Often, such attempts show up as an increase in HTTP Response Status codes in the 400 range (e.g. 404 = file not found). Fidelis NDR learns the baseline rate for HTTP Status codes and flags a statistically significant increase as an anomaly.
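One simple way to test whether a 4xx rate is significantly above its baseline is a one-sided z-test for a proportion. The sketch below assumes the baseline rate has already been learned for the server and that status codes are available as integers. The test itself is a stand-in chosen for illustration, and the source does not say which statistical method Fidelis NDR uses.

```python
import math


def client_error_z(status_codes, baseline_rate):
    """z-score of the observed 4xx rate against a learned baseline rate.

    Large positive values suggest a significant increase; a cutoff such as
    z > 3 is an assumed choice, not a documented product setting.
    """
    n = len(status_codes)
    if n == 0 or not 0.0 < baseline_rate < 1.0:
        return 0.0
    errors = sum(1 for code in status_codes if 400 <= code < 500)
    observed = errors / n
    std_error = math.sqrt(baseline_rate * (1.0 - baseline_rate) / n)
    return (observed - baseline_rate) / std_error
```

For example, 20 client errors in 100 responses against a 5% baseline gives a z-score well above 3, while 5 errors in 100 gives a z-score of about zero.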

Anomalies in Email Activity

Phishing via email is a prevalent Initial Access attack (MITRE ATT&CK technique T1566) [6]. Often, phishing emails spoof the sender information. To detect phishing emails, Fidelis NDR first identifies mismatches between the “From” and “Reply-to” fields in emails. Next, it correlates them against the prevalence of the sender’s account in the overall email activity to identify mismatches associated with rare accounts. These are flagged as potential phishing attempts. This anomaly model complements the Fidelis Sandbox, which inspects URLs in emails using detonation to detect emails with links to malicious websites (another common property of phishing emails).
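The two-step check described above (header mismatch, then sender rarity) might look like the following sketch. It uses Python's standard `email` package to parse headers. The `sender_counts` baseline and the `rare_max` cutoff are hypothetical names and values introduced for illustration.

```python
from collections import Counter
from email import message_from_string
from email.utils import parseaddr


def sender_and_mismatch(raw_message):
    """Return (From address, True if Reply-To is present and differs)."""
    msg = message_from_string(raw_message)
    from_addr = parseaddr(msg.get("From", ""))[1].lower()
    reply_addr = parseaddr(msg.get("Reply-To", ""))[1].lower()
    return from_addr, bool(reply_addr) and reply_addr != from_addr


def is_potential_phish(raw_message, sender_counts, rare_max=2):
    """Flag a mismatch only when the sender is rare in past email activity.

    sender_counts: Counter of From address -> messages seen (assumed baseline).
    """
    from_addr, mismatch = sender_and_mismatch(raw_message)
    return mismatch and sender_counts[from_addr] <= rare_max
```

Requiring both conditions keeps frequent legitimate senders, such as mailing lists that set a different Reply-To, from being flagged.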

Another threat model that uses email is Internal Spear phishing (MITRE ATT&CK technique T1534) [7]. After compromising a trusted internal email account, either by installing malware on a device or by using stolen credentials, adversaries send out phishing emails to other users. Fidelis NDR learns the baseline behavior for the number of recipients of internal emails and flags accounts sending out emails with abnormally high numbers of internal recipients as suspicious.
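As an illustration of flagging abnormally high recipient counts, the sketch below uses a modified z-score based on the median and the median absolute deviation (MAD), which tolerates the occasional large but legitimate mailing. This particular statistic and the commonly used cutoff of 3.5 are assumptions for the example, and the source does not specify the method used by Fidelis NDR.

```python
from statistics import median


def modified_z(value, history):
    """Modified z-score of `value` against past recipient counts."""
    med = median(history)
    mad = median([abs(x - med) for x in history])
    if mad == 0:
        # Degenerate baseline: every past count is (nearly) identical.
        return 0.0 if value == med else float("inf")
    return 0.6745 * (value - med) / mad


def is_recipient_count_suspicious(value, history, cutoff=3.5):
    """True when the count is far above the account's usual behavior."""
    return modified_z(value, history) > cutoff
```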

Other Protocol Anomalies

In addition to the different anomalies for the four application protocols, Fidelis NDR also flags the use of new protocol(s) by an asset as an anomaly. Such events include both application protocols and non-application protocols like UDP, ICMP, etc. These could be indicative of attempts at Lateral Movement or Command and Control activity (MITRE ATT&CK technique T1095) [8].
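Detecting a new protocol on an asset amounts to tracking the set of protocols each asset has used. The class below is a minimal sketch of that idea. The class name, the `learning` flag, and the use of plain strings for assets and protocols are all assumptions made for illustration.

```python
from collections import defaultdict


class ProtocolBaseline:
    """Remember which protocols each asset has been seen using."""

    def __init__(self):
        self._seen = defaultdict(set)

    def observe(self, asset, protocol, learning=False):
        """Record an observation; return True if it is a new-protocol anomaly.

        During the (assumed) learning phase, protocols are recorded
        but never reported.
        """
        is_new = protocol not in self._seen[asset]
        self._seen[asset].add(protocol)
        return is_new and not learning
```

A usage pattern would be to call `observe(..., learning=True)` over a training period, then switch to `learning=False` so that only protocols first seen afterwards are flagged.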

Conclusion

In this blog, I described the Fidelis NDR anomaly models targeted at detecting suspicious activities. These activities relate to Command and Control, adversaries exploiting services open to the Internet to gain access to an enterprise network, file and directory discovery by adversaries using infected hosts, and phishing attacks. These models learn the baseline behavior for different application protocols, including TLS, DNS, HTTP, and SMTP, in an enterprise and flag significant deviations from the expected behavior as anomalies. I also discussed the need to correlate anomalies against other data-driven indicators, e.g. prevalence of domain or sender account, to avoid false positives. Our next blog will describe anomaly models for the DLP (Data Leakage Prevention) use cases. Subscribe to our Threat Geek blog to get the next one in the series sent to you!

Table 1: Detect Threats

Potential Threat	Protocol	Behavioral Footprint	Anomaly Model	MITRE ATT&CK
Infected Host, C & C	TLS	Use of new TLS tools (new JA3)	Baseline model learns the set of TLS tools used by different types of assets.	TA0011
Infected Host, C & C	TLS	Use of rare TLS tools (rare JA3)	Baseline model learns the set of TLS tools used by the majority of assets of a type.	TA0011
Infected Host, C & C	TLS	Use of suspicious Server Certificate	SSL Certificate classifier detects suspicious certificates. Baseline model determines if only a small fraction of assets connect to this server.	TA0011
DGA	DNS	High NXDOMAIN Response Rate	Baseline models learn the normal NXDOMAIN Response rate for different assets and servers.	T1071
Tunneling	DNS	Rare domains with high unique subdomains count	Baseline model learns the distribution of subdomain counts across different domains.	T1071.004
Tunneling	DNS	Rare domains with long subdomain names	Baseline model learns the distribution of subdomain lengths across different domains.	T1071.004
Server Vulnerability	HTTP	Invalid URL errors on web servers open to the Internet	Flag New URLs associated with an invalid URL error.	T1190
Directory Scan	HTTP	High rate of 4xx HTTP Response Status codes	Baseline models learn the normal 4xx Response rate.	T1083
Spear phishing	SMTP	“From” and “Reply-to” mismatch	Baseline model learns the pattern of email traffic from external accounts and identifies rare senders.	T1566
Account Manipulation, Internal phishing	SMTP	High email recipients count	Baseline model learns the pattern of internal email traffic.	T1534
Lateral Movement	Others	Use of a new protocol	Baseline model learns the presence and prevalence of different protocols.	T1095