Behavior-Based Detection Can Stop Exotic Malware

by Morgan Demboski on September 16, 2021

To stay a step ahead of cyber defenders, malware authors are using “exotic” programming languages—such as Go (Golang), Rust, Nim and Dlang—to evade detection and impede reverse engineering efforts.

Unconventional languages produce more complex and convoluted binaries that are harder to decipher than those built with traditional languages like C# or C++. This entices both APTs (advanced persistent threats) and cybercriminals to use unusual languages in their malware sets to add complexity, avoid discovery and target multiple platforms.

Examples of malware written in these languages include:

Go: Zebrocy, WellMess, ElectroRAT, Robbinhood

Rust: NanoCore Dropper, RustyBuer, Convuster Adware, PyOxidizer

Nim: Nim-based Cobalt Strike loaders, NimzaLoader, DeroHe, Zebrocy

Dlang: Vovalex, RemcosRat, OutCrypt, DShell

(Data compiled from BlackBerry Research and Intelligence Team)

How Unconventional Languages are Used

There are two ways we see exotic languages worked into malware.

Some malware authors are simply “wrapping” commodity malware in loaders and droppers written in uncommon languages to obfuscate the first stage of the infection process and bypass existing security controls that detect more common forms of malicious code. In contrast, other malware developers are fully rewriting the code of existing malware sets to create new variants. For example, the RustyBuer malware variant discovered this year is a new variation of the Buer malware loader.

And some threat actors are using both of these strategies. APT28 leverages a multi-language kill chain and has repeatedly employed unusual languages in its development process. The group rewrote its Zebrocy backdoor, originally written in Delphi, in Go, and then in 2019 it rewrote its downloader in Nim. APT28 still uses the same initial compromise vector and many of the same tactics, indicating it is likely easier for threat actors to port original malware code to other languages than to change their tactics, techniques and procedures (TTPs) to evade detection. Because TTPs are really just adversarial “behaviors,” they’re harder for an attacker to change and are therefore the best type of indicators for defenders to zero in on.

Since these programming languages are relatively new, they add layers of obfuscation that bypass conventional security measures. It is rarer to see malware written in these languages; as a result, reverse engineers are not as familiar with their implementation, and malware analysis tools and sandboxes have a difficult time analyzing samples written in them.

Malware rewrites break the static signatures created for well-known malware families, and since there’s no identifiable signature, malware written in uncommon languages often goes undetected by antivirus (AV) software.

AV products and EDR solutions have an extensive history of scanning and sandboxing C-language executables with pre-existing lists of detections to look for. When these security tools encounter a language they do not recognize, they will often give it a pass and the malicious activity will not be flagged, simply because of a lack of heuristics for known malicious actions.

Go is the New ‘go-to’ for Attackers

Before 2019, malware written in Go was rarely seen. However, over the past two years, Go-based malware strains have become so popular that some argue it is no longer an exotic language. Droves of malware developers are choosing to write their malware in Go because it is not only a simple, reliable and efficient language, but also produces large binaries that often evade AV and EDR detection.

Go binaries are often statically linked, meaning that all necessary libraries are included in the compiled binary. This makes them self-contained and easy to cross-compile for multiple operating systems, but it also makes them very large relative to their counterparts (just a simple “HelloWorld” weighs in at a whopping 2 MB). Go’s large binary size causes analysis issues for some AV vendors, since several security products struggle to handle larger files and have been known to just stop scanning and pass a binary if it is above a specific size.
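The size cutoff described above can be illustrated with a minimal sketch. This is not how any particular AV product works; the 1 MB limit, the marker bytes and the scan function are all hypothetical and chosen only to show why a payload inside an oversized file can pass unexamined.

```python
# Hypothetical scanner with a size cutoff. The limit and the marker
# pattern are assumptions made for illustration only.
MAX_SCAN_BYTES = 1 * 1024 * 1024          # assumed 1 MB scan limit
BAD_PATTERNS = [b"EVIL_MARKER"]           # stand-in for a byte signature


def scan(data: bytes) -> str:
    """Return 'skipped', 'malicious' or 'clean' for a file's contents."""
    if len(data) > MAX_SCAN_BYTES:
        # Files above the limit are passed without being inspected.
        return "skipped"
    if any(pattern in data for pattern in BAD_PATTERNS):
        return "malicious"
    return "clean"


# The same marker in a small file and in a ~2 MB file.
small_file = b"EVIL_MARKER" + b"\x00" * 1000
large_file = b"EVIL_MARKER" + b"\x00" * (2 * 1024 * 1024)

print(scan(small_file))   # inspected and flagged
print(scan(large_file))   # never inspected
```

In this sketch the content that triggers a detection in the small file is never looked at in the large one, which is the gap a statically linked binary can fall into.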

Even more beneficial for malware authors is the difficulty Go adds to performing malware analysis and reverse engineering on a suspect binary.

Reverse engineers who rely on disassemblers like Ghidra or IDA Pro have a very hard time investigating Go binaries because of some of the language’s features, such as strings that the tools do not recognize. Because existing plugins and analysts’ experience are mostly drawn from traditional malware languages like C and C++, the wealth of community plugins and contributions available for these tools is far less help in finding malicious code blocks, making analysis slower and more cumbersome.

Over the past year, Go-based malware has hit systems on a nearly regular basis, and the continuous growth of malware written in Go has motivated defenders to shift their focus to developing tools and scripts that can detect and analyze its binaries. However, as is usual with detection development, the first implementations have been focused on signatures or other heuristic techniques.

Since signature-based detection depends on particular static characteristics within a file, it is essentially useless when encountering unknown malware variants. When malware is rewritten in a new language, the old signatures created to detect the previous version will likely not match, and new signatures will then have to be created to detect the new variant.
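A minimal sketch of why a rewrite breaks static signatures follows. The sample bytes, the hash set and the byte pattern are invented for illustration and do not come from any real malware family; the point is only that both checks key on file contents, which change completely when the code is ported.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical samples: the same loader logic built from two languages.
original_sample = b"MZ...delphi build of the loader..."
rewritten_sample = b"MZ...go build of the same loader..."

# Hypothetical signature database built from the original sample only.
KNOWN_BAD_HASHES = {sha256_hex(original_sample)}
KNOWN_BAD_PATTERNS = [b"delphi build"]   # stand-in for a byte-pattern rule


def matches_signature(data: bytes) -> bool:
    """True if the file matches a known hash or contains a known pattern."""
    if sha256_hex(data) in KNOWN_BAD_HASHES:
        return True
    return any(pattern in data for pattern in KNOWN_BAD_PATTERNS)


print(matches_signature(original_sample))    # True
print(matches_signature(rewritten_sample))   # False
```

Neither the hash nor the byte pattern survives the port, even though the two samples are assumed to do the same thing once executed.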

The signature-based approach will forever be a “cat-and-mouse game” in which attackers make small tweaks to their processes for quick and easy wins. Because coding concepts for malware often remain the same, it is not as much of a stretch for malware developers to adopt new languages, making signature-based detection an ineffective security solution in these cases.

The Need for Behavior-Based Network Detection

In cases where malware families have been rewritten in unconventional languages, using dynamic or behavioral signatures that track behavior via log data or sandbox output is far more reliable. Though recoded malware can break static signature-based detections, the actions and behavior of the malware itself often stay the same; therefore, network-based behavioral analytics remain effective regardless of the language the malware is written in.
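As a rough sketch of a behavioral signature over log or sandbox output, the rule below looks for an ordered sequence of actions rather than for bytes in a file. The action names, the rule and the sample logs are all hypothetical; real sandbox output is far richer than a flat list of strings.

```python
# Hypothetical behavioral rule: actions that must appear in this order
# (not necessarily back to back) in one process's activity log.
RULE = ["create_persistence", "dns_lookup", "download_payload"]


def matches_behavior(actions, rule=RULE):
    """True if every step of the rule occurs, in order, within actions."""
    remaining = iter(actions)
    # 'step in remaining' advances the iterator, so each rule step must
    # be found after the position where the previous step was found.
    return all(step in remaining for step in rule)


# Invented logs for two builds of the same malware and one benign process.
delphi_variant_log = ["open_document", "create_persistence",
                      "dns_lookup", "download_payload"]
go_variant_log = ["open_document", "create_persistence", "sleep",
                  "dns_lookup", "download_payload"]
benign_log = ["open_document", "dns_lookup", "save_file"]

print(matches_behavior(delphi_variant_log))   # True
print(matches_behavior(go_variant_log))       # True
print(matches_behavior(benign_log))           # False
```

Because the rule never inspects the binary, the language it was compiled from has no effect on the match.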

Defenders need to take a step back from the implementation and focus on the core concept of how these malware pieces interact with the system itself in order to effectively tag dynamic behavior if and when signatures fail.

Regardless of the language a piece of malware is written in, once it infects a host machine, it generally establishes communication with an external server to receive instructions, download additional payloads or exfiltrate information. As a result, defenders can monitor host activity as well as the type and quantity of traffic entering and exiting the network to detect malicious behavior by unknown malware variants.
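One simple form of this kind of traffic monitoring is to look for hosts that contact the same external destination at suspiciously regular intervals, a pattern often called beaconing. The sketch below is one possible heuristic, not a description of any specific product; the event format, the thresholds, the host names and the documentation-range IP addresses are assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev


def find_beacons(events, min_connections=5, max_jitter=0.1):
    """Flag (source, destination) pairs with very regular connection timing.

    events: iterable of (timestamp_seconds, src_host, dst_host) tuples.
    A pair is flagged when the standard deviation of the gaps between its
    connections is at most max_jitter times the average gap.
    """
    times = defaultdict(list)
    for ts, src, dst in events:
        times[(src, dst)].append(ts)

    suspects = []
    for pair, stamps in times.items():
        if len(stamps) < min_connections:
            continue  # too few connections to judge regularity
        stamps.sort()
        gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        average_gap = mean(gaps)
        if average_gap > 0 and pstdev(gaps) / average_gap <= max_jitter:
            suspects.append(pair)
    return sorted(suspects)


# Invented traffic: host-a checks in roughly every 60 seconds,
# host-b connects at irregular, human-looking intervals.
events = [(t, "host-a", "203.0.113.10") for t in (0, 60, 121, 180, 241, 300)]
events += [(t, "host-b", "198.51.100.7") for t in (5, 17, 90, 95, 400, 410)]

print(find_beacons(events))   # [('host-a', '203.0.113.10')]
```

The heuristic relies only on connection timing, so it applies equally to a Delphi, Go or Nim implant; in practice it would be combined with other signals such as traffic volume and destination reputation to limit false positives.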

To be clear, signature-based detections have their place; however, the trend of new languages being used for malware development is not going away and will likely increase. This gives attackers a window of opportunity to go completely undetected if the only line of defense is standard AV signature-based detections. Defenders must continue to adopt and integrate behavioral detection on endpoints as well as the network to ensure full coverage.

The use of malware written in exotic languages is rising, and signature-based solutions like AV and EDR are already known to be ineffective at detecting these threats. Researchers and organizations need to stay on top of this emerging trend and the proliferation of malware written in what were once considered “rare” languages to avoid being flooded with new threats they are unable to detect and mitigate.

Focusing on the improvement and deployment of behavior- and/or network-based detection analytics will help the cybersecurity community remain proactive in defending against the malicious use of unusual programming languages.