Analytics & Intelligence Security Bloggers Network

Home » Cybersecurity » Analytics & Intelligence » The Need For Domain Experts and Non Trivial Conclusions

The Need For Domain Experts and Non Trivial Conclusions

by Raffael Marty on August 2, 2019

In my last blog post I highlighted some challenges with a research approach from a paper that was published at IEEE S&P, the sub conference on “Deep Learning and Security Workshop (DLS 2019)“. The same conference featured another paper that spiked my interest: Exploring Adversarial Examples in Malware Detection.

This paper highlights the problem of needing domain experts to build machine learning approaches for security. You cannot rely on pure data scientists without a solid security background or at least a very solid understanding of the domain, to build solutions. What a breath of fresh air. I hole heartedly agree with this. But let’s look at how the authors went about their work.

The example that is used in the paper is in the area of malware detection; a problem that is a couple of decades old. The authors looked at binaries as byte streams and initially argued that we might be able to get away without feature engineering by just feeding the byte sequences into a deep learning classifier – which is one of the premises of deep learning, not having to define features for it to operate. The authors then looked at some adversarial scenarios that would circumvent their approach. (Side bar: I wish Cylance had read this paper a couple years ago). The paper goes through some ROC curves and arguments to end up with some lessons learned:

Training sets matter when testing robustness against adversarial examples
Architectural decisions should consider effects of adversarial examples
Semantics is important for improving effectiveness [meaning that instead of just pushing a binary stream into the deep learner, carefully crafting features is going to increase the efficacy of the algorithm]

Please tell me which of these three are non obvious? I don’t know that we can set the bar any lower for security data science.

I want to specifically highlight the last point. You might argue that’s the one statement that’s not obvious. The authors basically found that, instead of feeding simple byte sequences into a classifier, there is a lift in precision if you feed additional, higher-level features. Anyone who has looked at byte code before or knows a little about assembly should know that you can achieve the same program flow in many ways. We must stop comparing security problems to image or speech recognition. Binary files, executables, are not independent sequences of bytes. There is program flow, different ‘segments’, dynamic changes, etc.

We should look to other disciplines (like image recognition) for inspiration, but we need different approaches in security. Get inspiration from other fields, but understand the nuances and differences in cyber security. We need to add security experts to our data science teams!

The Need For Domain Experts and Non Trivial Conclusions

Senator Sanders Wants to Own AI Companies — and Hand America’s Adversaries the Keys

NIST’s Nine: The PQC Signature Race Moves to Round Three

The Quantum Arms Race: Why Washington Just Wrote a $2 Billion Check to Nine Companies

Beyond Moore’s Law: The Hyper-Acceleration of Autonomous AI Cyber Capabilities

The Exception Economy: When Security Teams Stop Protecting and Start Negotiating

GoPlus’s Latest Report Highlights How Blockchain Communities Are Leveraging Critical API Security Data To Mitigate Web3 Threats

C2A Security’s EVSec Risk Management and Automation Platform Gains Traction in Automotive Industry as Companies Seek to Efficiently Meet Regulatory Requirements

Zama Raises $73M in Series A Lead by Multicoin Capital and Protocol Labs to Commercialize Fully Homomorphic Encryption

RSM US Deploys Stellar Cyber Open XDR Platform to Secure Clients

ThreatHunter.ai Halts Hundreds of Attacks in the past 48 hours: Combating Ransomware and Nation-State Cyber Threats Head-On

Fortinet® Follies