Identifying Anonymous Programmers by Coding Style

It would be tremendously beneficial for people who work in malware forensics to have better methodologies for determining the human authors of otherwise anonymous code. For example, SamSam ransomware has been devastating hospitals and entire city networks. It’s now believed that there’s just one malware author behind the attacks. Wouldn’t it be great if we could identify that individual?

Computer science professors Rachel Greenstadt and Aylin Caliskan presented their methodology for identifying programmers by patterns in how they write code at this year’s DEF CON. From the abstract:

“Many hackers like to contribute code, binaries, and exploits under pseudonyms, but how anonymous are these contributions really? Our work on programmer de-anonymization from the standpoint of machine learning…. show(s) how abstract syntax trees contain stylistic fingerprints and how these can be used to potentially identify programmers from code and binaries. We perform programmer de-anonymization using both obfuscated binaries, and real-world code found in single-author GitHub repositories and the leaked Nulled.IO hacker forum.”
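To make the idea of a stylistic fingerprint in an abstract syntax tree a little more concrete, here is a toy sketch. It is not the researchers' method: it assumes Python source and Python's built-in ast module, and it only counts how often each kind of syntax node appears. The function names (ast_node_profile, cosine_similarity) and the two sample snippets are made up for illustration.

```python
import ast
import math
from collections import Counter


def ast_node_profile(source: str) -> Counter:
    """Count how often each AST node type appears in Python source text."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Compare two node-count profiles; 1.0 means identical proportions."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two hypothetical snippets that do the same job in different styles.
LOOP_STYLE = """
def squares(xs):
    out = []
    for x in xs:
        out.append(x * x)
    return out
"""

COMPREHENSION_STYLE = """
def squares(xs):
    return [x * x for x in xs]
"""

loop_profile = ast_node_profile(LOOP_STYLE)
comp_profile = ast_node_profile(COMPREHENSION_STYLE)

# The explicit loop yields a For node; the comprehension yields a ListComp node.
print(loop_profile.most_common(5))
print(comp_profile.most_common(5))
print(round(cosine_similarity(loop_profile, comp_profile), 3))
```

Both snippets compute the same result, yet their node counts differ, and that difference is the kind of signal a classifier could learn from. A real system would presumably need far richer features and many samples per author; this sketch only shows where such features might come from.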

This reminds me of idiolects and the field of forensic linguistics. It’s believed that the study of forensic linguistics began as far back as 1927, when the Associated Press wrote about how the author of a ransom note might be determined.

An idiolect is the distinctive way an individual uses language. It’s important for forensic linguists to determine what someone’s idiolect is. An idiolect is much more specific than a dialect.

People who know me personally may have noticed that I love to finish sentences with “quite frankly!” I also enjoy describing things with two or three synonymous adjectives. When I do that, it’s obnoxious, unpleasant, and annoying. To top it all off, who do you ever hear saying “mayn’t”? I say it often, but there may not be anyone else who does.

*** This is a Security Bloggers Network syndicated blog from Cylance Blog authored by Kim Crawley. Read the original post at: https://threatvector.cylance.com/en_us/home/identifying-anonymous-programmers-by-coding-style.html