Of the nearly 200 papers on software fuzzing that have been published in the last three years, most of them—even some from high-impact conferences—are academic clamor. Fuzzing research suffers from inconsistent and subjective benchmarks, which keeps this potent field in a state of arrested development. We’d like to help explain why this has happened and offer some guidance for how to consume fuzzing publications.
Researchers play a high-stakes game in their pursuit of building the next generation of fuzzing tools. A major breakthrough can render obsolete what was once the state of the art. Nobody is eager to use the world’s second-greatest fuzzer. As a result, researchers must somehow demonstrate how their work surpasses the state of the art in finding bugs.
The problem is trying to objectively test the efficacy of fuzzers. There isn’t a set of universally accepted benchmarks that is statistically rigorous, reliable, and reproducible. Inconsistent fuzzing measurements persist throughout the literature and prevent meaningful meta-analysis. That was the motivation behind the paper “Evaluating Fuzz Testing,” to be presented by Andrew Ruef at the 2018 SIGSAC Conference on Computer and Communications Security in Toronto.
“Evaluating Fuzz Testing” offers a comprehensive set of best practices for constructing a dependable frame of reference for comparing fuzzing tools. Whether you’re submitting your fuzzing research for publication, peer-reviewing others’ submissions, or trying to decide which tool to use in practice, the recommendations from Ruef and his colleagues establish an objective lens for evaluating fuzzers. In case you don’t have time to read the whole paper, we’re summarizing the criteria we recommend you use when evaluating the performance claims in fuzzing research.
Despite how obvious or simple these recommendations may seem, the authors reviewed 32 high-quality publications on fuzzing and did not find a single paper that aligned with all 10. Then they demonstrated how conclusive the results from rigorous and objective experiments can be by using AFLFast and AFL as a case study. They determined that, “Ultimately, while AFLFast found many more ‘unique’ crashing inputs than AFL, it only had a slightly higher likelihood of finding more unique bugs in a given run.”
The authors’ results and conclusions showed decisively that in order to advance the science of software fuzzing, researchers must strive for disciplined statistical measurements and better empirical measurements. We believe this paper will begin a new chapter in fuzzing research by providing computer scientists with an excellent set of standards for designing, evaluating, and reporting software fuzzing experiments in the future.
In the meantime, if you’re evaluating a fuzzer for your work, approach with caution and this checklist.
*** This is a Security Bloggers Network syndicated blog from Trail of Bits Blog authored by Trent Brunson. Read the original post at: https://blog.trailofbits.com/2018/10/05/how-to-spot-good-fuzzing-research/
Leading UK Credit Card Consumer Finance Company Uses Advanced Graph Analytics to Intercept Fraudulent Credit Card Applications, Boost Anti-Fraud Efforts…
Digital+ Partners Leads Continuation Funding Round in Growing Automated Threat Analysis & Detection Provider, Closing its Series B Round at…
For three years OpenWRT had a severe validation problem with its download package manager, until a fuzz tester found and…
It’s time to say a final “Goodbye” to Flash. (Or should that be “Good riddance”?) With earlier this week seeing…
1. Be a student of (information security, network security, cyber security). Always strive to know what the latest tactics, trends,…
This is the second in a series of blog posts that discuss how smart DNS resolvers can enhance ongoing network…