
Trail of Bits @ ICSE 2019 – Recap

Three weeks ago, we presented our work on Slither at WETSEB, an ICSE workshop. ICSE is a top-tier academic conference focused on software engineering, and this edition went very well: the organizers did a great job attracting industry practitioners and engaging them in the discussions. With many talks running in parallel, we often wished we could attend several at once. The following lists some of the talks we recommend:

Note: Some of the following papers are only accessible with a paid account. We will do our best to update the links as soon as the papers become freely available.

Static Analysis

At Trail of Bits, we spend a lot of effort building reliable static analyzers. For example, McSema allows us to lift binaries directly to LLVM, and Slither applies static analysis to smart contracts. ICSE offered a rich variety of talks on this topic. We noticed two interesting trends for improving the scalability of static analysis: the combination of lightweight and expensive analyses, and the online tuning of analysis parameters.

SMT-Based Refutation of Spurious Bug Reports in the Clang Static Analyzer

By Mikhail R. Gadelha, Enrico Steffinlongo, Lucas C. Cordeiro, Bernd Fischer, Denis A. Nicole (pdf)

This work on Clang highlighted the scalability side of static analysis. The authors use an SMT solver to remove false alarms emitted by the Clang static analyzer: reports whose path constraints the solver proves infeasible are discarded. While the technique is not novel by itself, it is nice to see it concretely implemented within the compiler, and it can be used out of the box without significant overhead.
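
For intuition, here is a minimal sketch of that refutation step using the Z3 Python bindings. The variable, path constraints, and report are invented for illustration; the actual implementation encodes the analyzer's constraints inside Clang rather than in Python.

```python
# Toy refutation of a spurious report: re-encode the path constraints the
# analyzer collected along the reported path, then ask an SMT solver whether
# any concrete input can actually follow that path.
from z3 import Int, Solver, unsat

x = Int("x")  # hypothetical symbolic program variable

# Imaginary path leading to a reported bug:
#   if (x > 10) { ... if (x < 5) { /* analyzer warns here */ } }
path_constraints = [x > 10, x < 5]

solver = Solver()
solver.add(path_constraints)

if solver.check() == unsat:
    print("path is infeasible -> drop the report as a false alarm")
else:
    print("path is feasible -> keep the report; witness:", solver.model())
```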

Resource-aware program analysis via online abstraction coarsening

By Kihong Heo, Hakjoo Oh, Hongseok Yang (pdf)

In this work, the authors tune the parameters of the analysis on the fly according to the available resources (e.g., free RAM). The technique enables or disables flow-sensitivity on a per-variable basis. First, they rank each variable with a score representing how much flow-sensitivity matters for it. Then, a controller decides how many variables should be treated flow-sensitively. If a fixpoint is not reached, the controller can change the number of flow-sensitive variables on the fly. This work is a nice step toward adaptive static analyzers that can adjust themselves in a real-world context.
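
As a rough illustration (not the authors' implementation), the control loop can be pictured as follows; `try_fixpoint`, `memory_used_mb`, the ranking, and the memory budget are all placeholders, and the real system adjusts precision during a single run rather than retrying from scratch.

```python
# Toy coarsening controller: start fully flow-sensitive and shrink the set of
# flow-sensitive variables whenever the analysis fails to converge, coarsening
# more aggressively when memory pressure is high.
def controlled_analysis(ranked_vars, try_fixpoint, memory_used_mb, budget_mb=4096):
    k = len(ranked_vars)                    # begin with every variable flow-sensitive
    while k >= 0:
        selected = ranked_vars[:k]          # keep only the highest-scoring variables
        if try_fixpoint(selected):
            return selected                 # converged at this precision level
        # No fixpoint within budget: drop variables, faster under memory pressure.
        k = k // 2 if memory_used_mb() > budget_mb else k - max(1, k // 10)
    return []                               # fall back to a fully flow-insensitive run
```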

SMOKE: Scalable Path-Sensitive Memory Leak Detection for Millions of Lines of Code

By Gang Fan, Rongxin Wu, Qingkai Shi, Xiao Xiao, Jinguo Zhou, Charles Zhang (pdf)

SMOKE tackles the problem of finding memory leaks in large codebases. The intuition behind this work is that most memory objects can be proven safe from leaking without a complex analysis, while only a small number of objects require a costly one. Their approach is twofold: first, they use a so-called use-flow graph representation to run an imprecise but fast analysis that filters out most of the leak candidates; then, they apply a precise and costly analysis to the remaining objects. It is interesting to note that they first use a linear-time solver to discharge the obvious cases and apply an SMT solver only to the remaining ones. SMOKE is built on top of LLVM and seems to scale to projects of up to 8 million lines of code. The tool and the data to reproduce the experiments are available (the source code is not).
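
The staging itself is easy to picture. Below is a toy version over an invented graph representation; the real tool works on LLVM IR with a much richer use-flow graph, and the expensive stage is passed in here as a callback rather than implemented.

```python
# Two-stage leak detection on a toy graph (adjacency dict: node -> successors).
# Stage 1: a linear reachability pass drops allocations that are freed on every
# path. Stage 2: only the surviving candidates get the precise, path-sensitive
# (SMT-backed) analysis, supplied here as `expensive_check`.
def provably_freed(graph, alloc, exits, frees):
    """True if every path from `alloc` to an exit node passes through a free()."""
    stack, seen = [alloc], set()
    while stack:
        node = stack.pop()
        if node in seen or node in frees:
            continue                        # this path frees the object
        seen.add(node)
        if node in exits:
            return False                    # reached an exit without freeing
        stack.extend(graph.get(node, []))
    return True

def find_leaks(graph, allocs, exits, frees, expensive_check):
    candidates = [a for a in allocs if not provably_freed(graph, a, exits, frees)]
    return [a for a in candidates if expensive_check(a)]
```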

Testing

We are always interested in state-of-the-art testing techniques to improve input generation, either using fuzzing or symbolic execution. For instance, we recently added new techniques for concolic execution of cryptographic primitives and symbolic path merging to Manticore, our symbolic execution engine. This year we highlight the following papers:

Generating Random Structurally Rich Algebraic Data Type Values (AST)

By Agustín Mista, Alejandro Russo (pdf)

The authors designed an approach for generating algebraic data type values in Haskell. The generated values are more “uniform” in terms of how often each value constructor is randomly chosen (and having that variety is important to uncover more bugs). After generation, these values can be used to test other programs (not necessarily written in Haskell). The authors’ system works at compile time to generate Haskell code that, in turn, generates values. Source code is available here.
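
The actual system derives Haskell generators at compile time; as a language-agnostic illustration of why constructor frequencies matter, here is a small Python sketch over a made-up tree type.

```python
# Weighted constructor selection for a toy ADT:  Tree = Leaf Int | Node Tree Tree.
# Tuning the weights controls how structurally rich the generated values are.
import random

def gen_tree(weights, depth=0, max_depth=8):
    ctors, w = zip(*weights.items())
    choice = random.choices(ctors, weights=w)[0]
    if choice == "Leaf" or depth >= max_depth:
        return ("Leaf", random.randint(0, 100))
    return ("Node", gen_tree(weights, depth + 1), gen_tree(weights, depth + 1))

# An equal-weight choice tends to produce tiny trees; boosting "Node"
# yields larger, more varied structures to feed to the program under test.
print(gen_tree({"Leaf": 1, "Node": 1}))
print(gen_tree({"Leaf": 1, "Node": 3}))
```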

DifFuzz: Differential Fuzzing for Side-Channel Analysis

By Shirin Nilizadeh, Yannic Noller, Corina S. Pasareanu (pdf)

DifFuzz uses differential fuzzing to detect side channels. The idea is to maximize not code coverage, but the difference in resources used (e.g., time or memory consumed) between two inputs. The two inputs are meant to share the same public part while differing in their private part. If the difference in resource usage exceeds a given threshold, an attacker can deduce information about the private part. The paper presents a fair evaluation on dedicated benchmarks. The code is available here.
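
A bare-bones sketch of that search loop might look like the following; the target, the mutation operator, and the threshold are invented, and the real tool measures costs via instrumentation rather than wall-clock timing.

```python
# Differential fuzzing for side channels: keep the public input fixed, mutate
# two secret inputs, and search for the pair that maximizes the gap in resource
# usage (wall-clock time in this simplified sketch).
import random
import time

def cost(target, public, secret):
    start = time.perf_counter()
    target(public, secret)
    return time.perf_counter() - start

def mutate(secret: bytes) -> bytes:
    i = random.randrange(len(secret))
    return secret[:i] + bytes([random.randrange(256)]) + secret[i + 1:]

def diff_fuzz(target, public, seed_secret, rounds=1000, threshold=1e-3):
    best = (seed_secret, seed_secret, 0.0)
    for _ in range(rounds):
        s1, s2 = mutate(best[0]), mutate(best[1])
        delta = abs(cost(target, public, s1) - cost(target, public, s2))
        if delta > best[2]:
            best = (s1, s2, delta)
    if best[2] > threshold:
        print("possible side channel: resource gap of", best[2], "seconds")
    return best
```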

SLF: Fuzzing without Valid Seed Inputs

By Wei You, Xuwei Liu, Shiqing Ma, David Mitchel Perry, Xiangyu Zhang, Bin Liang (pdf)

An interesting paper that shows a technique to improve fuzzing when no seeds or source code are available. SLF uses a complex technique, divided into multiple steps. To start, SLF uses AFL to identify fields in the input data. Then, it uses a lightweight dynamic analysis to determine which fields are verified by the execution (e.g., while parsing a file) and what type of checks are used: arithmetic, index/offset, count, or if-then-else. Moreover, SLF identifies when checks depend on each other. Finally, the tool implements an array of techniques to generate and mutate input for every field, depending on its type.

The experimental evaluation is fair and shows good results in terms of finding new bugs and improving coverage in testing complex programs. However, SLF is not open source, so we cannot verify the results presented in this paper.
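
To make the field-inference step concrete, here is a heavily simplified sketch. `run_with_check_trace` is an invented stand-in for the instrumented execution SLF builds on top of AFL, and the grouping heuristic is ours, not the paper's.

```python
# Probe each byte of a random input, observe which input-validation check
# changes its outcome, and group consecutive bytes influencing the same check
# into one "field" that can later be mutated as a unit.
import os

def infer_fields(run_with_check_trace, length=64):
    data = bytes(os.urandom(length))
    baseline = run_with_check_trace(data)

    influence = []
    for i in range(length):
        probe = bytearray(data)
        probe[i] ^= 0xFF                          # flip one byte
        trace = run_with_check_trace(bytes(probe))
        # index of the first check whose outcome changed, if any
        changed = next((j for j, (a, b) in enumerate(zip(trace, baseline)) if a != b), None)
        influence.append(changed)

    # consecutive bytes that affect the same check form a field
    fields, start = [], 0
    for i in range(1, length + 1):
        if i == length or influence[i] != influence[start]:
            fields.append((start, i, influence[start]))
            start = i
    return fields
```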

Grey-box Concolic Testing on Binary Code

By Jaeseung Choi, Joonun Jang, Choongwoo Han, Sang Kil Cha (pdf)

This paper presents Eclipser, a fuzzer that borrows some ideas from symbolic execution, but keeps them scalable for use in larger and more complex programs. We enjoyed their ideas so much that we described their paper in detail in a recent blog post, and integrated Eclipser in DeepState, our Google-Test-like property-based testing tool for C and C++.

Blockchain

Blockchain remains a hot topic in academia, with several new conferences specializing in the area, and ICSE did not escape the fervor: several blockchain-related papers were presented. While quality varied, we found the following works promising:

Smarter smart contract development tools (WETSEB)

By Michael Coblenz, Joshua Sunshine, Jonathan Aldrich, Brad A. Myers (pdf)

This talk presented Obsidian, a new smart contract language designed to be safer than existing languages (e.g. Solidity). Obsidian has interesting properties, including a kind of user-level pre- and post-condition type system. Obsidian tries to statically prove the conditions, and adds dynamic checks when needed. The current implementation only compiles to Hyperledger Fabric. The language is still young, though promising. The authors are running a user study to improve the language design.

Gigahorse: Thorough, Declarative Decompilation of Smart Contracts

By Neville Grech, Lexi Brent, Bernhard Scholz, Yannis Smaragdakis (pdf)

Gigahorse is an EVM decompiler. The authors use Datalog, a declarative logic programming language in an unexpected way: they wrote the decompilation steps as Datalog rules and combined them with an external fixpoint loop to overcome the language limitations. A web service is available at https://contract-library.com, though source code is not provided.
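
The pattern of declarative rules driven by an external fixpoint loop is easy to illustrate. Here is a toy Python analogue with a single reachability rule; the actual rules, written in Datalog, do far more work (resolving EVM jump targets, reconstructing functions, and so on).

```python
# Apply a set of rules to a set of facts until no new facts are derived.
def fixpoint(facts, rules):
    while True:
        derived = {f for rule in rules for f in rule(facts)} - facts
        if not derived:
            return facts
        facts |= derived

# Facts are tuples: ("edge", src, dst) and ("reachable", block).
# Rule: a block is reachable if an already-reachable block jumps to it.
def reachability_rule(facts):
    return {("reachable", f[2])
            for f in facts
            if f[0] == "edge" and ("reachable", f[1]) in facts}

cfg = {("edge", 0x0, 0x10), ("edge", 0x10, 0x42), ("edge", 0x60, 0x80),
       ("reachable", 0x0)}
reachable = fixpoint(cfg, [reachability_rule])
print(sorted(f for f in reachable if f[0] == "reachable"))
```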

Automated Repair

Automatic bug patching is an interesting, but complex, topic. We tackled this challenge during the Cyber Grand Challenge (CGC), and we have preliminary results for smart contracts through slither-format. We were definitely interested to review the academic trends in this area of research. Several papers showed promising work, with the caveat that they generally focus on one type of issue, and some evaluations generated incorrect patches.

SapFix: Automated End-to-End Repair at Scale

By A. Marginean, J. Bader, S. Chandra, M. Harman, Y. Jia, K. Mao, A. Mols, A. Scott (pdf)

SapFix is an automated end-to-end fault-fixing tool deployed at Facebook and designed to work at scale. The system focuses on null dereference bugs and takes advantage of two other tools, Sapienz and Infer. The work showed an interesting combination of heuristics and templates designed to make the experience as painless as possible for the user. For example, the system combines information from a dynamic crash with the Infer static analyzer to improve fault localization, and it abandons a patch if no developer reviews it within seven days. The paper presented promising results.

On Reliability of Patch Correctness Assessment

By Xuan-Bach D. Le, Lingfeng Bao, David Lo, Xin Xia, Shanping Li, and Corina Pasareanu (pdf)

This work assesses the validity of patch-generation tools. This type of validation is, unfortunately, not represented enough in conferences. The authors performed an evaluation of eight automatic software repair tools with 35 developers. The paper shows that the results of the tools are not as promising as claimed, though they are still useful as complements to better established tools.

Poster Session

The poster session presented ongoing work and allowed direct interaction with the authors. We found several promising works.

Demand-driven refinement of points-to analysis

By Chenguang Sun, Samuel Midkiff (pdf)

This work follows the trend toward scalability that we saw during the static analysis session. The goal is to improve points-to analysis by slicing out only the program elements needed to answer targeted queries.

WOK: statical program slicing in production

By Bogdan-Alexandru Stoica, Swarup K. Sahoo, James Larus, Vikram Adve (pdf)

The authors are working toward scalable dynamic slicing of programs by taking advantage of dataflow information gathered statically and of modern hardware support (e.g., Intel Processor Trace). Their preliminary evaluation shows real potential for the technique.

Validity fuzzing and parametric generators for effective random testing

By Rohan Padhye, Caroline Lemieux, Koushik Sen, Mike Papadakis, Yves Le Traon (pdf)

This ongoing work tries to improve generator-based testing (à la QuickCheck) by guiding input generation with two features: (1) a validator that discards semantically invalid inputs, and (2) a conversion from raw bits to structured inputs. The goal is to preserve both the syntax and the semantics of the inputs, especially highly structured ones (e.g., file formats).
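
A tiny sketch of the parametric-generator idea, with an invented expression grammar and validity check (this is not the authors' tool): the fuzzer mutates a flat byte string, while the generator interprets those bytes as choices that always produce a syntactically valid input.

```python
# A parametric generator: raw bytes drive the choices of a structured-input
# generator, so coverage-guided mutation of the bytes still yields well-formed
# inputs. A separate validity check then filters out semantically invalid ones.
import os

def gen_expr(params, pos=0, depth=0):
    """Interpret raw bytes as a small arithmetic expression."""
    b = params[pos % len(params)]
    if depth >= 3 or b % 3 == 0:
        return str(b % 10), pos + 1
    left, pos = gen_expr(params, pos + 1, depth + 1)
    right, pos = gen_expr(params, pos + 1, depth + 1)
    return f"({left} {'+-*/'[b % 4]} {right})", pos

def semantically_valid(expr):
    """Example validity check: reject divisions by zero before running the target."""
    return "/ 0" not in expr

params = os.urandom(32)          # the flat representation the fuzzer mutates
expr, _ = gen_expr(params)
if semantically_valid(expr):
    print("feed to the program under test:", expr)
```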

Optimizing seed inputs in fuzzing with machine learning

By Liang Cheng, Yang Zhang, Yi Zhang, Chen Wu, Zhangtan Li, Yu Fu, Haisheng Li (pdf)

The authors aim to improve input generation through machine learning, following techniques similar to Learn&Fuzz. The insight is to let a neural network learn the correlation between inputs and execution trace coverage, with the goal of generating inputs that are more likely to explore unseen code. Their first experiments on PDF files show encouraging results compared with previous work.

Contributing to Academic Research

In our work, we spend a fair amount of time building reliable software based on the latest research available (see Manticore, McSema, DeepState, or our blockchain toolchains). We enjoy exchanging our vision for the technology with the academic community, and we are happy to provide technical support for the use of our tools in academic research. If your work leads you to try our tools, contact us for support!

