SBN

Running iob-cache in Mayhem

Running iob-cache in Mayhem

David Brumley

·

May 05, 2022

Verilog is the de facto tool used in chip design. Once you create your verilog design, though, you need to extensively test it before you burn the design to silicon.  Unfortunately, before today companies needed to manually author test vectors, which is expensive and slow.  ForAllSecure’s Mayhem is the first product that can help automate authoring new Verilog test vectors, significantly reducing human effort and speeding up time to market.

As you read through this, you can also follow along yourself using the code repo on Github. We’ve forked a popular Verilog cache called iob-cache, and created a branch with a walkthrough of making Mayhem create a test suite.  `iob-cache` was actually given to us as a “challenge problem” from Micron.  Micron created the challenge by inserting new corner cases into iob-cache that had no test vector, and then asked whether Mayhem could discover and create test vectors for the new states.

This post is divided into three sections:

  1. A quick start, which will allow you to reproduce our results using an existing Docker image. Read this if you want a point-and-click try-it-yourself.
  2. A guide to building from scratch, including creating a docker image and pushing to a registry. This shows you how to fully reproduce our build.
  3. A more comprehensive walk-through discussing the particulars of getting the Verilog running in Mayhem. This walks you through our overall approach to making Verilog run on Mayhem, and some tips and tricks like integer splitting.

Quick Start

We’ve set up a mayhem branch on our github iob-cache fork that has everything ready to go to analyze with Mayhem.

If you want to follow along live, you’ll need a Mayhem account. You can apply for a free Mayhem account at https://forallsecure.com/mayhem-free

A quick-start using our existing docker image of the fork is:

  1. Log into your Mayhem server.
  2. Click the “+” symbol in the upper left side of the screen.
  3. Choose “From Docker Hub” tab.
  4. Enter “dbrumley/iob-cache”. See below if you’d like to build the image yourself from scratch.
  5. In the “Edit Mayhemfile”, in the bottom you will see green text on black background. Append “@@” to the end of the line, so the full line reads /iob-cache/hardware/simulation/verilator/obj_dir/Viob_cache @@. The @@ says the program will take an input file passed as argv[1].
  6. Click start run

Mayhem will immediately start exploring the program, and will generate test vectors that increase coverage. It took approximately 6 hours to generate the first test vector that hits a challenge corner case.

Building from scratch with our Repo

How does Mayhem work on Verilog?  Let’s walk through building the docker image from the quick start above. 

Pre-requisites:

  • You have a UNIX command line with git, docker, and the Mayhem CLI installed.
  • You’ve already logged into your Mayhem server.
  • We refer to your Mayhem username as .

First, check out recursively our fork of iob-cache with our mayhem branch. Nothing will build if you do not clone recursively!

# Remember to use --recursive! 
git clone --recursive [email protected]:dbrumley/iob-cache.git 

# Switch to the branch with all the code necessary to test 
# in Mayhem git checkout mayhem
git checkout mayhem

Second, build the docker image tagged with the full path to your Mayhem docker registry, and push to the registry:=

# Free users will need to tag and push to a public repository, such as 
# Dockerhub or Github's ghcr.io registry. Commercial users can use the Mayhem private registry.
docker build -t $(mayhem docker-registry)/<username>/iob-cache docker push $(mayhem docker-registry)/<username>/iob-cache

Third, edit the Mayhemfile baseimage variable to point to the full URL for the docker image you just pushed:

# Put the full path to your docker image. 
# $MAYHEM_DOCKER_REGISTRY points to your currently logged in Mayhem registry
# Change to the full path if using a public repository.

baseimage: $MAYHEM_DOCKER_REGISTRY/<username>/iob-cache:latest

At this point, you can either use the UI above to start a Mayhem run, or launch it on the CLI from the root of the iob-cache git repo with:

mayhem run .

Mayhem automatically starts getting code coverage.  Micron gave us an interesting test vector that is only triggered on specific input sequences.  Mayhem will find the first solution to the challenge in 5-6 hours, depending on how many cores you have available on your Mayhem server.

If you’re interested, the test vectors that solve the challenge are in the build below, and called answer-cafecafe, answer-deadbeef, answer-f1e2d3c4, and answer-01020304 To run them (e.g., to test your build):

# Run the docker container locally in interactive mode
docker run -ti $(mayhem docker-registry)/<username>/iob-cache bash
root@2ed81393d4ad:/iob-cache# /iob-cache/hardware/simulation/verilator/obj_dir/Viob_cache answer-01020304 

How did Mayhem do it?

The above showed you how to reproduce the results. But how would you do a new Verilog program?  In this section we go through tips and tricks for getting verilog running in Mayhem, using iob-cache as the running example.

The TL;DR is we:

  1. Cloned iob-cache from upstream.
  2. Apply the Micron patches to iob-cache to create the “Challenge” problem.  This is only needed for the Micron challenge; if you’re using Mayhem on your own Verilog it would not be needed.  We modified the existing testharness.cpp to read in test vectors from a file instead of running a constant test vector.
  3. Compile iob-cache with the integer-splitting optimization. While technically optional, in practice this is used because it speeds up analysis several-fold.
  4. Add assert statements to notify you when the challenges are solved. Mayhem continuously authors new tests that trigger new behaviors.  This is a quick hack to get notified when Mayhem gets coverage for the lines of interest.  Alternatively, you could just let Mayhem run and calculate coverage afterwords. Note: Mayhem is authoring test; it’s not a test-case runner like gcov. Mayhem is solving the hard part humans do today of writing new tests.
  5. Dockerized the app by writing a Dockerfile.
  6. Write a Mayhemfile to specify how Mayhem should perform analysis.
  7. (Optional): Create an initial (random) input test vector.
  8. Initiate a run.

You can see our build.sh which performs steps 2-4 on the code base.

Background: Integer Splitting Optimization

The challenge we were given uses very specific integer constants to trigger particular code branches. To speed up analysis, we will be building iob-cache using afl-clang to take advantage of an integer splitting optimization. This is purely an optimization, but does speed up time to find the corner cases.

Note that while we use integer splitting to generate the test suite, you can always replay the test suite under a compiled version without integer splitting if you are concerned about any changes in behavior. Therefore, you do not need to worry about integer split version used for generating tests to somehow not be faithful to the production code.

Integer splitting is an optimization to speed up finding inputs that satisfy comparisons. It works by turning a word-size comparison into multiple byte-length comparisons. For example, suppose the original code contains a word-size comparison with 0xabad1dea:

if(input == 0xabad1dea) {
  /* bug */
}

The probability of picking 0xabad1dea is 2^{-32}. Integer splitting rewrites the code to the semantically equivalent byte-length comparisons:

if(input >> 24 == 0xab)
  if(input & 0xff0000) >> 16 == 0xad)
     if(input & 0xff00) >> 8 == 0x1d)
        if(input & 0xff) == 0xea)
	  /* bug */

Integer splitting reduces the probability for heuristic-guided search to 4*2^{-8} = 2^{-10}, which is much less than 2^{-32}.

We use an LLVM pass that is easily integrated into the existing compile toolchain using the AFL drivers to add integer splitting.

Step 1: Clone the iob-cache repository

First, we cloned the iob-cache github repository. It’s important to make sure you clone recursively, else nothing will build properly!

# Check out recursively or nothing will work!
git clone --recursive [email protected]:IObundle/iob-cache.git

# Check out the mayhem branch to follow along.
git checkout mayhem

All changes to the master code base are kept as testbench.patch and verilog.patch. We included these to make it easy to see our changes. (We know in real development, you’d just check the changes directly into the git branch and not keep separate patch files. These are kept separately for illustrative purposes.) We will go through these changes, and files added, below.

Step 2: Apply the patches to iob-cache to create the challenge

Micron was interested in seeing what Mayhem could do, and tested us by creating a set of challenge states.  Micron made two modifications to `iob-cache` in:

  • hardware/src/front-end.v
  • hardware/src/iob-cache.v

The changes are summarized in verilog.patch. These patches are applied as part of build.sh, and can be manually applied with:

patch -p1 < verilog.patch

Mayhem creates test vectors as files, and we call the set of test vectors the “test suite” or “corpus”. Mayhem can automatically create test files for any program that reads from TCP, UDP, stdin, or a file. iob-cache, however, does not read in from a file by default.

We modified hardware/simulation/verilator/testbench.cpp to read in tests to check from a file. The changes are automatically applied in the build.sh script, or can manually be applied with:

patch -p1 < testbench.patch

Step 3: Compile iob-cache with integer splitting

We use AFL++ to enable the compiler pass for integer splitting and add some helpful instrumentation to the program binary. Mayhem itself runs a portfolio of dynamic analysis, and can work without this instrumentation. However, when source is available, we recommend using compile-in instrumentation because it will increase performance.

The iob-cache code base uses typical make environment variables to set the compiler and flags. We first set the environment variable AFL_LLVM_LAF_ALL, which directs the compiler to perform integer splitting, and then call make with the proper arguments:

make sim SIMULATOR=verilator CC=afl-clang CXX=afl-clang++ 
LINK=afl-clang ++ CFLAGS="-O2 -ggdb" CXXFLAGS="-O2 -ggdb" LDFLAGS="-O2 -ggdb"

Again, this is all automatically done in the build.sh script.

Step 4: Add assert statements

Mayhem automatically runs and grows code coverage.  It also has a pass to calculate line and assembly coverage. However, normally Mayhem will not tell you the moment it hits a line of code you care about.  However, you can easily make this happen by adding an `assert` statement. Mayhem will catch a failed `assert`, and tell you exactly when the line is reached.  This approach gave us a language-neutral way of supporting arbitrary user-defined cases.

Normally this is as easy as adding the `assert` to the code.  However, one complexity here is iob-cache takes a verilog file and generates C++ code with verilator. We’re not sure how to get verilator to generate a C `assert`, so we simply added them by hand.  This meant we also had to change the Makefile to call build.sh. We’ve included these changes in the repo, and document them here for transparency.

Step 5: Dockerize the App

Mayhem performs dynamic analysis, and we use Docker as a lightweight tool to give Mayhem the full runtime and configuration environment. Docker is extremely common in DevOps and DevSecOps pipelines. iob-cache, however, did not have a Dockerfile.

We wrote a small Dockerfile that builds the application and calls build.sh. Here are the contents:

FROM debian:bullseye-slim
RUN apt-get update && 
  apt-get install -y build-essential clang afl vim gdb libc6-dbg verilog 
  verilator less
COPY corpus /corpus
WORKDIR /iob-cache
COPY . .
RUN ./build.sh
CMD ["/iob-cache/hardware/simulation/verilator/obj_dir/Viob_cache"]

You can build the Docker image then with:

docker build -t iob-cache .

And then push to your registry. As described in the “quick build” section, the docker image can be recreated and pushed to a Mayhem registry with:

# mayhem docker-registry will return the name of your private Mayhem
# docker registry for commercial customers.
# 
# Free users will need to tag and push to a public repository, such as 
# Dockerhub or Github's ghcr.io registry.
docker build -t $(mayhem docker-registry)/<username>/iob-cache
docker push $(mayhem docker-registry)/<username>/iob-cache

Step 6: Write a Mayhemfile

Mayhem uses configuration-as-code with a Mayhemfile. If you performed the quick start above, the UI will automatically generate a Mayhemfile for you, which can then be checked into git.

We wrote a Mayhemfile to show how you can ingrate Mayhem from the UNIX command prompt, and thus how to integrate into any DevOps pipeline.

We show below the Mayhemfile, and what each statement means.

# The Mayhem version you are writing this configuration for.
version: '1.16'

# Change baseimage to be the same as your push URL above. 
# You can use $MAYHEM_DOCKER_REGISTRY as shorthand for the name of the 
# registry, e.g., $MAYHEM_DOCKER_REGISTRY/dbrumley/iob-cache:latest points
# to an image on your local registry.  If you forget the name, you can find it
# with the $(mayhem docker-registry) command.
baseimage: dbrumley/iob-cache:latest

# The project name in the UI. A common convention is the git project name
project: dbrumley/iob-cache

# A unique name in the UI for this particular target app.  A project
# (e.g., git repo) can have multiple targets, each specified in their
# own Mayhemfile
target: iob-cache

# Mayhem can perform a variety of tasks, such as regression testing,
# measuring code coverage, and so on.  This line says to only run
# the primary Mayhem analysis.
tasks:
  - name: behavior_testing
  
# The executable command to run within the Docker image. This is the executable
# to test.  The "@@" says the input file name is given as argv[1].
# Mayhem will generate a file, and substitute the filename every where it
# sees an "@@".
cmds:
  - cmd: /iob-cache/hardware/simulation/verilator/obj_dir/Viob_cache @@

Step 7: (optional) Create an initial (random) input test vector

Mayhem can start with your existing test suite and grow it. We call the initial set of test vectors the “seed set”. The initial seed set is placed in a corpus directory, which mayhem run (next step) will read.

For this challenge, we had no test cases, so we generated one randomly chosen seed (via dd if=/dev/random of=corpus/seed bs=8 count=2).

Step 8: Run Mayhem from the CLI

Again, if you’ve not already downloaded the mayhem CLI and logged in, please do that first.

Once you are logged in, you can run this example with:

mayhem run .

Mayhem will start exploring the state space of the app, and generating test vectors. The test vectors can be downloaded at any time via the web UI, or also via the CLI with mayhem sync. We post new test vectors live as they are discovered. In our tests, it took 5-6 hours to create the first test vector for one of the challenge states.

Summary

Today most Verilog tests are authored by hand.  We need to automate creating those tests so humans can focus on creative tasks.  In this post, we showed how Mayhem, with just a few lines of code, could automatically create test vectors for the Verilog program `iob-cache`.  Further, we showed we could hit “challenge corner cases”, showing Mayhem isn’t just creating trivial cases.

If you’re interested in using Mayhem on your Verilog, contact us at [email protected].

Stay Connected


Subscribe to Updates

By submitting this form, you agree to our
Terms of Use
and acknowledge our
Privacy Statement.

Step 6: Write a Mayhemfile

Mayhem uses configuration-as-code with a Mayhemfile. If you performed the quick start above, the UI will automatically generate a Mayhemfile for you, which can then be checked into git.

We wrote a Mayhemfile to show how you can ingrate Mayhem from the UNIX command prompt, and thus how to integrate into any DevOps pipeline.

We show below the Mayhemfile, and what each statement means.

# The Mayhem version you are writing this configuration for.
version: '1.16'

# Change baseimage to be the same as your push URL above. 
# You can use $MAYHEM_DOCKER_REGISTRY as shorthand for the name of the 
# registry, e.g., $MAYHEM_DOCKER_REGISTRY/dbrumley/iob-cache:latest points
# to an image on your local registry.  If you forget the name, you can find it
# with the $(mayhem docker-registry) command.
baseimage: dbrumley/iob-cache:latest

# The project name in the UI. A common convention is the git project name
project: dbrumley/iob-cache

# A unique name in the UI for this particular target app.  A project
# (e.g., git repo) can have multiple targets, each specified in their
# own Mayhemfile
target: iob-cache

# Mayhem can perform a variety of tasks, such as regression testing,
# measuring code coverage, and so on.  This line says to only run
# the primary Mayhem analysis.
tasks:
  - name: behavior_testing
  
# The executable command to run within the Docker image. This is the executable
# to test.  The "@@" says the input file name is given as argv[1].
# Mayhem will generate a file, and substitute the filename every where it
# sees an "@@".
cmds:
  - cmd: /iob-cache/hardware/simulation/verilator/obj_dir/Viob_cache @@

Step 7: (optional) Create an initial (random) input test vector

Mayhem can start with your existing test suite and grow it. We call the initial set of test vectors the “seed set”. The initial seed set is placed in a corpus directory, which mayhem run (next step) will read.

For this challenge, we had no test cases, so we generated one randomly chosen seed (via dd if=/dev/random of=corpus/seed bs=8 count=2).

Step 8: Run Mayhem from the CLI

Again, if you’ve not already downloaded the mayhem CLI and logged in, please do that first.

Once you are logged in, you can run this example with:

mayhem run .

Mayhem will start exploring the state space of the app, and generating test vectors. The test vectors can be downloaded at any time via the web UI, or also via the CLI with mayhem sync. We post new test vectors live as they are discovered. In our tests, it took 5-6 hours to create the first test vector for one of the challenge states.

Summary

Today most Verilog tests are authored by hand.  We need to automate creating those tests so humans can focus on creative tasks.  In this post, we showed how Mayhem, with just a few lines of code, could automatically create test vectors for the Verilog program `iob-cache`.  Further, we showed we could hit “challenge corner cases”, showing Mayhem isn’t just creating trivial cases.

If you’re interested in using Mayhem on your Verilog, contact us at [email protected].

*** This is a Security Bloggers Network syndicated blog from Latest blog posts authored by David Brumley. Read the original post at: https://forallsecure.com/blog/running-iob-cache-in-mayhem