Six months ago ForAllSecure started analyzing Docker images. What does this mean? Imagine we have a user who wants us to fuzz their application. How do they give it to us? Do they tar it up? Do they give us access to an environment where it’s running? Do we integrate into their build pipeline? Applications are an entire ecosystem — they require specific library versions, environment variables, users, etc. While it may seem like a small limitation conceptually, this added barrier can contribute to the friction between development and security teams, especially as organizations look to incorporate security as a part of their build cycles.
This is where Docker comes into play. We wanted Docker as a packaging solution for our users because it’s accessible and easy to use, but we didn’t want the overhead of the Docker daemon and all the other fancy features that come with it. We ended up building our own lightweight version of Docker, which lets ForAllSecure accept Docker images while running them with the barebones runc runtime. This allows us to analyze code without requiring changes to developer behavior. In this post, we’ll focus on the first part of the problem: how to ingest Docker images.
Accompanying this post is the open sourcing of Rootfs Builder, the tool we use to extract a rootfs from a Docker image. A Docker image provides a portable, efficient format. Instead of sending a 4GB rootfs across the wire, users can simply give us a string like “ubuntu:latest” and ForAllSecure servers can pull the image and extract the rootfs. This benefit doesn’t just apply to ForAllSecure. Rootfs Builder allows any runtime to ingest a Docker image and extract the rootfs. We chose runc, but the extracted rootfs is vanilla (i.e. it contains no Docker-specific information) and will work with rkt, NSJail, etc.
It’s worth noting that a few solutions for building a rootfs from an image already existed. Unfortunately, they do not handle whiteouts correctly (explained further below). I also want to give a shout out to Makisu and Kaniko (written by Uber and Google respectively), which do provide functionality for extracting a rootfs from an image. They solve the problem of building Docker images in an environment not suitable for Docker, namely Kubernetes. We chose not to use their software because it was still a bit too feature-full for us.
Now that you understand the problem we are trying to solve, we can dive into the question: what is a Docker image? How do we go from a Docker “image”, which is just some string like “alpine:latest”, to a running instance of Alpine? In short, an image is a glorified tarball. It consists of various layers which, when merged together, form the rootfs of the container. To understand these layers, we need to make a quick detour to discuss the underlying technology, OverlayFS (OFS).
OverlayFS layers two directories on a single Linux host and presents them as a single directory. The first directory, referred to as the “lower” directory, is read-only and usually provides the base file system. The second directory, referred to as the “upper” directory, records any changes made on top of the lower directory, while leaving the lower directory itself unchanged. If a file is removed, a “whiteout” file is created in the upper directory to simulate the removal. The mount point presents the merged view of the two directories. Note that OFS requires support for extended attributes in order to store metadata regarding whiteouts.
OFS backs Docker’s default storage driver (overlay2) and, as you can imagine, is well-suited for containers. The lowest directory is the base filesystem, and each layer on top is a snapshot of the changes made to the container filesystem at a given time. OFS is an efficient way to generate and store diffs to a filesystem.
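To make the lower/upper/whiteout behavior concrete, here is a minimal sketch in Python that models the merged view with plain dictionaries. It is only an illustration of the semantics described above, not how the kernel implements OverlayFS; the names `WHITEOUT` and `merged_view` and the sample paths are made up for this example.

```python
# A toy model of OverlayFS semantics. Paths map to file contents.
# WHITEOUT is a hypothetical marker standing in for a whiteout entry.
WHITEOUT = object()


def merged_view(lower: dict, upper: dict) -> dict:
    """Return what the mount point would show for one lower and one upper dir."""
    merged = dict(lower)  # copy, so the lower directory is never modified
    for path, content in upper.items():
        if content is WHITEOUT:
            merged.pop(path, None)  # the file appears deleted in the merged view
        else:
            merged[path] = content  # new or modified file shadows the lower one
    return merged


lower = {"/etc/hostname": "base", "/bin/sh": "shell", "/tmp/scratch": "junk"}
upper = {"/etc/hostname": "container", "/tmp/scratch": WHITEOUT, "/app/run": "new"}

view = merged_view(lower, upper)
print(sorted(view))
```

The lower dictionary still contains `/tmp/scratch` after the merge; only the merged view hides it, which is the point of a whiteout.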
Try it out yourself:
# Create a tmpfs because a tmpfs has support for extended attributes
What’s in an image?
Now that we understand the tech underlying a Docker image, we can look inside and better understand its contents. A Docker image contains three components:
- Manifest.json: points to all the layers and the config.json.
- Config.json: contains metadata necessary for running the container. Think Docker version, environment variables, mounts, etc.
- Layers: These are OFS layers as described above and are named using the hash of their contents. When merged together, they form the rootfs.
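As a rough illustration of how these three pieces fit together, the sketch below reads `manifest.json` out of a `docker save` style tar archive and lists the config file and layers each entry points to. It assumes the manifest is a JSON array whose entries carry `Config` and `Layers` keys; the helper names and the in-memory demo archive (with made-up file names) are hypothetical and exist only so the example runs without Docker.

```python
import io
import json
import tarfile


def read_manifest(archive: tarfile.TarFile) -> list:
    """Load manifest.json from the top level of a `docker save` archive."""
    return json.load(archive.extractfile("manifest.json"))


def describe(archive: tarfile.TarFile) -> list:
    """Return (config path, [layer paths]) for every image in the archive."""
    return [(image["Config"], image["Layers"]) for image in read_manifest(archive)]


def build_demo_archive() -> bytes:
    """Build a tiny stand-in archive; the file names here are made up."""
    manifest = [{
        "Config": "abc123.json",
        "RepoTags": ["demo:latest"],
        "Layers": ["layer0/layer.tar", "layer1/layer.tar"],
    }]
    payload = json.dumps(manifest).encode()
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo("manifest.json")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


with tarfile.open(fileobj=io.BytesIO(build_demo_archive())) as archive:
    images = describe(archive)

for config, layers in images:
    print("config:", config)
    for layer in layers:
        print("  layer:", layer)
```

To try it on a real image, replace the demo archive with `tarfile.open("httpd.tar")`, where `httpd.tar` is whatever file name you passed to `docker save`.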
Let’s step through this using Docker to shed some more light on this:
Start by pulling the image with `docker pull` and saving it with `docker save`, which writes the image to a tar archive.
marli 9:32:50 /tmp () docker pull httpd
We see the manifest.json, which points to the config, as well as each layer.
marli 9:33:51 /tmp/httpd () jq . manifest.json
I also suggest taking a look at the config.json, but it’s a bit large to include here.
Let’s check out the base layer. We see a complete file system.
marli 9:58:08 /tmp/httpd/5f6bd574c212bf1b00fa21bb12b588712d32bca72866be4061268498d90140ff () ls
Check out the top layer. Look familiar? This is simply an OFS upper layer described above.
marli 10:02:02 /tmp/httpd/83be7a564d0c2bad81aca09479229afba3cb114a10cc05a28774e166653e2aea () ls -la layer
Building a rootfs
Docker merges all the layers to create a single rootfs. The merging itself is pretty straightforward. We do it ourselves in Rootfs Builder, which takes the name of a Docker image, pulls the tarball, and extracts it. For every layer, we iterate through each tar header, making two passes. The first pass removes whiteouts; recall that these mark files or directories that were removed in a layer. In the second pass, we read the tar header for metadata about the file or directory, specifically the mode, uid, and gid. If the file doesn’t exist we create it; otherwise we simply replace it. We also have logic to update the uid and gid. This is necessary if you want to unshare user namespaces. For example, you may want to appear to be root in the container, while outside the container you are an unprivileged user. This requires creating a subuid mapping. The mapping looks something like:
root@21d94d3c4539:/workdir# cat /etc/subuid
This mapping reserves 65536 uids, starting at 100000, for the user fas. Under this mapping, uid 0 inside the container maps to uid 100000 outside the container.
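The two passes and the uid shift can be sketched in a few lines of Python. This is not Rootfs Builder’s actual code, just a minimal illustration under several assumptions: whiteouts follow the `.wh.<name>` naming convention used in image layers, only regular files and directories are handled (no symlinks, devices, or opaque directory markers), entry names are trusted (no path traversal checks), and the uid range is the 100000/65536 example from above. Instead of calling `chown`, which needs privileges, the sketch just returns the host ids it would apply. All function names are hypothetical.

```python
import io
import os
import shutil
import tarfile
import tempfile

WHITEOUT_PREFIX = ".wh."         # layer entries named .wh.<name> mark deletions
OPAQUE_MARKER = ".wh..wh..opq"   # opaque directory marker, not handled here


def host_id(container_id: int, start: int = 100000, count: int = 65536) -> int:
    """Map a uid/gid inside the container to the id seen outside it."""
    if not 0 <= container_id < count:
        raise ValueError("id outside the mapped range")
    return start + container_id


def apply_layer(layer: tarfile.TarFile, rootfs: str) -> dict:
    """Apply one layer to rootfs; return {name: (host uid, host gid)}."""
    members = layer.getmembers()
    # Pass 1: remove whatever the layer's whiteouts say was deleted.
    for m in members:
        parent, base = os.path.split(m.name)
        if base == OPAQUE_MARKER or not base.startswith(WHITEOUT_PREFIX):
            continue
        target = os.path.join(rootfs, parent, base[len(WHITEOUT_PREFIX):])
        if os.path.isdir(target) and not os.path.islink(target):
            shutil.rmtree(target)
        elif os.path.lexists(target):
            os.remove(target)
    # Pass 2: create or replace entries using the tar header metadata.
    owners = {}
    for m in members:
        if os.path.basename(m.name).startswith(WHITEOUT_PREFIX):
            continue
        dest = os.path.join(rootfs, m.name)
        if m.isdir():
            os.makedirs(dest, exist_ok=True)
        elif m.isfile():
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            with layer.extractfile(m) as src, open(dest, "wb") as out:
                shutil.copyfileobj(src, out)
            os.chmod(dest, m.mode)
        else:
            continue  # symlinks, devices, etc. are skipped in this sketch
        owners[m.name] = (host_id(m.uid), host_id(m.gid))
    return owners


def make_layer(files: dict) -> tarfile.TarFile:
    """Build an in-memory layer tar from {name: bytes}, for the demo only."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)  # uid and gid default to 0 (root)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return tarfile.open(fileobj=buf)


rootfs = tempfile.mkdtemp()
base = make_layer({"etc/motd": b"hello", "tmp/scratch": b"junk"})
top = make_layer({"tmp/.wh.scratch": b"", "etc/motd": b"updated"})
for layer in (base, top):
    owners = apply_layer(layer, rootfs)
```

After both layers are applied, `tmp/scratch` is gone, `etc/motd` holds the contents from the top layer, and `owners` records that root-owned entries would be chowned to 100000 on the host.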
Developers use Docker images every day, and now you know that they are just glorified tarballs. There’s plenty of room for improvement with Rootfs Builder. Outstanding features we hope to add will allow the user to specify:
- The number of layers to untar.
- A layer to omit when untarring.
- A binary the user is interested in. Instead of returning an entire rootfs, this will just return the binary.
But for now, hopefully Rootfs Builder will help users inspect Docker images. You can get started with Rootfs Builder here: https://github.com/ForAllSecure/rootfs_builder
*** This is a Security Bloggers Network syndicated blog from ForAllSecure Blog authored by Marlies Ruck. Read the original post at: https://blog.forallsecure.com/demystifying-a-docker-image