Panicking the right way in Go

by Sam Moelius on June 26, 2019

A common Go idiom is to (1) panic, (2) recover from the panic in a deferred function, and (3) continue on. In general, this is okay, so long as there are no global state changes between the entry point to the function calling defer and the point at which the panic occurs. Such global state changes can have a lasting effect on the program’s behavior. Moreover, it is easy to overlook them and to believe that all actions are undone by a call to recover.

At Trail of Bits, we have developed a tool called OnEdge to help detect such incorrect uses of the “defer, panic, recover” pattern. OnEdge reduces the problem of finding such global state changes to one of race detection. Go’s outstanding race detector can then be used to find these errors. Moreover, as we explain below, you can incorporate OnEdge into your own programs in order to find these types of errors.

OnEdge is one of the tools that we use to verify software. For example, we audit a lot of blockchain software written in Go, where it is common to panic upon receiving an invalid transaction, to recover from the panic, and to continue processing transactions. However, care must be taken to ensure that an invalid transaction is reverted completely, as a partially applied transaction could, oh say for example, cause the blockchain to fork.

“Defer, Panic, and Recover”

The definitive reference on this technique is Andrew Gerrand’s Go Blog post of the same name. We will not give such a thorough account here, though we will walk through an example.

Figure 1 shows a simple program that employs the “defer, panic, and recover” pattern. The program randomly generates deposits and withdrawals. If there are not sufficient funds to cover a withdrawal, the program panics. The panic is caught in a deferred function that reports the error, and the program continues on.

package main

import (
	"fmt"
	"log"
	"math/rand"
)

var balance = 100

func main() {
	r := rand.New(rand.NewSource(0))
	for i := 0; i < 5; i++ {
		if r.Intn(2) == 0 {
			credit := r.Intn(50)
			fmt.Printf("Depositing %d...\n", credit)
			deposit(credit)
		} else {
			debit := r.Intn(100)
			fmt.Printf("Withdrawing %d...\n", debit)
			withdraw(debit)
		}
		fmt.Printf("New balance: %d\n", balance)
	}
}

func deposit(credit int) {
	balance += credit
}

func withdraw(debit int) {
	defer func() {
		if r := recover(); r != nil {
			log.Println(r)
		}
	}()
	balance -= debit
	if balance < 0 {
		panic("Insufficient funds")
	}
}

Figure 1: A program that uses the “defer, panic, and recover” pattern incorrectly.

Running the program in Figure 1 produces the output in Figure 2.

Depositing 14...
New balance: 114
Withdrawing 6...
New balance: 108
Withdrawing 96...
New balance: 12
Withdrawing 77...
<time> Insufficient funds
New balance: -65
Depositing 28...
New balance: -37

Figure 2: Output of the program in Figure 1.

Note that there is a bug: even though there are not sufficient funds to cover one of the withdrawals, the withdrawal is still applied. This bug is a special case of a more general class of errors: the program makes global state changes before panicking.

A better approach would be to make such global state changes only after the last point at which a panic could occur. Rewriting the withdraw function to use this approach would cause it to look something like Figure 3.

func withdraw(debit int) {
	defer func() {
		if r := recover(); r != nil {
			log.Println(r)
		}
	}()
	if balance-debit < 0 {
		panic("Insufficient funds")
	}
	balance -= debit
}

Figure 3: A better implementation of the withdraw function from Figure 1.
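As a quick sanity check of the "validate first, mutate last" ordering, the following minimal sketch wraps the Figure 3 version of withdraw in a small program. The main function and the amounts it uses are illustrative assumptions and are not part of the original example.

```go
package main

import (
	"fmt"
	"log"
)

var balance = 100

// withdraw is the Figure 3 version: it checks for sufficient funds
// before touching the global balance.
func withdraw(debit int) {
	defer func() {
		if r := recover(); r != nil {
			log.Println(r)
		}
	}()
	if balance-debit < 0 {
		panic("Insufficient funds")
	}
	balance -= debit
}

func main() {
	before := balance

	// Overdraw on purpose (illustrative amount). The panic is recovered
	// inside withdraw, and no state change has happened yet.
	withdraw(before + 1)
	fmt.Println("balance unchanged after failed withdrawal:", balance == before)

	// A withdrawal that can be covered is applied as usual.
	withdraw(30)
	fmt.Println("balance after successful withdrawal:", balance)
}
```

Because the only write to balance comes after the last point at which the function can panic, a recovered panic leaves nothing to undo.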

Following a brief introduction to Go’s race detector, we describe a method for finding improper global state changes like those in Figure 1.

The Go Race Detector

The Go Race Detector is a combination of compiler instrumentation and a runtime library. The compiler instruments (1) memory accesses that cannot be proven race-free, and (2) uses of known synchronization mechanisms (e.g., sending and receiving on a channel). The runtime library, based on Google’s ThreadSanitizer, provides the code to support the instrumentation. If two instrumented memory accesses conflict and cannot be proven synchronized, then the runtime library produces a warning message.

The Go race detector can produce “false negatives,” i.e., it can fail to detect some races. However, provided that synchronization mechanisms known to the runtime library are used, every warning message that it produces is a “true positive,” i.e., an actual race.

One enables the Go race detector by passing the “-race” flag to, e.g., “go run” or “go build.” The “-race” flag tells the Go compiler to instrument the code as described above, and to link in the required runtime library.
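To make this concrete, here is a minimal sketch of the kind of code the race detector is meant to flag, next to a synchronized counterpart. The function names racyCount and safeCount are made up for this illustration. Saved under a hypothetical name such as race.go and run with "go run -race race.go", the unsynchronized increment in racyCount should draw a "WARNING: DATA RACE" report, while safeCount should not.

```go
package main

import (
	"fmt"
	"sync"
)

// racyCount increments a shared counter from n goroutines without
// synchronizing access to the counter. The WaitGroup only waits for
// the goroutines to finish; it does not order the increments.
func racyCount(n int) int {
	var wg sync.WaitGroup
	count := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			count++ // unsynchronized read-modify-write: a data race
		}()
	}
	wg.Wait()
	return count
}

// safeCount does the same work, but guards the counter with a mutex,
// a synchronization mechanism the race detector understands.
func safeCount(n int) int {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		count int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			count++
			mu.Unlock()
		}()
	}
	wg.Wait()
	return count
}

func main() {
	fmt.Println("racy:", racyCount(100)) // may be less than 100 if updates are lost
	fmt.Println("safe:", safeCount(100))
}
```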

Using the Go race detector is not cheap. It increases memory usage by an estimated 5-10x, and increases execution time by 2-20x. For this reason, the race detector is typically not enabled for “release” code, and is used only during development. Nonetheless, the strong guarantees that come with the detector’s reports can make the overhead worthwhile.

Detecting Global State Changes

The problem of detecting global state changes has obvious similarities to the problem of detecting data races: both involve memory accesses. Like data races, detecting global state changes would seem amenable to dynamic analysis. So, a question that one might ask is: can one leverage the Go race detector to find global state changes? Or, more precisely, can one make a global state change look like a data race?

We solve this problem by executing code that could modify global state twice: once in a program’s main thread, and once in a second, “shadow” thread. If the code does modify global state, then there will be two conflicting memory accesses, one in each thread. So long as the two threads do not appear synchronized (which is not hard to ensure), the two memory accesses will potentially be reported as a data race.

OnEdge

OnEdge detects improper global state changes using the approach described above. OnEdge is a small library that exports a handful of functions, notably, WrapFunc and WrapRecover. To incorporate OnEdge into a project, do three things:

1. Wrap function bodies that defer calls to recover in WrapFunc(func() { … }).
2. Within those wrapped function bodies, wrap calls to recover in WrapRecover( … ).
3. Run the program with Go’s race detector enabled.

If a panic occurs in a function body wrapped by WrapFunc, and that panic is caught by a recover wrapped by WrapRecover, then the function body is re-executed in a shadow thread. If the shadow thread makes a global state change before calling recover, then that change appears as a data race and can be reported by Go’s race detector.

Figure 4 is the result of applying steps 1 and 2 above to the withdraw function from Figure 1.

func withdraw(debit int) {
	onedge.WrapFunc(func() {
		defer func() {
			if r := onedge.WrapRecover(recover()); r != nil {
				log.Println(r)
			}
		}()
		balance -= debit
		if balance < 0 {
			panic("Insufficient funds")
		}
	})
}

Figure 4: The withdraw function from Figure 1 with OnEdge incorporated.

A complete source file to which the above steps have been applied can be found here: account.go. Running the modified program with the race detector enabled, e.g.,

go run -race account.go

produces the output in Figure 5.

Depositing 14...
New balance: 114
Withdrawing 6...
New balance: 108
Withdrawing 96...
New balance: 12
Withdrawing 77...
==================
WARNING: DATA RACE
Read at 0x0000012194f8 by goroutine 8:
  main.withdraw.func1()
      <gopath>/src/github.com/trailofbits/on-edge/example/account.go:61 +0x6d
  github.com/trailofbits/on-edge.WrapFunc.func1()
      <gopath>/src/github.com/trailofbits/on-edge/onedge_race.go:82 +0x3d
  github.com/trailofbits/on-edge.shadowThread.func1()
      <gopath>/src/github.com/trailofbits/on-edge/onedge_race.go:239 +0x50
  github.com/trailofbits/on-edge.shadowThread()
      <gopath>/src/github.com/trailofbits/on-edge/onedge_race.go:240 +0x79
Previous write at 0x0000012194f8 by main goroutine:
  main.withdraw.func1()
      <gopath>/src/github.com/trailofbits/on-edge/example/account.go:61 +0x89
  github.com/trailofbits/on-edge.WrapFunc.func1()
      <gopath>/src/github.com/trailofbits/on-edge/onedge_race.go:82 +0x3d
  github.com/trailofbits/on-edge.WrapFuncR()
      <gopath>/src/github.com/trailofbits/on-edge/onedge_race.go:132 +0x3d4
  github.com/trailofbits/on-edge.WrapFunc()
      <gopath>/src/github.com/trailofbits/on-edge/onedge_race.go:81 +0x92
  main.withdraw()
      <gopath>/src/github.com/trailofbits/on-edge/example/account.go:50 +0x84
  main.main()
      <gopath>/src/github.com/trailofbits/on-edge/example/account.go:39 +0x3cf
Goroutine 8 (running) created at:
  github.com/trailofbits/on-edge.WrapFuncR()
      <gopath>/src/github.com/trailofbits/on-edge/onedge_race.go:126 +0x3a1
  github.com/trailofbits/on-edge.WrapFunc()
      <gopath>/src/github.com/trailofbits/on-edge/onedge_race.go:81 +0x92
  main.withdraw()
      <gopath>/src/github.com/trailofbits/on-edge/example/account.go:50 +0x84
  main.main()
      <gopath>/src/github.com/trailofbits/on-edge/example/account.go:39 +0x3cf
==================
<time> Insufficient funds
<time> Insufficient funds
New balance: -142
Depositing 28...
New balance: -114
Found 1 data race(s)
exit status 66

Figure 5: The output of the program from Figure 1 with OnEdge incorporated and the race detector enabled.

What’s going on here? As before, there are insufficient funds to cover one of the withdrawals, so the withdraw function panics. The panic is caught by a deferred call to recover. At that point, OnEdge kicks in. OnEdge re-executes the body of the withdraw function within a shadow thread. This causes a data race to be reported at line 61 in account.go; this line:

balance -= debit

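This line makes a global state change by writing to the balance global variable. Executing this line in both the main and shadow threads produces conflicting accesses to the same address: in Figure 5, the shadow thread’s read of balance conflicts with the main thread’s earlier write. Go’s race detector recognizes this as a race.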

Limitations

Like all dynamic analyses, OnEdge’s effectiveness depends upon the workload to which one subjects one’s program. As an extreme example, if one never subjects one’s program to an input that causes it to panic, then OnEdge will have done no good.

A second limitation is that, since Go’s race detector can miss some races, OnEdge can miss some global state changes. This is due in part to a limitation of ThreadSanitizer, which keeps track of only a limited number of memory accesses to any one memory location. Once that limit is reached, ThreadSanitizer starts evicting entries randomly.

OnEdge present and future

OnEdge is a tool for detecting improper global state changes arising from incorrect uses of Go’s “defer, panic, and recover” pattern. OnEdge accomplishes this by leveraging the strength of Go’s existing tools, namely, its race detector.

We are exploring the possibility of using automation to incorporate WrapFunc and WrapRecover into a program. For now, users must do so manually. We encourage the use of OnEdge and welcome feedback.