
Game Theory: Why System Security Is Like Poker, Not Chess

by David Brumley on May 18, 2020

The 1980s film “Wargames” asked a computer to learn whether global thermonuclear war made sense. In the film, it didn’t. But what if, in real life, preemptive cyberattacks were our best hope for winning? Or better yet, under what cyberwar scenarios and incentives is peace the best strategy, just as in “Wargames”? Or is it the reverse, and the best thing to do is invest in offense?

We don’t like thinking about offense and attack in cyber. But if you think about offense, can’t you develop a better defense? That can be tricky to do informally; what we need are frameworks for making decisions and strategy.

Game theory is a branch of mathematics that allows us to reason through cyberattack and defense scenarios without spinning in philosophical circles. Game theory was created by a polyglot and computer scientist named John von Neumann. Johnny was an interesting guy with a penchant for creating unusual acronyms such as “MAD” for Mutually Assured Destruction. He also had a knack for asking hard questions; one famous quip was “If you say why not bomb [the Soviets] tomorrow, I say, why not today?” MAD allowed Johnny to answer that question. Ultimately, MAD is why the WOPR computer in “Wargames” decides not to launch thermonuclear war, concluding that “the only winning move is not to play.”

Cyber clearly isn’t thermonuclear war. We need to think about how it’s different, build models, and see how those models play out. We need to ask hard questions, just like Johnny.

For example, an exploit is just bits on a wire: if you happen to log an opponent’s attack, you can copy it. You can then “reflect” or “ricochet” the exploit against your opponent, or use that new knowledge to create a patch. When you capture someone else’s exploit and use it (or patch it), you’ve used their energy against them. The better you can turn an adversary’s energy and time to your own benefit, the higher your chance of succeeding.

Game Theory Is Relevant Both on and off the National Stage

The U.S. government has a responsibility to protect the nation, which (quite rationally) entails both cyber offense and defense. U.S. policy is to prioritize defense and to decide on disclosure through the U.S. Vulnerabilities Equities Process (VEP). The National Security Agency has said it typically discloses about 91% of the vulnerabilities its researchers discover, after evaluating them through the VEP.

Controversially, I’m not a security expert who believes the NSA should disclose every vulnerability outright. Offense does have value, and cyber offense has played a key part in real-world events such as hacking into the DNC server (bad) or damaging a dangerous state’s nuclear materials program (good). If there is a choice between killing people with a bomb and using a cyberattack to achieve the same end, I think cyber may make more sense. No decision to disclose or not should be made in isolation. It comes down to the game, as you see it, and which strategy is most likely to achieve the desired outcome.

Game theory isn’t just for nation-states; it’s a way of modeling scenarios and guiding decisions. You can model probabilities on how someone else will take action and what you’ll do to counter that action.

One thing is clear: cyber offense and defense isn’t chess. It’s poker. In chess, you have complete visibility into your opponent’s position and moves. In poker, you lack that visibility, and the same is true in cyber. You don’t know for certain which exploits your adversary knows about, whether they are using an exploit they disclosed, or whether your zero-day is really a zero-day globally.

Strategy means you’ve thought through the larger picture of various alternatives, risks, and rewards. You’ve built a game, not in the playful, fun sense, but one that allows you to reason through actions, incentives, and possibilities.

Cyber should be no different. As I wrote this article, I imagined well-known security experts screaming things like “responsible disclosure is the ethical choice” and “we have more risk as a nation when we don’t responsibly disclose.” To such experts, I’m asking you to stop and play devil’s advocate for a moment. Hack at your assumptions and really test them. I believe it leads to better thinking.

How do we think through what to do?

Let’s play a game.

The Zero Day Game

Imagine you found a new zero-day vulnerability. You can either disclose it or create an exploit and attack others. Your actions have consequences, and you can also play a sequence of actions:

Exploit only. There is value in offense and you get some utility such as access, intelligence, or control.
Disclose only. We disclose because it leads to a patch, network filter, or other remediation. Disclosure isn’t defense; it’s just a precursor to defense. After disclosure, a patch or remedy is created and eventually rolled out. The remedy is what takes the vulnerability off the table for an attacker.
Exploit then disclose. Why not exploit a few systems, hope not to get caught, then disclose? If your few exploits are never noticed, you could still be seen as the good guy.
Disclose then exploit. The disclosure opportunity window is the time between when a vulnerability is disclosed and when the remedy is protecting a system. We know the opportunity window can be very large. For example, the NSA has stated it historically discloses 91% of the vulnerabilities it discovers, but the agency has also asserted that using exploits against already-disclosed vulnerabilities is effective. Using known vulnerabilities works (at least in part) because we can’t patch everything immediately.
Stockpile. You take no action and keep the information to yourself, deciding at some later time. But zero-days have a shelf life that expires when someone else finds the same vulnerability. You don’t know how long that will take, but it’s a fair bet that any zero-day you find will eventually be discovered by others, as long as the software stays relevant.
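The disclose-then-exploit window can be made concrete by counting how many systems are still unremediated a given number of days after disclosure. A minimal Python sketch, assuming you have, for each system, the (hypothetical) number of days between disclosure and the remedy being deployed, with `None` for a system that was never patched:

```python
from typing import Optional, Sequence


def exposed_systems(patch_delays: Sequence[Optional[int]], day: int) -> int:
    """Count systems still exploitable `day` days after disclosure.

    Each entry is the number of days from disclosure until the remedy was
    deployed on one system, or None if it was never deployed.
    """
    return sum(1 for delay in patch_delays if delay is None or delay > day)


# Hypothetical rollout data for a five-system fleet.
delays = [1, 3, 7, 30, None]
for day in (0, 7, 60):
    print(f"day {day}: {exposed_systems(delays, day)} of {len(delays)} exposed")
```

An attacker who waits for disclosure can still reach every system in the slow tail of that rollout, which is why the window can stay open long after most of a fleet is patched.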

The Actions We Take Are in the Context of the Overall World

In game theory, we create a game state to capture that context. Game theory also asks us to be formal and assign a utility, positive or negative, to each action. Ask anyone in risk assessment: if you don’t have a cost for an action, you can’t assess the risk. The nice thing about game theory is that you can try different utility functions to see how they change the outcome. For example, how does a defender’s strategy change if the cost of being exploited is $10 vs. $1 million?
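To see how swapping the utility function changes a decision, here is a minimal Python sketch of a defender choosing between patching and accepting the risk. The exploitation probability (5%) and patch cost ($500) are hypothetical values chosen for illustration; only the $10 vs. $1 million loss figures come from the question above.

```python
def defender_choice(p_exploit: float, loss_if_exploited: float, patch_cost: float) -> str:
    """Pick the action with the lower expected cost for the defender."""
    expected_loss = p_exploit * loss_if_exploited
    return "patch" if expected_loss > patch_cost else "accept risk"


P_EXPLOIT = 0.05   # hypothetical chance the vulnerability is exploited
PATCH_COST = 500   # hypothetical cost of fielding the patch, in dollars

for loss in (10, 1_000_000):
    print(f"loss ${loss:,}: {defender_choice(P_EXPLOIT, loss, PATCH_COST)}")
```

Same game, same probabilities; only the utility changed, and the best strategy flipped from accepting the risk to patching.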

Let’s start out simple with just two players: Red and Blue. Each player is running the same software, say Windows 10, so each is affected by a new zero-day vulnerability. A player can only exploit or disclose when they find a zero-day. If they choose to exploit, they get a “point” per system they compromise. Each player wants to win by getting the most points, or at least to tie.

The number of computer systems matters in this game, because it highlights the potential asymmetry in risk a particular vulnerability may pose. Let’s assume Red and Blue differ: Blue has 10 computers and Red has three.

(Red’s Perspective): If I discover a new zero-day, I can get up to 10 points attacking Blue’s computers. I only have three vulnerable computers, so at most Blue can get three points. Since 10 > 3, I’ll always attack.

(Blue’s Perspective): If I discover a new zero-day, I can get up to three points attacking Red’s computers. However, if Red finds it, they’ll get 10 points. It makes sense to disclose and patch, assuming I can get the patches installed before Red attacks.

In this game, Blue is incentivized not to attack. Ethically that seems like a good outcome. Unfortunately, Red is incentivized to wage war. Later we will look at one way Red could be incentivized to make peace.
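The Red/Blue reasoning reduces to comparing the most you can gain with the most you can lose. A minimal Python sketch of that rule, assuming (as in the game above) one point per compromised system and that the opponent exploits all of your systems if they find the bug; treating a tie as no incentive to attack is an assumption of this sketch, not something the game specifies.

```python
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    systems: int  # number of vulnerable computers this player runs


def prefers_attack(me: Player, opponent: Player) -> bool:
    """Attack if the best-case gain exceeds the worst-case loss.

    Gain: one point per opponent system I can exploit.
    Loss: one point per own system the opponent could exploit.
    Ties are treated as no incentive to attack (an assumption).
    """
    return opponent.systems > me.systems


red, blue = Player("Red", 3), Player("Blue", 10)
for me, opp in ((red, blue), (blue, red)):
    action = "attack" if prefers_attack(me, opp) else "disclose and patch"
    print(f"{me.name}: {action}")
```

With Blue at 10 systems and Red at three, this prints that Red attacks while Blue discloses and patches, matching the two perspectives above.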

Even this simple example highlights some lessons and properties:

Measure how quickly you get remedies deployed for disclosed vulnerabilities. Knowing you are running vulnerable software isn’t enough. What matters is how quickly you can deploy a remedy or patch. That is something you can measure and optimize.
Small powers — or those with less to lose — are more prone to attack. This checks the box that the model represents reality in this dimension.
Responsible disclosure has two worlds: one where the vendor fixes the software and one where they don’t. If the vendor never fixes the issue (cough cough, IoT), does it help? In the near term, it gives bad guys information on where to look. On the other hand, the traditional argument is it helps provide public awareness on who is a “responsible” vendor and who is not. Beyond that we can start to model such scenarios. What is the (negative) utility in shaming an unresponsive vendor? How bad does it need to be for them to take action? Would a rational person simply ignore the vulnerability knowing it kicks the can down the road?
The player that finds the zero-day has the choice. If you don’t spend time finding vulnerabilities in your own software and supply chain, your strategy is by definition reactive. I’d also add that if you don’t use techniques at least as comprehensive as your adversary’s (e.g., criminals’, if you are a business), you are also choosing to be reactive.

Zero-Day Is Really Zero-Disclosure

The term “zero-day vulnerability” is a bit of a misnomer. If you find a previously unreported vulnerability, that doesn’t mean no one else knows about it. It means no one else has publicly disclosed it.

Suppose Blue found a new zero-day using either:

A widely available fuzzer, like AFL, after two days of analysis.
A super-secret, next-generation technology after 10 days.

The method you used to find the vulnerability can change the probability that your opponent also finds the vulnerability. If it took you two days to discover, then it is likely that your attacker can also discover the vulnerability in two days. The time it takes to find a vulnerability relates to how easy or difficult the vulnerability is to discover. However, the super-secret technique that you used to find the vulnerability is yours alone. If it finds a vulnerability (and it’s not found with AFL), that vulnerability likely has a longer shelf life before someone else discovers it.
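One way to turn discovery time into a shelf-life estimate is to assume the opponent’s time to rediscover the bug follows an exponential distribution. That distribution is a modeling assumption added for this sketch, not something stated above. For the AFL case, the sketch sets the opponent’s mean rediscovery time to the two days it took you; the 365-day mean for the secret technique is purely hypothetical.

```python
import math


def p_rediscovered(days_elapsed: float, mean_days_to_find: float) -> float:
    """P(opponent has found the bug by now), assuming exponential discovery time."""
    return 1.0 - math.exp(-days_elapsed / mean_days_to_find)


MEAN_AFL = 2.0       # opponent has the same public fuzzer: about your own 2 days
MEAN_SECRET = 365.0  # hypothetical: opponent lacks your technique

for days in (1, 7, 30):
    print(
        f"after {days:>2} days: "
        f"AFL-found {p_rediscovered(days, MEAN_AFL):.2f}, "
        f"secret-found {p_rediscovered(days, MEAN_SECRET):.2f}"
    )
```

Under this kind of model, the longer an adversary needs to rediscover a bug, the longer you can hold it before the odds of a collision become uncomfortable.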

You can also start to estimate how many new exploits your adversary may have. For example, Google has reported over 3,849 new security-critical bugs using their oss-fuzz infrastructure over three years, which works out to about 3.5 per day. Think about it: Google, statistically, will find 28 new security issues between Christmas and New Year’s. Google has nation-state offensive capabilities. Yes, weaponization takes more time and not all 3.5 vulnerabilities per day can be weaponized, but you get the gist.
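The back-of-the-envelope numbers above can be checked directly. The sketch assumes three 365-day years and counts December 25 through January 1 inclusive as eight days, which is the window that yields the figure of 28:

```python
bugs_found = 3849          # oss-fuzz security-critical bugs cited above
days = 3 * 365             # three years, ignoring leap days
rate = bugs_found / days   # bugs per day

holiday_days = 8           # Dec 25 through Jan 1, inclusive
print(f"{rate:.1f} bugs/day, ~{round(rate * holiday_days)} over the holidays")
# prints: 3.5 bugs/day, ~28 over the holidays
```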

Google uses open source tools in its open source fuzzing to find bugs before attackers do. If you’re serious about being proactive, I recommend you follow its lead and employ the same techniques attackers use. Even if you had a magical technique that found all bugs, fuzzing and other techniques used by attackers would still help. If you find a bug and have data showing how long such a tool takes to find it, you can use that information to gauge the risk, or how long it would take an attacker to find the same bug.

Ricochet Attacks and the Glass House

Exploits are bits and can be copied. What if you got really good at ricochet? That changes the strategy in game models. Interestingly, it can provide a real incentive for everyone not to attack.

What if when Blue launches an attack, Red can ricochet? Blue can start reasoning about possibilities:

If my attack goes undetected I get three points and Red has zero. I win.
If Red can ricochet — i.e., detect and reflect the zero-day being used against any particular system — he can copy it and exploit my 10 vulnerable computers. I shouldn’t attack.

Consider some extreme values. If Red can ricochet any attack 100% of the time, then Blue should never choose to attack. If Red has a 0% chance of ricochet, Blue should always attack in this game. The extreme values help clarify scenarios, but we don’t need to assume 100% ricochet. What if Red siphoned off just 10% of its traffic for really deep analysis? There is then only some probability that Red sees the zero-day; is it enough to disincentivize Blue?
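The extreme-value reasoning generalizes to a threshold. A minimal Python sketch of Blue’s expected point margin, under simplifying assumptions added for this sketch: Blue keeps its three points whether or not the attack is detected, and a detected attack is always ricocheted onto all 10 of Blue’s systems. It also treats Red’s 10% traffic sample as a 10% detection probability, which is an optimistic simplification.

```python
def attack_margin(gain: float, exposure: float, p_ricochet: float) -> float:
    """Attacker's expected point margin over the defender.

    Assumes the attacker keeps `gain` points regardless, and with
    probability `p_ricochet` the defender reflects the exploit and
    scores `exposure` points against the attacker's own systems.
    """
    return gain - p_ricochet * exposure


BLUE_GAIN = 3       # Red's three computers
BLUE_EXPOSURE = 10  # Blue's own ten computers

threshold = BLUE_GAIN / BLUE_EXPOSURE
print(f"Blue attacks only if P(ricochet) < {threshold:.0%}")
for p in (0.0, 0.1, 1.0):
    margin = attack_margin(BLUE_GAIN, BLUE_EXPOSURE, p)
    print(f"P(ricochet)={p:.0%}: margin {margin:+.1f} -> {'attack' if margin > 0 else 'hold'}")
```

In this toy version a 10% detection rate is not enough on its own to deter Blue; it would take a rate above 30%, or more of Blue’s systems at stake.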

What is interesting about ricochet is that it incentivizes peace even when there is a vulnerability. It’s a bit like MAD, but without the world being destroyed. If Red and Blue have an equal number of systems and both can ricochet, neither should attack. It’s like the old saying: those who live in glass houses should not throw stones.

To me, the framework suggests the U.S. is behaving rationally. It likely has the most to lose if someone else finds and weaponizes a vulnerability. Rationally (not just ethically), it makes sense to put its thumb on the disclosure side of the scale.

Beyond outright ricochet, we can think of disclosure as giving partial information to an attacker, which also guides decisions. Blue may further reason:

If I disclose the vulnerability, it will take me time to field the patch for my 10 systems. I know I need to do it faster than Red can weaponize the disclosure.
If I do nothing, I don’t know if Red will later find the exploit and use it. After all, Red is smart and has people also looking for zero-days. How long can I wait to make a decision?

Ricochet is not necessarily revenge. For example, suppose Red had an ally, Orange, with 10 vulnerable computers. Previously, without an ally, the incentives seemed to favor Red attacking. If Blue can ricochet, he can disincentivize Red by ricocheting attacks onto Orange. Red now has to be comfortable with the three points it would lose if the vulnerability is discovered, plus the 10 points of damage Blue can inflict on Orange. This new world suggests Red should no longer attack first.

Imagine a crazy world where Russia simply said, “If I see a cyberattack, I will ricochet the same attack against every vulnerable computer in Israel.” That would incentivize Israel to not just keep the peace with Russia, but also incentivize Israel to pressure allies to not attack as well. It would also guide national policy (e.g., getting really good at ricochet).

Even if you can’t ricochet, game theory suggests you should disclose not just the vulnerabilities you find, but also those launched against you. Attack/defense hacking competitions teach us that the best thing to do is attack the weakest player first. If you use an exploit against a weak player and they detect it, you know not to use it against a strong player. If the weak player doesn’t detect it, that doesn’t guarantee a stronger player won’t, but it does provide some information.

If you disclosed any attack on your network, especially one using a new zero-day, you could disincentivize attackers. At the very least, it would make sense for them to attack someone else first rather than you.

Specifics on how to use game theory at a national level are theoretical at this point. The NSA gave its “Science of Security” award to a paper written by me and my graduate student, Tiffany Bao, on the subject of game theory. The paper makes simplifying assumptions that likely don’t capture many real-world factors. It assumes rationality when we know the world isn’t rational. It takes an actuarial viewpoint, with a utility function for consequences, when the real world is more complicated. The point is to shed light on a method of thinking that has worked for MAD, in economics, and in other areas. Game theory, in general, can also highlight where we can place incentives that may not be obvious, and whether those incentives actually change the game we (think) we’re playing.

Optimizing Choices

In 2014, Tiffany Bao and Steven Turner, from my lab, published a paper on recognizing functions inside compiled commercial-off-the-shelf (COTS) binaries. Function identification is a fundamental challenge in vulnerability research. Common wisdom is that the better you do at function identification, the more productive your vulnerability research will be. The usual reasoning is that vulnerability discovery doesn’t work well until you break down a COTS application into functions. So, the first step in vulnerability research is often to reverse engineer the COTS application into functions.

Our experiments showed that our tool, ByteWeight, had 99% precision and 98% recall on 64-bit binaries, whereas IDA Pro (a state-of-the-art commercial tool) had 74% precision and 55% recall. A solid improvement.
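Precision and recall here have their standard meanings. A minimal Python sketch, assuming a function counts as correctly identified when its start address is predicted exactly (the ByteWeight evaluation may define a match differently); the addresses are hypothetical:

```python
def precision_recall(predicted: set[int], truth: set[int]) -> tuple[float, float]:
    """Precision: fraction of predicted function starts that are real.
    Recall: fraction of real function starts that were predicted."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


truth = {0x1000, 0x1040, 0x10A0, 0x1100}  # hypothetical ground-truth starts
predicted = {0x1000, 0x1040, 0x1200}      # hypothetical tool output
p, r = precision_recall(predicted, truth)
print(f"precision {p:.2f}, recall {r:.2f}")  # prints: precision 0.67, recall 0.50
```

Low recall means whole functions go missing, which is the failure the common wisdom above worries about for vulnerability research.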

We then asked, “to what end would such an improvement matter?” It’s easy to cherry-pick cases where a missing function makes it harder to find a vulnerability. But anecdotes aren’t great when deciding strategy. So, we built two worlds: one where ByteWeight was used and one where it wasn’t.

To make it more concrete, we analyzed whether ShellPhish’s decision to use ByteWeight mattered in their third-place finish at the DARPA Cyber Grand Challenge (CGC). In world 1, ShellPhish used ByteWeight, and we had their real-life performance in the CGC. In world 2, we analyzed how well ShellPhish would have done without ByteWeight. It didn’t matter: they still placed third. The details are in Tiffany Bao’s Ph.D. thesis (section 6.3.1).

A game-theoretic analysis showed that IDA Pro was “good enough”: function identification wasn’t a barrier in the CGC. A better strategy would have been to put equivalent R&D dollars into better fuzzing techniques, because getting to 100% effectiveness would not have changed the outcome compared with just using IDA Pro.

What can you learn? If you have a goal, such as finding zero-days or defense, ask the question “how will this change the outcome if we are successful?” Assume research has a breakthrough and is 100% effective. Does that change the game?
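The two-worlds question can be asked mechanically: compute the outcome with your current capability, then again under the optimistic assumption that the research is 100% effective, and see whether the outcome moves. A minimal Python sketch; the team names and scores are hypothetical, and in practice the hard part is estimating the improved score, which this sketch simply takes as an input.

```python
def rank(scores: dict[str, float], team: str) -> int:
    """1-based position of `team` when sorted by descending score."""
    ordered = sorted(scores, key=lambda t: scores[t], reverse=True)
    return ordered.index(team) + 1


def improvement_matters(scores: dict[str, float], team: str, improved: float) -> bool:
    """Does replacing `team`'s score with `improved` change its final rank?"""
    counterfactual = {**scores, team: improved}
    return rank(counterfactual, team) != rank(scores, team)


# Hypothetical final scores; "C" is the team considering the investment.
scores = {"A": 270.0, "B": 262.0, "C": 246.0, "D": 236.0}
print(improvement_matters(scores, "C", 250.0))  # False: best case still below B
print(improvement_matters(scores, "C", 265.0))  # True: best case passes B
```

If even the best-case upper bound leaves the outcome unchanged, the R&D dollars are probably better spent elsewhere, which is the conclusion the CGC analysis reached for function identification.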

For example, suppose you could invest in a really deep static analysis tool that identifies 100% of all flaws and highlights the buggy lines of code, but whose reports are difficult to act on. Is that deep analysis really benefiting you compared to something less deep but more actionable? The goal is typically not to find flaws, but to shrink the window from when a vulnerability is introduced to when a patch is fielded. Think through all the incentives that go into such a program.

Summary

Cyber should be no different. As I wrote this blog, I imagined well-known security experts preaching that “responsible disclosure is the ethical choice” or that “we have more risk as a nation when we don’t responsibly disclose.” To such experts, I’m asking you to stop and play devil’s advocate for a moment. Hack at your assumptions and really test them. I believe it leads to better thinking.

Originally published at The New Stack.