SBN

CSPRNGs: How to Properly Generate Random Numbers

Not all random numbers in a program need to be highly secure. For
instance, when deciding what’s for dinner, a basic random module
provided by the programming language is usually sufficient. However, in
cases generating sensitive data such as nonces and private keys, the
randomness should be unpredictable. The occurrence of predictable random
data should never happen, but unfortunately, sometimes it does.

In August 2023, a vulnerability related to weak randomness in the
Bitcoin development toolkit Libbitcoin
Explorer
was
disclosed by Milk Sad, which resulted
in an estimated loss of over $900,000 USD worth of cryptocurrency
assets. This leads us to the fundamental question of the article: How do
we properly generate random numbers?

Random Number Generators in Software

To address the fundamental question, we first need to understand the
different types of number generators.

Like flipping a coin or rolling a die, a computer can also generate
random numbers using various sources, including device drivers and
external sources such as network packets and mouse movements. These
external sources are diverse enough that predicting the random numbers
is considered impossible, and this randomness is referred to as true
randomness
.

However, computers cannot solely rely on external sources for
randomness. When a large amount of randomness is required in a short
period, the values from external sources may not change sufficiently to
provide adequate entropy. Moreover, reading values from external sources
can be slower. Therefore, when a large amount of random numbers is
needed, a common method is to obtain a short-length seed from a true
randomness source and use this seed as input to generate random numbers.
This is called a pseudorandom number generator (PRNG).

The main difference between a PRNG and a true random number generator
(TRNG)
, which generates true random numbers, lies in reproducibility.
With a PRNG, the sequence is deterministic, meaning that the same
sequence is reproduced if the seed is the same. In contrast, with a
TRNG, the sequence is entirely random and cannot be reproduced.

However, not all PRNGs are secure. (Although the meaning of secure can
be accepted as meaning “unpredictable”, we will discuss what secure
formally means in the CSPRNG section.)

Let’s first consider a simple 32-bit PRNG, denoted as
PRNG(s)=x0,x1,\text{PRNG}(s) = {x_0, x_1, \dots}, where

{x0=s xi=(axi1+b) & 0xFFFFFFFF if i>0 \begin{cases} x_0 &= s \ x_i &= (ax_{i-1} + b) \textsf{ \& 0xFFFFFFFF if } i > 0 \ \end{cases}

and where ss represents the seed and a,ba, b are fixed
constants. This PRNG is known as a linear congruential
generator

(LCG) and is straightforward for efficiently generating random numbers.
However, once xix_i is observed, then all other elements in the sequence
x_i+1,x_i+2,x\_{i+1}, x\_{i+2}, \dots can be derived. Furthermore, its period is
limited to at most 2322^{32}, which is insufficient. One approach to
mitigate these issues is to truncate some bits. For example, Java’s
random employs a 48-bit state, and each generated number is the 32 most
significant bits. However, this truncation only marginally improves its
statistical characteristics and security. It is still easy to derive the
seed given multiple captured elements, and the sequence can be predicted
forwards and backwards arbitrarily long. This Github
repository is one example to
break Java’s random.

Another PRNG known as the Mersenne
Twister
is utilized in
languages such as Python and C++. The Mersenne Twister still relies on
relatively simple operations but has a long period of 21993712^{19937} – 1.
Despite Mersenne Twister offering better security compared to LCGs, it
is still predictable. Once the 624 consecutive 32-bit random numbers are
given, then the inner state can be reconstructed, rendering the sequence
predictable. You may refer to this Github
repository.

The default random-number—generation functions provided by most
programming languages prioritize performance over security. If robust
security is a priority, then a cryptographically secure pseudorandom number
generator (CSPRNG) must be employed.

CSPRNG

A CSPRNG is a PRNG that is cryptographically secure because its
output is as unpredictable and unbiased as a TRNG. It is therefore
defined in terms of indistinguishability: if a PRNG GG is
indistinguishable from a TRNG, then GG is called a CSPRNG.

To understand its mathematical properties, let’s take a look at an
example game played between Alice and Bob.

  1. Alice generates a sequence, either one generated from the TRNG or
    one from the PRNG GG, each with probability 1/21/2.
  2. Alice provides the generated sequence to Bob.
  3. Bob’s objective is to guess whether the sequence originated from a
    TRNG or the PRNG. If Bob’s guess is correct, he wins; conversely, if
    Bob’s guess is incorrect, Alice wins.

If Bob takes a random guess, then the rate he wins at is exactly 1/21/2.
This value of 1/21/2 can be considered as a baseline. However, if the
PRNG GG exhibits unusual characteristics, then Bob’s advantage can be
higher than 1/21/2. For example, if the first element always being fixed
to an odd number, then Bob’s strategy is to answer PRNG if the sequence
starts with an odd number and answer TRNG otherwise. When the sequence
is generated by the PRNG, Bob’s guess is always correct. However, if the
sequence is generated by the TRNG, Bob’s guess is right with 1/21/2
probability. In this scenario, Bob’s winning rate is (1/2)×1+(1/2)×(1/2)=3/4(1/2) \times 1 + (1/2) \times (1/2) = 3/4, which is 1/41/4 higher than the baseline. In
other words, Bob has a 1/41/4 advantage. If GG is CSPRNG, then this
advantage should be negligible. When this advantage is smaller than
21282^{-128}, we say the CSPRNG achieves 128-bits security.

When a PRNG’s inner state can be reconstructed or its next number can be
predicted, it is possible to distinguish the output of a PRNG from a
TRNG. This does not happen with CSPRNGs because their output is
indistinguishable from a TRNG (unless its seed is somehow revealed or
compromised). In other words, CSPRNGs must be resilient to any kind of
“seed recovery” attacks.

For further details, we recommend referring to these lecture
notes

from Princeton University.

Vulnerabilities From a Weak Random

But is a CSPRNG truly important in properly generating random numbers?
Yes. When sensitive data is generated using a weak PRNG, it can lead to
significant security problems. The Common Weakness Enumeration (CWE)
also includes entries related to randomness vulnerabilities, such as
CWE-334: Small Space of Random
Values
”, “CWE-335:
Incorrect Usage of Seeds in
PRNG
”, and “CWE-338:
Use of Cryptographically Weak
PRNG
”. These types of
vulnerabilities have been known for a long time but still continue to
appear. Here are several real-world examples resulting from a weak
random.

Milk Sad (CVE-2023-39910)

Milk Sad refers to a vulnerability in the
Libbitcoin Explorer
cryptocurrency wallet tool. This vulnerability was discovered during an
investigation into lost funds.

The primary issue revolves around a PRNG used in the
bx seed
command. This command generates a pseudorandom seed, as shown in the
following example:

bx seed -b 192
d915107990a42dfb2ea5762729002d47a4748e0f24aceb61

At first glance, it appears to be random. However, this PRNG used in the
command has two critical flaws:

  1. The PRNG is based on the Mersenne Twister, which is not a CSPRNG.
  2. The seed for the PRNG is chosen based on the current time, providing
    only 32 bits of entropy.

While the Mersenne Twister can be hard to predict if the output is short
and the seed has sufficient entropy, in this case, the seed is
improperly chosen, allowing attackers to brute-force the private key
generated by the bx seed command. Although there were warning
statements on the bx seed wiki
page
,
they may not have been sufficiently emphasized to users. It is suspected
that even developers might have overlooked this issue, as the wiki
landing
page

included an example of creating a new wallet address with the bx seed
command without any warnings.

Initially, the Libbitcoin team responded by stating that the bx seed
command was never intended to be cryptographically secure, and they had
no plans to address it. However, they later released a patch that
removed the command.

In addition to Milk Sad, Cake
Wallet
,
Profanity Ethereum address tool
(CVE-2022-40769)
,
and Trust Wallet
(CVE-2023-31290)

are other examples of vulnerabilities where seeds are generated with
incorrect way.

ECDSA Nonce Misuse

The digital signature scheme is a cryptographic scheme that a signer
who has the private key can generate a signature σ\sigma for a message
mm and a verifier who knows the corresponding public key can verify the
validity of the signature σ\sigma.

An Elliptic Curve Digital Signature Algorithm (ECDSA) is a digital
signature scheme over elliptic curves. In cryptocurrencies, ECDSA allows
to ensure that transactions are made by the legitimate wallet owner.
However, ECDSA, when misused, has a significant security
vulnerability related to a nonce denoted as kk. The nonce is a randomly
generated value used during signature generation.

While ECDSA is secure when used correctly, the nonce kk introduces a
potential security weakness, most notably in cases of nonce reuse. This
vulnerability allows an attacker to recover a secret key from a
signature if the same nonce is reused for the same secret key. If the
nonce is chosen with high entropy, the nonce will not collide. However,
if the nonce is generated with low entropy, the nonce can collide. In
2013, weaknesses in the PRNG used in Android led to nonce reuse,
resulting in the theft of a substantial amount of Bitcoin (see
here).

Furthermore, choosing a nonce in an insecure way, such as concatenating
half of the transaction hash and half of the private key, can also lead
to a leakage of the private key (as seen
here). Given sufficient messages,
even a small leakage of less than one bit per nonce can be
exploited, too. As biased nonce
generation can lead to a complete compromise, this makes secure nonce
generation extremely important.

In summary, generating the nonce kk securely and randomly, without
bias, is crucial in an ECDSA. To address this issue, RFC
6979
recommends using
the result of running an HMAC-DRBG with the private key and the hash
value of the message as seeds to generate the nonce kk. This approach
helps mitigate the risks associated with nonce misuse in ECDSA.

For further details, we recommend referring to this blog
post

from Trail of Bits.

CSPRNG Constructions

Then how can we construct a CSPRNG?

In 2006, the National Institute of Standards and Technology (NIST)
published a technical publication titled Recommendation for Random
Number Generation Using Deterministic Random Bit
Generators
.
In this document, four CSPRNGs are suggested:

  1. Hash-DRBG
  2. HMAC-DRBG
  3. CTR-DRBG
  4. Dual_EC_DRBG

(Note: DRBG stands for deterministic random bit generators, which means
the same as a CSPRNG.)

Among these, Dual_EC_DRBG was withdrawn in 2014 due to the discovery of
a backdoor. The backstory is interesting but off-subject. You can find
more information here if
you are curious.

In our discussion of PRNGs, we focused on the basic form where the PRNG
is initialized with a seed, and then it generates pseudorandom numbers.
In this basic form, it is impossible to provide additional entropy to
the PRNG. However, allowing the infusion of additional entropy during
operation can enhance security. Without this capability, a PRNG could be
entirely compromised if its state or entropy source is compromised for a
period. On the other hand, if it is possible to provide additional
entropy, the PRNG can recover to a secure state after sufficient entropy
is supplied. Importantly, its internal state must change each time new
entropy is provided.

These are three key components of a PRNG:

  1. Setup(s)\textsf{Setup}(s): PRNG is initialized with a seed ss.
  2. Refresh(I)\textsf{Refresh}(I): PRNG accepts a new entropy II.
  3. Next(n)\textsf{Next}(n): PRNG generates the nn bits output based on the
    current state.

Now let’s take a look into how to generate a cryptographic secure PRNG.
Excluding Dual_EC_DRBG, the other three CSPRNGs are based on the
symmetric primitive, such as hash functions and block ciphers. This
choice is reasonable because standardized algorithms for hash functions
and block ciphers exist, and their security properties are widely
accepted. Even if a vulnerability is found for a specific symmetric
primitive, it’s not a catastrophic scenario because transitioning to
more secure algorithms is relatively easier than constructing new DRBGs.

As it is the most straightforward, Hash-DRBG is the best example to
demonstrate how a CSPRNG works.

Hash-DRBG

Hash-DRBG is a DRBG built from the cryptographic hash function denoted
as HH. To aid in understanding, we will introduce a simplified version.
In this version, we denote the state as SS.

  1. Setup(s)\textsf{Setup}(s): Initialize the state as S=H(s)S = H(s).
  2. Refresh(I)\textsf{Refresh}(I): Update the state as S=H(SI)S = H(S \|\| I).
  3. Next(n)\textsf{Next}(n):
    • Generate a sequence R=H(S)H(S+1)...H(S+k)R = H(S) \|\| H(S+1) \|\| … \|\| H(S+k)
      where kk is the smallest number that Rn\|R\| \geq n.
    • Update the state as S=H(SH(S))S = H(S \|\| H(S)).
    • Return the leftmost nn bits of RR.

These three DRBGs (Hash-DRBG, HMAC-DRBG, and CTR-DRBG) are constructed
using symmetric primitives, and their security can be proven based on
the security properties of these symmetric primitives, utilizing the
concept of indistinguishability.

Note that there may be a slight gap between the claimed security of
these DRBGs and their theoretical analysis, as strong assumptions are
made in the proofs. However, these DRBGs are generally considered secure
enough for real-world use, and their security has been the subject of
ongoing research until relatively recently, as evidenced by papers
presented in EUROCRYPT 2019 (such as An Analysis of the NIST SP
800-90A Standard
).

How to Properly Generate Random Numbers

To properly generate random numbers, there are only two things we need
to keep in mind. First, the seed should be generated from enough
entropy, and second, a CSPRNG should be used instead of a weak PRNG,
such as default random modules in each language. (For more, see
this
post, which enumerates the methods to generate secure random numbers in
various programming languages.)

Conclusion

In this article, we’ve discussed the mathematical definitions of CSPRNGs
and studied vulnerabilities in the wild. While these vulnerabilities are
relatively easy to detect and defend against, their occurrence can lead
to the exposure of sensitive information and significant damage.

As PRNGs are a fundamental component in almost every software program,
there is always a risk that a developer might improperly generate random
numbers for sensitive data. Therefore, it is essential for developers to
ensure that random numbers are generated in a secure way and that
auditors track the logic for the PRNG in code.

About Us

Zellic specializes in securing emerging technologies. Our security researchers have uncovered vulnerabilities in the most valuable targets, from Fortune 500s to DeFi giants.

Developers, founders, and investors trust our security assessments to ship quickly, confidently, and without critical vulnerabilities. With our background in real-world offensive security research, we find what others miss.

Contact us for an audit that’s better than the rest. Real audits, not rubber stamps.

*** This is a Security Bloggers Network syndicated blog from Zellic — Research Blog authored by Zellic — Research Blog. Read the original post at: https://www.zellic.io/blog/csprngs-how-to-properly-generate-random-numbers