SBN Masscan and massive address lists

I saw this go by on my Twitter feed. I thought I’d blog on how masscan solves the same problem.

Both nmap and masscan are port scanners. The differences is that nmap does an intensive scan on a limited range of addresses, whereas masscan does a light scan on a massive range of addresses, including the range of – (all addresses). If you’ve got a 10-gbps link to the Internet, it can scan the entire thing in under 10 minutes, from a single desktop-class computer.

How massan deals with exclude ranges is probably its defining feature. That seems kinda strange, since it’s a little used feature in nmap. But when you scan the entire list, people will complain, with nasty emails, so you are going to build up a list of hundreds, if not thousands, of addresses to exclude from your scans.
Therefore, the first design choice is to combine the two lists, the list of targets to include and the list of targets to exclude. Other port scanners don’t do this because they typically work from a large include list and a short exclude list, so they optimize for the larger thing. In mass scanning the Internet, the exclude list is the largest thing, so that’s what we optimize for. It makes sense to just combine the two lists.
So the performance now isn’t how to lookup an address in an exclude list efficiently, it’s how to quickly choose a random address from a large include target list.
Moreover, the decision is how to do it with as little state as possible. That’s the trick for sending massive numbers of packets at rates of 10 million packets-per-second, it’s not keeping any bookkeeping of what was scanned. I’m not sure exactly how nmap randomizes it’s addresses, but the documentation implies that it does a block of a addresses at a time, and randomizes that block, keeping state on which addresses it’s scanned and which ones it hasn’t.
The way masscan is not to randomly pick an IP address so much as to randomize the index.
To start with, we created a sorted list of IP address ranges, the targets. The total number of IP addresses in all the ranges is target_count (not the number of ranges but the number of all IP addresses). We then define a function pick() that returns one of those IP addresses given the index:
    ip = pick(targets, index);
Where index is in the range [0..target_count].
This function is just a binary search. After the ranges have been sorted, a start_index value is added to each range, which is the total number of IP addresses up to that point. Thus, given a random index, we search the list of start_index values to find which range we’ve chosen, and then which IP address address within that range. The function is here, though reading it, I realize I need to refactor it to make it clearer. (I read the comments telling me to refactor it, and I realize I haven’t gotten around to that yet :-).
Given this system, we can now do an in-order (not randomized) port scan by doing the following:
    for (index=0; index<target_count; index++) {
        ip = pick(targets, index);
Now, to scan in random order, we simply need to randomize the index variable.
    for (index=0; index<target_count; index++) {
        xXx = shuffle(index);
        ip = pick(targets, xXx);

The clever bit is in that shuffle function (here). It has to take an integer in that range [0..target_count] and return another pseudo-random integer in the same range. It has to be a function that does a one-to-one mapping. Again, we are stateless. We can’t create a table of all addresses, then randomize the order of the table, and then enumerate that table. We instead have to do it with an algorithm.
The basis of that algorithm, by the way, is DES, the Data Encryption Standard. That’s how a block cipher works. It takes 64-bit number (the blocksize for DES) and outputs another 64-bit block in a one-to-one mapping. In ECB mode, every block is encrypted to a unique other block. Two input blocks can’t encrypt into the same output block, or you couldn’t decrypt it.
The only problem is the range isn’t neat 64-bit blocks, or any number of bits. It’s an inconveniently sized number. A cryptographer Phillip Rogaway wrote a paper how to change DES to support integer ranges instead. The upshot is that it uses integer division instead of shifts, which makes it more expensive. (I see now that I don’t reference the paper in the source code, I’m sure I did when I wrote it, I’ll have to go back and fix this. I do reference it elsewhere, though.)
So how we randomize that input variable is that we encrypt it, where the encrypted number is still in the same range.
Thus, the source of masscan‘s speed is the way it randomizes the IP addresses in a wholly stateless manner. It:
  • doesn’t use any state, just enumerates an index from [0..target_count]
  • has a fast function given an index, retrieve the indexed IP address from a large list of ranges
  • has a fast function to randomize that index using the Power of Crypto
Given this as the base, there’s lots of additional features we can add. For one thing, we are randomizing not only IP addresses to scan, but also ports. I think nmap picks the IP address first, then runs through a list of ports on that address. Masscan combines them altogether, so when scanning many ports on an address, they won’t come as a burst in the middle of the scan, but be spread evenly throughout the scan. It allows you to do things like:
    masscan -p0-65535

For this to work, we make the following change to the inner loop:
    range = port_count * target_count;
    for (index=0; index<range; index++) {
        xXx = shuffle(index);
        ip = pick(targets, xXx % target_count);
        port = pick(targets, xXx / target_count);
        scan(ip, port);

By the way, the compile optimizes both the modulus and division operations into a single IDIV opcode on Intel x86, since that’s how that instruction works, returning both results at once. Which is cool.
Another change we can make is sharding, spreading the scan across several CPUs or several servers. Let’s say this is server #3 out of 7 servers sharing the load of the scan:
    for (index=shard; index<range; index += shard_count) {
Again, notice how we don’t keep track of any state here, it’s just a minor tweak to the loop, and now *poof* the sharding feature appears out of nowhere. It takes vastly more instructions to parse the configuration parameter (masscan –shard 3/7 …) than it takes to actually do it.
Let’s say that we want to pause and resume the scan. What state information do we need to save? The answer is just the index variable. Well, we also need the list of IP addresses that we are scanning. A limitation of this approach is that we cannot easily pause a scan and change the list of IP addresses.

The upshot here is that we’ve twisted the nature of the problem. By using a crypto function to algorithmically create a one-to-one mapping for the index variable, we can just linearly enumerate a scan — but magically in random order. This avoids keeping state. It avoids having to lookup addresses in an exclude list. And we get other features that naturally fall out of the process.

What about IPv6?

You’ll notice I talking only about IPv4, and masscan supports only IPv4. The maximum sized scan right now is 48 bits (16-bit port number plus 32-bit IPv4 address). Won’t larger scans mean using 256 bit integers?

When I get around to adding IPv6, I’ll still keep a 64-bit index. The index variable is the number of things you are going to probe, and you can’t scan 64-bit space right now. You won’t scan the entire IPv6 128-bit address space, but a lot of smaller address spaces that add up to less than 64-bits. So when I get around to adding IPv6, the concept will still work.

*** This is a Security Bloggers Network syndicated blog from Errata Security authored by Robert Graham. Read the original post at: