How Cloud Security Managers Should Respond to Meltdown and Spectre

I hope everyone enjoyed the holidays… just in time to return to work, catch up on email, and watch the entire Internet burn down thanks to a cluster of hardware vulnerabilities built into pretty much every computing platform available.

I’m not going to go into the details or background on Meltdown and Spectre (note: if I discover a vulnerability I want it named “CutYourF-ingHeartOutWithSpoon”). Rather, I want to talk about them in the context of cloud, both short and long term implications, and recommend some response strategies. These are incredibly serious vulnerabilities not only due to their immediate implications, but because they will draw increased scrutiny to a set of hardware weaknesses that likely require a generational fix (computer generations, not your kids).

Meltdown

In short, Meltdown increases the risk of a multi tenancy break. This affects three levels:

  • It potentially enables any instance/guest on a system to read all of the memory on that system. This is the piece cloud providers have almost completely patched.
  • On a single system, it could also allow code in a container to read the memory of the entire server. This is likely also patched by the cloud providers (AWS/Google/Microsoft).
  • Since Function as a Service (“serverless”) is really code in containers, the same issues apply.

Meltdown is a privilege escalation vulnerability and requires a malicious process to be running on the system; you can’t use it to gain an initial foothold, only to do things like steal secrets out of memory once you already have code executing there.

Meltdown in its current form on major cloud providers is likely not a short term security risk. However, just to be safe I recommend immediately applying Meltdown patches at the operating system level to any instances you have running. This would have been FAR WORSE if there hadn’t been a coordinated disclosure between the researchers, the hardware and operating system vendors, and the cloud providers. You may see some performance degradation, but anything that uses autoscaling shouldn’t really notice.

Spectre

Spectre is a different group of vulnerabilities that relies on a different set of hardware-related issues. Right now, Spectre only allows access to memory that the application already has access to. This is still a privilege escalation issue, since it’s useful for things like allowing hostile JavaScript code in the browser to access data outside of its sandbox. It also seems like it may be an issue for anything that runs multiple processes in a sandbox, such as containers, or possibly even for reading the data of all guests on the same host through the hypervisor.

Exploitation is difficult, the cloud providers are on it, and there is nothing to be done yet other than pay attention.

Thus, for both, your short term action is to patch instances and keep an eye on upcoming patches.
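
One way to check whether an instance has actually picked up the patches: a minimal sketch, assuming a Linux guest whose kernel exposes the `/sys/devices/system/cpu/vulnerabilities` directory (kernels carrying the fixes generally do, but that interface is my addition here, not something your cloud provider reports). The function names are made up for illustration. A missing file is treated as "unknown", which in practice usually means the kernel is too old to report anything.

```python
from pathlib import Path

# Sysfs directory where patched Linux kernels report CPU vulnerability status.
# Assumption: the guest runs a kernel new enough to expose it.
SYSFS_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def mitigation_status(base_dir=SYSFS_DIR,
                      names=("meltdown", "spectre_v1", "spectre_v2")):
    """Return {name: status line}, or 'unknown' when there is no entry."""
    report = {}
    for name in names:
        entry = Path(base_dir) / name
        try:
            report[name] = entry.read_text().strip()
        except OSError:
            # Older or unpatched kernels have no such file.
            report[name] = "unknown"
    return report


def needs_attention(report):
    """List entries not reported as mitigated or unaffected."""
    return sorted(
        name for name, status in report.items()
        if not status.startswith(("Mitigation", "Not affected"))
    )


if __name__ == "__main__":
    status = mitigation_status()
    for name, line in status.items():
        print(f"{name}: {line}")
    print("needs attention:", needs_attention(status))
```

Run something like this across a fleet and you get a quick list of the instances that still need a kernel update and reboot.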

Oh- if you run a private cloud, you really need to patch everything yesterday and be prepared to replace all your hardware within the next few years. ALL THE HARDWARE. Oops.

Long term implications and recommendations

These are complex vulnerabilities related to deeply embedded hardware functionality. Spectre itself is more an entire vulnerability/exploit class than a single vulnerability. Right now we seem to have the protections we need available, and the performance implications appear manageable.

The bigger concern is that we don’t know what other variants of both vulnerability classes may appear (or be discovered by malicious actors who don’t make them public). The consensus among my researcher friends is that this is a newer area of study that, while not completely novel, is most definitely drawing some highly intelligent and experienced eyeballs. I will be very surprised if we don’t see more variants and implications over the next few years. Hardware manufacturers have to update chip designs, which is a slow process, and even then they will likely still leave holes that researchers eventually discover.

Let’s not mince words- this is a very big deal for cloud computing. The immediate risk is very manageable, but we also need to be prepared for the long term implications.

As this evolves, here is what I recommend:

  • Obviously, immediately patch all your operating systems on all your instances to the best of your ability. Hopefully the cloud provider mitigations at the hypervisor level are already protecting you, but it’s still better to be safe. Start with a focus on instances where memory leaks are the highest threat.
  • For highly sensitive workloads (e.g. encryption) immediately consider moving to dedicated tenancy and don’t run any less-privileged workloads on the same hardware. Dedicated tenancy means you are renting the box from your cloud provider and only your workloads run on it. This eliminates much of the concern of guest to host breaks.
  • Migrate to dedicated PaaS where possible, especially for things like encryption operations. For example, if you move to an AWS Elastic Load Balancer and perform discrete application data encryption in KMS, then the crypto operations and keys are never exposed in the memory of any general purpose system. This is the critical piece- the hardware under these services isn’t used for anything other than the service. Thus another tenant can’t run a malicious process and read the physical memory of the box. If you can’t run malicious code as a tenant, you can’t break multi tenancy and are back to having to compromise the entire system (cloud providers are damn good at preventing that). Removing the ability to run arbitrary processes is a massive roadblock to exploiting these kinds of vulnerabilities.
  • Continue to migrate workloads to Function as a Service (“serverless”, “Lambda”), but recognize there are still risks. Moving to serverless pushes more of the responsibility for mitigating future vulnerabilities in these classes onto your cloud provider, but since tenants can run nearly arbitrary code there is always the chance there could still be issues. Right now my feeling is that the risk is low, and far lower than running things on your own servers or even instances.
  • Adopt DevOps and automation, because that’s the best way to move fast, fix a lot of things at once, and be prepared for the next exploit with a cute logo and its own PR team.
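
To make the "start where memory leaks are the highest threat" advice concrete, here is a rough triage sketch. Everything in it is hypothetical: the `Instance` fields are made-up inventory attributes rather than any cloud provider API, and the scoring is just one reasonable way to say "secrets in memory on shared hardware go first".

```python
from dataclasses import dataclass


@dataclass
class Instance:
    # All fields are hypothetical inventory attributes, not a provider API.
    name: str
    handles_secrets: bool    # does crypto or holds keys in memory
    dedicated_tenancy: bool  # hardware is not shared with other tenants
    patched: bool            # OS-level Meltdown/Spectre patches applied


def patch_queue(instances):
    """Return unpatched instances, most exposed first."""
    def exposure(inst):
        # Secrets in memory on shared hardware is the worst case.
        score = 0
        if inst.handles_secrets:
            score += 2
        if not inst.dedicated_tenancy:
            score += 1
        return score

    pending = [i for i in instances if not i.patched]
    # sorted() is stable, so ties keep their inventory order.
    return sorted(pending, key=exposure, reverse=True)


if __name__ == "__main__":
    fleet = [
        Instance("web", False, False, False),
        Instance("kms-proxy", True, False, False),
        Instance("vault", True, True, False),
    ]
    for inst in patch_queue(fleet):
        print(inst.name)
```

The point is less the scoring than having the inventory in a form you can sort and act on automatically, which is what makes the next round of patches cheaper.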

If I missed anything, feel free to drop a comment or hit me up at @rmogull on Twitter.

– Rich



*** This is a Security Bloggers Network syndicated blog from Securosis Blog authored by info@securosis.com (Securosis). Read the original post at: http://securosis.com/blog/how-cloud-security-managers-should-respond-to-meltdown-and-spectre