The Cloud, Outages and You: Who’s Responsible for What?

by Avi Aharon on August 14, 2019

Cloud security lapses and cloud outages are real threats to companies that don’t plan for them

For many companies, cloud service providers offer a good deal. You get seemingly unlimited computing power, storage space, access to advanced services and platforms, and more, all for a relatively reasonable monthly fee.

But none of that is any good unless you can get to your resources on the cloud. While the cloud provider is responsible for the security and availability of the cloud infrastructure and services, customers are responsible for the security and availability of their applications, devices, systems, architecture and configurations built on top of cloud infrastructure. So, who owns your end-to-end security? Who owns your end-to-end availability?

Not the cloud service provider, for sure. In the final analysis, it is you, the client, who is responsible for your own data. In fact, one study shows that 56% of those who suffered losses due to a cloud incident were not compensated by their providers. Of course, the cloud service provider takes responsibility for its own infrastructure and will do its best to keep that infrastructure secure and available when you need it. But what if the provider doesn’t? What if a cloud outage occurs due to a cyberattack or some other reason? In a case like that, you’re likely to bear any losses yourself. At best, you are “sharing the responsibility” with the cloud provider.

For example, the latest edition of the AWS user agreement states that the service will “implement reasonable and appropriate measures” to protect data, but “You are responsible for … taking appropriate action to secure, protect and backup your accounts.” Such indemnity-protecting language can be found in the user agreements of all cloud service providers; for example, Google Cloud’s incident response plan clearly states that “While Google secures the underlying cloud infrastructure and services, the customer secures their applications, devices, and systems when building on top of Google’s Cloud infrastructure.”

If “they” are not responsible for your availability and security, then you are. Here are some of the availability and security risks for companies that conduct operations on the public cloud:

Availability Resource Limits: AWS may be “elastic,” but it can only stretch so far. By default, AWS limits the number of instances per account per region. If you want more, you have to ask for it. Amazon EBS has limits as well, and AWS, with its long list of services, has a long list of limits. While the default limits are theoretically sufficient for customers at each account level, customers need to validate continuously that those limits still meet their requirements. Otherwise, they could find themselves in an availability crisis just when they need the capacity most, such as on Cyber Monday, when under-resourced sites face outages under heavy load.
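The validation described above can be sketched in a few lines of Python. This is a minimal illustration, not a provider API: the `QuotaUsage` record, the quota names, the numbers and the `expected_peak_factor` multiplier are all assumptions made up for the example. In practice the usage and limit figures would come from whatever quota reporting your provider exposes.

```python
from dataclasses import dataclass


@dataclass
class QuotaUsage:
    name: str   # hypothetical label, e.g. "ec2-instances/us-east-1"
    used: int   # resources currently in use
    limit: int  # limit currently applied to the account


def quotas_at_risk(quotas, expected_peak_factor=2.0):
    """Return the names of quotas that projected peak usage would exceed.

    expected_peak_factor is an assumed multiplier for peak load relative
    to current usage (for example, Cyber Monday versus a normal day).
    """
    at_risk = []
    for quota in quotas:
        projected = quota.used * expected_peak_factor
        if projected > quota.limit:
            at_risk.append(quota.name)
    return at_risk


# Illustrative numbers only.
sample = [
    QuotaUsage("ec2-instances/us-east-1", used=14, limit=20),
    QuotaUsage("ebs-volumes/us-east-1", used=120, limit=5000),
]
print(quotas_at_risk(sample))
```

With these sample figures, only the instance quota is flagged: doubling 14 instances would exceed the limit of 20, which is the cue to request an increase before the peak rather than during it.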

Security Issues: The cloud is protected, every service provider will tell you, but what they really mean is that it’s protected only if you configure that protection yourself. For example, AWS EC2 instances need to be secured against unrestricted SSH access, and EC2 security groups with large numbers of open ports give hackers a wide range to attack. Access to each service should be restricted to authorized sources and to the specific ports that service needs. Any breach could impact availability and security in a very negative way, to say the least.
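As a rough sketch of what checking for unrestricted SSH access might look like, the Python below scans security group data for inbound rules that expose a port to the whole internet. It assumes the input is a list of dictionaries shaped like the security group entries returned by the EC2 `DescribeSecurityGroups` API (`GroupId`, `IpPermissions`, `IpRanges` and so on). The function names and the sample group IDs are invented for illustration, and fetching the data is left out.

```python
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}  # "anywhere" for IPv4 and IPv6


def rule_covers_port(rule, port):
    """True if an inbound rule applies to the given TCP port."""
    protocol = rule.get("IpProtocol")
    if protocol == "-1":  # "-1" means all protocols, hence all ports
        return True
    if protocol != "tcp":
        return False
    return rule.get("FromPort", 0) <= port <= rule.get("ToPort", 65535)


def rule_is_world_open(rule):
    """True if the rule allows traffic from any address."""
    cidrs = [r.get("CidrIp") for r in rule.get("IpRanges", [])]
    cidrs += [r.get("CidrIpv6") for r in rule.get("Ipv6Ranges", [])]
    return any(cidr in OPEN_CIDRS for cidr in cidrs)


def groups_with_open_port(security_groups, port=22):
    """Return IDs of groups that expose the port to the whole internet."""
    return [
        group["GroupId"]
        for group in security_groups
        if any(
            rule_covers_port(rule, port) and rule_is_world_open(rule)
            for rule in group.get("IpPermissions", [])
        )
    ]


# Made-up sample data in the assumed shape.
sample_groups = [
    {"GroupId": "sg-open-ssh", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]},
    {"GroupId": "sg-office-only", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "203.0.113.0/24"}]}]},
    {"GroupId": "sg-everything", "IpPermissions": [
        {"IpProtocol": "-1", "IpRanges": [],
         "Ipv6Ranges": [{"CidrIpv6": "::/0"}]}]},
]
print(groups_with_open_port(sample_groups))
```

The same function can be pointed at other sensitive ports, such as 3389 for RDP, by changing the `port` argument.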

Similarly, AWS S3 buckets need to be configured to block public access unless that access is genuinely required. As it turns out, many organizations, including GoDaddy, Verizon, Viacom and other big-name companies, have faced breaches due to poor S3 configuration. The problem is so common, in fact, that Amazon developed new S3 security and encryption features to prevent accidental data exposures caused by the misconfiguration of S3 data storage buckets.

And while the new encryption and security features will likely help many customers avoid data leakage, this is the exception that proves the rule. Amazon graciously stepped in with a solution to this issue because so many customers were suffering from breaches and outages, but don’t expect the company to act in every situation.
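Since the safeguards only help when they are switched on, a customer-side audit is still worthwhile. The sketch below assumes each bucket’s settings are available as a dictionary with the four flags of S3’s public access block configuration (`BlockPublicAcls`, `IgnorePublicAcls`, `BlockPublicPolicy`, `RestrictPublicBuckets`), or `None` when no configuration exists. The bucket names and helper functions are hypothetical, and retrieving the settings is out of scope here.

```python
REQUIRED_FLAGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def missing_public_access_blocks(config):
    """Return the flags that are absent or disabled in one bucket's config."""
    config = config or {}  # treat "no configuration" as nothing enabled
    return [flag for flag in REQUIRED_FLAGS if not config.get(flag, False)]


def audit_buckets(bucket_configs):
    """Map each bucket with at least one gap to its list of missing flags."""
    findings = {}
    for name, config in bucket_configs.items():
        missing = missing_public_access_blocks(config)
        if missing:
            findings[name] = missing
    return findings


# Made-up buckets for illustration.
sample_buckets = {
    "internal-reports": {flag: True for flag in REQUIRED_FLAGS},
    "legacy-uploads": None,
    "marketing-assets": {
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": False,
        "RestrictPublicBuckets": False,
    },
}
for bucket, gaps in audit_buckets(sample_buckets).items():
    print(bucket, "is missing:", ", ".join(gaps))
```

A bucket that really must be public will show up in the findings too. The point of the audit is that every such exception is a deliberate, reviewed decision rather than an accident.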

I could go on (a Cloud Security Alliance report lists 12 major security risks for cloud users), but the issue should be clear: The benefits of cloud computing, the result of innovations including computing at scale, high-speed internet and advanced algorithms, come with a whole new set of security and availability risks that companies didn’t have when they did their computing in-house. The question for cloud users is how to ensure they get the maximum benefits the cloud offers, while minimizing the risks.

Any solution will have to take into account all the risks, both those mentioned here and others. But many of those risks may not be recognized for what they are until they have done their damage; a configuration change in a cloud instance, platform or service, for example, will not make itself known until that resource is called upon, or until a script that depends on it tries to run and fails. The trick is to discover the problem in advance and take action.
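One way to catch a silent configuration change before a dependent resource fails is to compare the current configuration against a known-good baseline. The Python below is a minimal sketch of that idea, assuming both snapshots are available as nested dictionaries. The setting names and the `MISSING` marker are invented for the example, and lists are compared as whole values rather than element by element.

```python
MISSING = "<absent>"  # marker for a setting present in only one snapshot


def flatten(config, prefix=""):
    """Flatten nested dictionaries into dotted paths, e.g. 'network.port'."""
    flat = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def detect_drift(baseline, current):
    """Return (path, before, after) for every setting that differs."""
    old, new = flatten(baseline), flatten(current)
    drift = []
    for path in sorted(old.keys() | new.keys()):
        before = old.get(path, MISSING)
        after = new.get(path, MISSING)
        if before != after:
            drift.append((path, before, after))
    return drift


# Hypothetical snapshots of one resource's configuration.
baseline = {
    "instance_type": "m5.large",
    "network": {"ssh_cidr": "10.0.0.0/8", "port": 22},
}
current = {
    "instance_type": "m5.large",
    "network": {"ssh_cidr": "0.0.0.0/0", "port": 22},
    "backup": {"enabled": False},
}
for path, before, after in detect_drift(baseline, current):
    print(f"{path}: {before} -> {after}")
```

Run on a schedule, a comparison like this surfaces the change when it is made, not when a dependent script fails.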

There are several steps companies can take:

Risk Issue Evaluation: According to Andy Jassy, AWS CEO, “If you look at the continued pace of innovation in AWS this year, we’ll launch a little over 1,800 significant services and features in 2018 up from 1,400 a year ago. The pace of innovation is getting faster and faster.” All major cloud providers keep pushing new improvements, innovative features, capabilities and services at a pace not seen before. In practice, it is impossible for human IT teams to keep up with every vendor and industry best practice and to understand how each change affects uptime, resilience and security, especially since it is infeasible to test the stability and quality of every change.

Automated System Evaluation: Even with all the right policies and procedures in place, accidents will still happen. Cloud outages are expensive; according to Statista, 24% of global enterprises polled said that a single hour of downtime in 2017 and 2018 cost them between $301,000 and $400,000, while for 14% of companies, that loss topped $5 million an hour! Availability, as we have seen, can be affected by a range of issues: misconfiguration, malware, lack of resources and more. In this kind of highly complex technology environment, with a large volume of ongoing changes and thousands of ever-evolving best practices, automating the policies and processes that examine activities, dependencies, relationships, configurations, system anomalies and more could help resolve some of these problems, minimizing risks as well as downtime.

As the cloud era progresses, organizations are coming to realize that as far as outages are concerned, cloud infrastructure is really no different from any other infrastructure: Cloud outages and failures will occur and cybersecurity threats will hit their infrastructure. While they may be sympathetic, the cloud service providers aren’t really going to provide solutions to each customer’s individual problems. To ensure that their needs are met, organizations need to seek technologies that automatically and proactively detect risks and misconfigurations across all components of the public cloud before they lead to service disruptions or outages and impact the business.

— Avi Aharon