Understanding and Preventing S3 Leaks

Amazon Simple Storage Service (S3) is a service that many developers today rely on to quickly build applications. Over time, it has also become a popular target for attackers, resulting in a large number of data leaks.

Most of these leaks, such as the incident targeting Verizon, the exposure of 1.8 million Chicago voter records and the Pocket iNet leak, were due to buckets being misconfigured to allow public access: no authentication was required to access or download the bucket data. Beyond the tech press, academic researchers have also investigated this subject; for example, Continella et al. conducted a large-scale analysis of roughly 240,000 misconfigured S3 buckets.

In addition, there have been sophisticated attacks that exploited buckets protected by IAM-based authentication. In those cases, attackers found a way to obtain the credentials needed to access the bucket, commonly by stealing AWS keys from application code or by exploiting an application vulnerability.

A well-known example of the latter – an attack involving credential exposure – was the 2019 Capital One breach, in which an SSRF attack on an AWS-hosted instance was the starting point for the leak of 100 million customers’ data.

Here, we examine in more detail three common attack vectors – S3 bucket misconfigurations, stolen S3 bucket credentials and application vulnerabilities – and explain how security teams can take simple steps to protect themselves against them.

S3 Bucket Misconfigurations

For a long time, newly created S3 buckets allowed public access by default. AWS has since remedied this situation by blocking public access by default. Users now also receive a warning message, and AWS requires an explicit acknowledgment before a bucket can be made publicly accessible.
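For infrastructure-as-code deployments, the same protection can be expressed declaratively. A hypothetical CloudFormation fragment (the resource and bucket names are illustrative) that enables all four Block Public Access settings might look like:

```yaml
# Hypothetical CloudFormation fragment; resource name is a placeholder.
Resources:
  ExampleBucket:
    Type: AWS::S3::Bucket
    Properties:
      PublicAccessBlockConfiguration:
        BlockPublicAcls: true
        BlockPublicPolicy: true
        IgnorePublicAcls: true
        RestrictPublicBuckets: true
```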

One can also define bucket policies to establish access control rules for S3 buckets. When Block Public Access is enabled, AWS additionally prevents you from creating bad policies that could expose your buckets. For instance, the AWS API would reject the policy below, which grants s3:GetObject access to all IP addresses except those in 54.240.143.128/30.
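The original policy is not reproduced here; a reconstruction of a policy of that shape (the bucket name and statement ID are placeholders) could be the following. Because the `Principal` is `*`, Block Public Access treats it as public and the API refuses to attach it:

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "AllowGetExceptBlockedRange",
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::EXAMPLE-BUCKET/*",
    "Condition": {
      "NotIpAddress": { "aws:SourceIp": "54.240.143.128/30" }
    }
  }]
}
```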

However, even with those mechanisms in place, one can still mistakenly write bucket policies that expose buckets. For example, the policy below permits access to all S3 actions from any IP address in 201.0.0.0/8 – that is, any address whose first octet is 201.
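Again as a hedged reconstruction (bucket name and statement ID are placeholders), such an overly broad policy might look like:

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "AllowAllFromFirstOctet201",
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:*",
    "Resource": [
      "arn:aws:s3:::EXAMPLE-BUCKET",
      "arn:aws:s3:::EXAMPLE-BUCKET/*"
    ],
    "Condition": {
      "IpAddress": { "aws:SourceIp": "201.0.0.0/8" }
    }
  }]
}
```

Because the `aws:SourceIp` condition limits the policy to a fixed CIDR, it is not flagged as public – yet it still opens the bucket to roughly 16 million addresses.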

Thus, it’s easy to misconfigure these access policies in a way that inadvertently allows buckets containing sensitive data to be exposed to attackers.

Recommendations

  • Make sure the option Block all public access is selected when creating new buckets.
  • Use AWS Config and Lambda to detect and remediate publicly accessible S3 buckets.
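As a sketch of the detection half of that second recommendation, the compliance decision a Config-triggered Lambda would make can be reduced to checking a bucket’s public access block settings. The dict shape below mirrors S3’s GetPublicAccessBlock response; the actual remediation call (boto3’s put_public_access_block) is omitted to keep the sketch self-contained:

```python
# Sketch of the compliance check a Config-triggered Lambda might perform.
# The dict mirrors the shape of S3's GetPublicAccessBlock response; the
# remediation step (boto3 put_public_access_block) is omitted here.

REQUIRED_SETTINGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)

def is_compliant(public_access_block: dict) -> bool:
    """Return True only if all four Block Public Access settings are on."""
    return all(public_access_block.get(key, False) for key in REQUIRED_SETTINGS)
```

A bucket missing any of the four settings would then be flagged for remediation.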

Stolen S3 bucket credentials

Stolen credentials (AWS keys) are a common method by which even well-configured and protected S3 buckets may get compromised. For instance, for a Python application built on top of Django, secrets might be exposed, as in the following code snippet. In an even worse scenario, those secrets may end up exposed in public code repositories like GitHub.
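The original snippet is not shown; a hypothetical Django settings.py fragment with this problem might look like the following. The key values are AWS’s documented placeholder keys, and the variable names follow the convention used by libraries such as django-storages:

```python
# settings.py -- hypothetical example; the key values are fake placeholders.
# Hard-coding credentials like this puts them in version control and in
# every copy of the codebase.
AWS_ACCESS_KEY_ID = "AKIAIOSFODNN7EXAMPLE"
AWS_SECRET_ACCESS_KEY = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
AWS_STORAGE_BUCKET_NAME = "example-app-uploads"
```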

Recommendations

  • Use IAM Roles instead of static secrets;
  • If only static secrets are supported:
    • define secrets outside the application’s code (e.g., using environment variables);
    • use static analysis tools to search for hard-coded secrets;
    • make sure the credentials are rotated frequently.
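As a sketch of reading secrets from the environment instead of hard-coding them (the environment variable names and the helper are illustrative):

```python
# Hypothetical settings helper: secrets come from the environment,
# never from source code. Variable names are illustrative.
import os

def load_s3_settings() -> dict:
    """Read S3 credentials from environment variables; unset keys are None."""
    return {
        "access_key_id": os.environ.get("AWS_ACCESS_KEY_ID"),
        "secret_access_key": os.environ.get("AWS_SECRET_ACCESS_KEY"),
        "bucket": os.environ.get("AWS_STORAGE_BUCKET_NAME", "example-app-uploads"),
    }
```

With this approach, rotating credentials only requires updating the deployment environment, not the codebase.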

Application Vulnerabilities

Software weaknesses can also expose S3 buckets indirectly. Take, for example, the following code snippet. This Python code is vulnerable to server-side request forgery (SSRF) attacks, as it takes an arbitrary URL from the input and calls it using an HTTP library. Adversaries could manipulate this URL to call the instance metadata API, obtaining valid credentials to issue requests to AWS APIs, including S3.
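The original snippet is not reproduced here; a minimal reconstruction of such a vulnerable handler, using only the standard library (function names are illustrative), could be:

```python
# Hypothetical, vulnerable URL-fetching endpoint (names are illustrative).
from urllib.parse import urlparse
from urllib.request import urlopen

def fetch_preview(url: str) -> bytes:
    # No validation at all: the server fetches whatever URL the client sent.
    # An attacker can pass http://169.254.169.254/latest/meta-data/... and
    # read instance credentials via the response.
    return urlopen(url).read()

def request_host(url: str) -> str:
    """Helper: which host would the server contact for a given input?"""
    return urlparse(url).hostname
```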

We can break this down into two security issues. First, the application does not implement any input validation, which allows adversaries to manipulate the HTTP request destination. Second, the network rules may not restrict outbound traffic from this server, making it possible to call any host reachable by the server, including the loopback address or any other host connected to the internet.

Recommendations

At the application layer, one can implement input validation to make sure users – and consequently, adversaries – cannot manipulate the hostname or any other part of the URL to redirect the request to a sensitive host. With input filtering added, the code would look like the following:
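A sketch of that validation using a hostname allowlist (the allowed hosts are placeholders) might look like:

```python
# Hypothetical input-validated version; the allowlist entries are placeholders.
from urllib.parse import urlparse
from urllib.request import urlopen

ALLOWED_HOSTS = {"images.example.com", "cdn.example.com"}

def is_allowed(url: str) -> bool:
    parsed = urlparse(url)
    # Require http(s) and an explicitly allowlisted hostname; this rejects
    # the instance metadata address, loopback, and internal hosts outright.
    return parsed.scheme in ("http", "https") and parsed.hostname in ALLOWED_HOSTS

def fetch_preview(url: str) -> bytes:
    if not is_allowed(url):
        raise ValueError("URL not permitted")
    return urlopen(url).read()
```

An allowlist is preferable to a blocklist here: enumerating every sensitive internal address is error-prone, while naming the few legitimate destinations is not.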

A comprehensive list of ways to protect against SSRF can be found in the OWASP Server-Side Request Forgery Prevention Cheat Sheet. Additionally, open source static analysis tools can help find such vulnerabilities in code.

Besides the application layer, one can also implement network restrictions. More specifically, outbound traffic should be restricted, following a block-by-default strategy. Additionally, one can deploy an HTTP proxy to control outbound traffic, bringing granularity down to the level of individual HTTP requests.
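As one way to express a block-by-default egress posture, a hypothetical CloudFormation security group that only permits outbound HTTPS to a single approved CIDR might look like the following (all names and the CIDR are placeholders). Note that declaring any SecurityGroupEgress rule replaces the default allow-all egress rule:

```yaml
# Hypothetical CloudFormation fragment; names and CIDR are placeholders.
AppServerSecurityGroup:
  Type: AWS::EC2::SecurityGroup
  Properties:
    GroupDescription: Block-by-default egress for the app tier
    VpcId: vpc-0123456789abcdef0   # placeholder
    SecurityGroupEgress:
      - IpProtocol: tcp            # only HTTPS, only to an approved range
        FromPort: 443
        ToPort: 443
        CidrIp: 203.0.113.0/24     # placeholder: approved destination
```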

Moreover, potentially sensitive hosts must be protected. For example, AWS instances can use IMDSv2, which requires a valid session token on every metadata API call. The same approach can be extended to other APIs running on the local network.
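One way to enforce IMDSv2 is at the launch template level; a hypothetical CloudFormation fragment (the resource name is illustrative):

```yaml
# Hypothetical CloudFormation fragment: force IMDSv2 on new instances.
AppLaunchTemplate:
  Type: AWS::EC2::LaunchTemplate
  Properties:
    LaunchTemplateData:
      MetadataOptions:
        HttpTokens: required        # reject token-less (IMDSv1) requests
        HttpPutResponseHopLimit: 1  # keep tokens from traversing extra hops
```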

There is, unfortunately, no simple way to completely secure oneself from hacks and data breaches. What organizations can do, however, is to make it hard, and therefore less profitable, for attackers to hack them. Having a multi-layered defense-in-depth strategy for your most sensitive assets, coupled with strong security controls for each layer, is one way of doing that. The above approach will hopefully help organizations implement a similar security posture for their S3 service, where sensitive data often lives.


Manav Mital

Manav Mital is the co-founder and CEO of Cyral, the first security solution built for the modern data cloud and the DevOps-first world. Manav was previously the founder and CEO of Instart, a startup in the CDN space focused on improving web and mobile application performance, consumer experience, and security. Instart served notable enterprise customers and was acquired by Akamai. Before founding Instart, Manav worked with Aster Data, where he managed the design and development of major components of their flagship product, including the core database engine and all query processing. Aster Data, a pioneer in the big data market, was acquired by Teradata Corp for $263M. Aster is also where Manav met his Cyral co-founder, Dr. Srinivas Vadlamani. Before Aster, Manav was part of Yahoo's Web Search team, where he helped in designing and building systems to improve search rankings by better web page classification. Manav has a MS in Computer Science from UCLA and a BS in Computer Science from the Indian Institute of Technology, Kanpur.
