According to the UpGuard report, the data leaked through AWS S3 was sensitive data retrieved from Facebook by third-party apps: 540 million user records were exposed. We’ve seen similar sensitive data leaks through AWS S3 before, such as the Verizon breach, the GOP voter data breach and the Uber breach.
Leaving aside the ethics of Facebook’s practice of sharing personal data (e.g. the Cambridge Analytica scandal), was AWS S3 security a contributing factor in this leak? Yes and no. This incident happened because third-party app developers irresponsibly placed content retrieved from Facebook in publicly accessible AWS S3 buckets.
Technically, this is not really AWS’s fault. AWS or any cloud IaaS or PaaS provider will remind you of the ‘shared responsibility model’, yet such data breaches keep happening.
In this case, the app developers who allowed this leak to happen had little incentive to keep personal data secure. For commercial enterprises, however, cloud-based data security breaches through AWS S3 (or similar cloud IaaS and PaaS) are a critical concern.
The issue of data security in this event is deep-rooted. AWS S3 provides server-side encryption as a defense against attacks on physical data storage, but that is ineffective against misconfigured access control.
Adi Shamir (the “S” in RSA) has stated as a law of computer security that “Cryptography is typically bypassed, not penetrated”. That’s what happened here: AWS’s S3 encryption was bypassed, just as it was in the earlier Verizon and GOP incidents.
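To make the misconfiguration concrete, here is a minimal sketch in Python that scans a bucket policy document for statements granting access to everyone. The function name `public_statements`, the example bucket and the account ID are illustrative assumptions, and the check is a deliberately simplified heuristic: it only looks for unconditional `Allow` statements with a wildcard principal, and it ignores ACLs, Block Public Access settings and `NotPrincipal`.

```python
import json


def _as_list(value):
    """Policy fields may hold a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def _is_wildcard_principal(principal):
    """True if the principal is "*" or a mapping such as {"AWS": "*"}."""
    if principal == "*":
        return True
    if isinstance(principal, dict):
        return any("*" in _as_list(v) for v in principal.values())
    return False


def public_statements(policy_json):
    """Return the Sids of unconditional Allow statements open to everyone.

    Simplified heuristic (an assumption of this sketch), not a full
    evaluation of effective S3 permissions.
    """
    policy = json.loads(policy_json)
    findings = []
    for stmt in _as_list(policy.get("Statement", [])):
        if (stmt.get("Effect") == "Allow"
                and _is_wildcard_principal(stmt.get("Principal"))
                and "Condition" not in stmt):
            findings.append(stmt.get("Sid", "<no Sid>"))
    return findings


# Hypothetical policy: one world-readable statement, one restricted to a role.
EXAMPLE_POLICY = """{
  "Version": "2012-10-17",
  "Statement": [
    {"Sid": "PublicRead", "Effect": "Allow", "Principal": "*",
     "Action": "s3:GetObject", "Resource": "arn:aws:s3:::example-bucket/*"},
    {"Sid": "TeamOnly", "Effect": "Allow",
     "Principal": {"AWS": "arn:aws:iam::111122223333:role/analytics"},
     "Action": "s3:GetObject", "Resource": "arn:aws:s3:::example-bucket/*"}
  ]
}"""

print(public_statements(EXAMPLE_POLICY))
```

In this example only the `PublicRead` statement is flagged. Server-side encryption would not change that result, because S3 decrypts objects transparently for any request the policy allows.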
We all tend to treat precious data like precious personal belongings (like cash or jewelry). We tend to think that if we lock the vault where we keep our precious belongings, they will be safe. While that’s true for our assets in the physical world, it is not true for our data assets. Data is not a physical object. Access to data is controlled through other data in computer and networking systems, the so-called ‘perimeter defense’, which is itself vulnerable to misconfigurations or breaches.
Data can be copied to ‘n’ different places while it is still resident in its original location. Data is not stolen by removing it from a physical medium; it is copied. You may not even detect that your data has been breached until you analyze your data access logs.
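As a rough illustration of that kind of log analysis, the sketch below scans S3 server access log lines for successful object reads made without credentials. It assumes the space-delimited server access log layout, in which the requester field is `-` for unauthenticated requests. The function name `anonymous_reads` and the sample lines are invented for this example, and a real log contains more trailing fields than the pattern captures.

```python
import re

# Leading fields of an S3 server access log record (assumed layout):
# owner, bucket, [time], remote IP, requester, request ID, operation,
# key, "request-URI" (or "-"), HTTP status.
LOG_LINE = re.compile(
    r'^(?P<owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] '
    r'(?P<remote_ip>\S+) (?P<requester>\S+) (?P<request_id>\S+) '
    r'(?P<operation>\S+) (?P<key>\S+) (?P<request_uri>"[^"]*"|-) '
    r'(?P<status>\S+)'
)


def anonymous_reads(lines):
    """Return (time, remote_ip, key) for successful unauthenticated GETs."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip lines that do not fit the assumed layout
        if (m["requester"] == "-"
                and m["operation"] == "REST.GET.OBJECT"
                and m["status"].startswith("2")):
            hits.append((m["time"], m["remote_ip"], m["key"]))
    return hits


# Invented sample records: one anonymous read, one authenticated read.
SAMPLE_LOG = [
    'ownerid example-bucket [06/Feb/2019:00:00:38 +0000] 192.0.2.3 - '
    '3E57427F3EXAMPLE REST.GET.OBJECT exports/users.csv '
    '"GET /example-bucket/exports/users.csv HTTP/1.1" 200 - 2048 2048 12 11 '
    '"-" "curl/7.64.0" -',
    'ownerid example-bucket [06/Feb/2019:00:01:02 +0000] 198.51.100.7 '
    'arn:aws:iam::111122223333:user/analyst 7B4A0FABBEXAMPLE REST.GET.OBJECT '
    'exports/users.csv "GET /example-bucket/exports/users.csv HTTP/1.1" 200 - '
    '2048 2048 9 8 "-" "aws-cli/1.16" -',
]

for hit in anonymous_reads(SAMPLE_LOG):
    print(hit)
```

Only the first sample record is reported, since the second carries an IAM identity in the requester field. A copy made this way leaves the original object untouched, which is why the log is often the only trace.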
AWS S3 data breaches keep happening because people keep leaving the door open; after all, it’s human nature to make mistakes. AWS recommends employing client-side encryption to counter such mistakes. However, client-side encryption comes at a huge cost in terms of data usability. For instance, when building a data lake on AWS S3, you lose the usability of the data, because client-side encrypted data cannot be processed directly by downstream services. Commercial enterprises need to balance data security with data usability, so this may not be a practical or effective solution.
A practical alternative to client-side encryption is data-centric protection. This means protecting the data itself in a way that preserves its format, and therefore its usability, while allowing access to unprotected data on a least-privilege basis.
In an AWS S3 scenario, this means protecting the sensitive data itself so that it remains usable by analytical systems such as AWS QuickSight, Athena, EMR and Redshift Spectrum, and even third-party data warehouses like Snowflake. This is not really a new concept. It has been practiced in the PCI world for years to protect PANs (primary account numbers), but somehow, possibly awaiting regulations such as CCPA in US states, a critical mass of PII holders has not caught up with this practice yet, which results in breaches like this one.
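To show what preserving the data format can look like, here is a minimal Python sketch that replaces the digits of a card-style number with keyed pseudorandom digits while keeping the length, the separators and the last four digits. The function `tokenize_digits`, the demo key and the `keep_last` parameter are assumptions made for illustration. This is one-way keyed pseudonymization, not a standardized format-preserving encryption scheme and not any vendor’s method; recovering the original value for privileged users would need a reversible scheme or a token vault, which this sketch does not provide.

```python
import hashlib
import hmac
import string


def tokenize_digits(value, key, keep_last=4):
    """Swap all but the last `keep_last` digits for keyed pseudorandom digits.

    Illustrative only: one-way, slightly biased (byte % 10), and the digit
    stream repeats after 32 protected digits.
    """
    digits = [c for c in value if c in string.digits]
    n_protect = max(len(digits) - keep_last, 0)
    # Key the stream on the digits alone, so the same number tokenizes
    # identically whether it is written with spaces, dashes or neither.
    stream = hmac.new(key, "".join(digits).encode("ascii"),
                      hashlib.sha256).digest()
    out = []
    seen = 0  # count of digits processed so far
    for ch in value:
        if ch in string.digits:
            if seen < n_protect:
                out.append(str(stream[seen % len(stream)] % 10))
            else:
                out.append(ch)  # trailing digits stay in the clear
            seen += 1
        else:
            out.append(ch)  # separators keep their positions
    return "".join(out)


# Hypothetical key and a well-known test card number.
token = tokenize_digits("4111-1111-1111-1111", b"demo-key")
print(token)
```

Because the token has the same shape as the original, it can be stored in the same column, joined on and aggregated by downstream analytics without exposing the real number if the bucket is ever left open.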
For more information on protecting data in the cloud, please don’t hesitate to get in touch.
*** This is a Security Bloggers Network syndicated blog from Blog – Protegrity authored by Raj Jain. Read the original post at: https://www.protegrity.com/facebook-data-leaked-through-aws-s3/