Why the recent Twitch breach means much more than “just” leaked source code

by Dotan Nahum on October 7, 2021

Amazon-owned Twitch, a massively popular video streaming service, was recently breached by hackers who leaked a massive pile of source code, data, unreleased products, and payout reports, released in parts. The first part weighs in at about 125 GB.

Twitch’s own source code is part of the breach. It started circulating on 4chan, was taken down, and at the moment is circulating on Twitter and other social networks. It seems that the leak cannot be stopped at this stage.

According to the original 4chan poster, the act itself was apparently directed at Amazon.

Our analysis of the breach so far reveals a few points:

Judging by the file sizes, the leaked code apparently includes history, which suggests complete git repositories.
Some packages potentially contain video game assets and so weigh in much heavier than a typical source code repository.
The breach also contains raw financial data, as reports are starting to show.

While we do not know how this happened yet, this kind of leak usually falls into one of two categories:

Technology-based leak due to a badly configured server or possibly a hacker who exploited a vulnerability to infiltrate the company’s network
Human-based leak due to a person’s actions, whether accidentally or intentionally

As analysis of the actual source code starts circulating around the Web, the risks contained in this leak go beyond leaked intellectual property (IP) and copyrighted material.

Breakdown of names and sizes, indicating source code with history, and assets

And this is not just Twitch. Because source code leaks commonly contain login credentials, application programming interface (API) keys, access tokens, and other confidential information, they make a highly sought-after target for hackers.

Source: Source code of over 50 high profile organizations leaked online

What follows is a play-by-play of how hackers would analyze a typical source code leak, and why it’s really not just about the source code and intellectual property being leaked.

Deep research of vulnerabilities in code

With 125 GB of source code in hand, attackers are by now busy analyzing the source code itself in depth.

As one Twitter user put it, hackers are probably redirecting their efforts and focus from whatever they were doing to researching the Twitch source code.

The general sense of that tweet is correct, but not necessarily the technique. While attacking today’s modern code might not be all about finding buffer overflows (strcpy, strcat, etc.), the notion is precisely the same: find forgotten, unauthenticated endpoints, edge cases that open up functionality meant for authorized users only, and of course your typical admin/admin case. Or will it be twitch123 in this case? Only time will tell.

Secrets in code

One of the best returns on investment for hackers right now will be finding secrets, keys, passwords and certificates in that code base. It’s a finding that does not require an elaborate scheme of attack: once you have a key, token or password, you’re in. And if one doesn’t work, it only helps to have a pile of them, retrieved from hardcoded points in code, settings, and configuration.

Typically:

Authentication tokens: JWT, OAuth2, custom-made
Basic authentication: user/password pairs
Certificates for SSL/X.509-based authentication
Any vendor-based authentication credentials (Auth0, etc.)
And many, many more
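
To make the list above concrete, here is a minimal sketch of the kind of pattern matching a first pass over leaked code might use. The four patterns and the find_secrets helper are illustrative assumptions, not a description of any particular tool; real scanners rely on far larger, tuned detector sets.

```python
import re

# Illustrative patterns only (assumptions for this sketch, not a complete detector set).
PATTERNS = {
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # JWTs: three base64url segments; the header usually starts with "eyJ".
    "jwt": re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
    # PEM-encoded private keys.
    "private_key_block": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
    # user:password pairs embedded in a connection URL.
    "basic_auth_url": re.compile(
        r"\b[a-z][a-z0-9+.-]*://[^/\s:@]+:[^/\s:@]+@[^\s/]+"
    ),
}


def find_secrets(text):
    """Return (line_number, kind, matched_text) for every pattern hit."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                hits.append((lineno, kind, match.group(0)))
    return hits


if __name__ == "__main__":
    sample = (
        'aws_key = "AKIAIOSFODNN7EXAMPLE"\n'
        'DB = "postgres://admin:hunter2@db.internal:5432/app"\n'
    )
    for hit in find_secrets(sample):
        print(hit)
```

Pattern matching like this is cheap to run across an entire dump, which is part of why secrets are such an attractive first target.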

If you’d like to validate secrets you have found in your own code but are unsure whether they still work, you can use Keyscope, an open source key validation tool.

Hijacked 2FA authentication

The world is moving to multi-factor authentication, and specifically 2FA (two-factor authentication), which typically combines a password with an SMS code, an email code, or a designated authenticator app. That’s a good thing.

Back to basics: Multi-factor authentication (MFA) | NIST — Credit: nist.gov

Unfortunately, when you actually have the source code, you can potentially understand and control the systems that make up the authentication flow. For example, if a hacker can access the service that sends SMS messages using information found in the source code, they now control the authentication flow.

Lateral movement

A high priority for hackers is lateral movement. That means that instead of grabbing the obvious targets, which everyone is trying right now because everyone has the same bag of leaked data, they’ll try to get an indirect win.

An indirect win can be access to SaaS vendors. Services use external APIs, and the source code might contain all necessary details to connect to those APIs. If you can access and control such an API, you can control part of the product itself.

Another obvious indirect win is databases. While everyone is busy looking for user credentials in code, one item on a hacker’s initial checklist is to understand whether passwords in the database are salted and hashed according to best practice. The answer simply lies in the piece of code that does the authentication against the database. For hackers, this means assessing the amount of investment they need to make. Is this a brute-forcing effort or simply a grab and run?
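
As a rough illustration of what that authentication code might reveal, here is a minimal sketch of salted password hashing using PBKDF2 from Python’s standard library. The function names and the iteration count are assumptions made for this example, and nothing here reflects how Twitch stores passwords. Code that looks like this tells an attacker that every password would have to be brute-forced individually; an unsalted fast hash, or plaintext, would be the grab-and-run case.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None, iterations=600_000):
    """Derive a salted PBKDF2-HMAC-SHA256 hash.

    The iteration count is an illustrative value, not a recommendation
    taken from the article.
    """
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return salt, iterations, digest


def verify_password(password, salt, iterations, expected):
    """Recompute the hash with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return hmac.compare_digest(candidate, expected)


if __name__ == "__main__":
    salt, iterations, digest = hash_password("correct horse battery staple")
    print(verify_password("correct horse battery staple", salt, iterations, digest))
    print(verify_password("wrong guess", salt, iterations, digest))
```

Because each password gets its own random salt, two users with the same password end up with different stored digests, so precomputed tables are of no help.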

We know from other incidents that hackers also look for lateral movement into servers. If the source code contains information about connecting, authenticating and deploying, they can deploy custom scripts and processes onto production servers.

“God” mode

Hackers also look for anything that gives them admin permissions, since it’s the most effective way to gain control quickly. Even if best practices were kept, and the source code and configuration do not contain admin credentials, there are still artifacts to scan:

Binary files
Containers
Logs

By looking at logs captured in a given leak, hackers can potentially recover traffic, and the tokens and cookies used for authentication within that traffic might reveal an authenticated admin session. From there, all they would need to do is hijack the session (after making sure it’s still valid).
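
On the defensive side, one way to shrink this risk is to redact credentials before log lines are ever stored. The sketch below is a minimal example under assumed conventions: the header and cookie names (session, sessionid, auth_token) and the redact helper are hypothetical and would need to match whatever your own services actually log.

```python
import re

# Assumed shapes of sensitive values in log lines (hypothetical names).
SENSITIVE = [
    # "Authorization: Bearer <token>" or "Authorization: Basic <token>"
    re.compile(r"(?i)(authorization:\s*(?:bearer|basic)\s+)(\S+)"),
    # Cookie-style key=value pairs that carry a session.
    re.compile(r"(?i)((?:session|sessionid|auth_token)=)([^;\s]+)"),
]


def redact(line):
    """Replace the secret part of each match, keeping the label for debugging."""
    for pattern in SENSITIVE:
        line = pattern.sub(r"\1[REDACTED]", line)
    return line


if __name__ == "__main__":
    raw = "GET /admin Authorization: Bearer abc.def.ghi Cookie: session=deadbeef; theme=dark"
    print(redact(raw))
```

The same patterns, run without the substitution, are roughly what someone combing leaked logs for live sessions would search for.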

How to prevent this from happening?

First, make sure you have a plan in case something like this does happen. Handling an incident well takes a well-coordinated effort, and mishandling the incident risks adding to the damage that’s already been done.
Scan source code for hardcoded secrets, credentials and tokens. From the blueprint above, you’ll see that to protect yourself properly you need great recall (find all kinds of secrets in the world) and precision (when a secret is found, it’s really a secret and not noise). Finding a solution that gives you both is crucial.
Scan *all* files in repos. Repositories no longer contain “just” code. It’s all about the documentation, binaries, sample data, test files, and more. And if history is involved – make sure to scan and clean all of your history as well.
Code creates artifacts. Logs, containers, zip files, assets. Make sure these get scanned as early as possible.
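
To illustrate the precision side of scanning, one common heuristic is to measure the Shannon entropy of a candidate string, since randomly generated keys tend to score higher than ordinary identifiers. The sketch below is a simplified example; the min_length and threshold values are assumptions chosen for illustration and are not taken from any specific product.

```python
import math
from collections import Counter


def shannon_entropy(s):
    """Shannon entropy of a string, in bits per character."""
    if not s:
        return 0.0
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def looks_like_secret(candidate, min_length=20, threshold=4.0):
    """Heuristic filter: long enough and random-looking enough.

    min_length and threshold are illustrative assumptions and would need
    tuning against real data to balance recall and precision.
    """
    return len(candidate) >= min_length and shannon_entropy(candidate) >= threshold


if __name__ == "__main__":
    for value in ["configuration_settings", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"]:
        print(value, round(shannon_entropy(value), 2), looks_like_secret(value))
```

Entropy alone still produces both false positives and misses, which is why it is usually combined with pattern matching and, where possible, validation of the candidate secret.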

SpectralOps gives you a complete mapping of all of the above: code, logs, containers, productivity suite and more.

By following these steps and integrating a complete security solution, you can recover quickly from a source code leak. But don’t wait for a source code leak to happen. See how SpectralOps can protect you from source code leaks and secure your DevOps pipeline today. Request a demo.

*** This is a Security Bloggers Network syndicated blog from Security boulevard – Spectral authored by Dotan Nahum. Read the original post at: https://spectralops.io/blog/why-the-recent-twitch-breach-means-much-more-than-just-leaked-source-code/