GitHub Zero-Day: From 35K Repos Compromised to False Alarm

At 6:14 a.m. GMT on August 3, 2022, a Twitter thread from Stephen Lacy threw security Twitter into a frenzy. The thread announced an alleged zero-day-like vulnerability that exposed over 35,000 repositories; the attack reportedly leaked credentials from these compromised repositories to a malicious Russian server.

And the crowd went wild: more than 20,000 likes and more than 8,000 retweets in hours. Security Twitter started down the rabbit hole. As more details emerged, the claims persisted that more than 35,000 popular repositories had a malicious URL in their build code, and that all of the local `.env` data was sent to this URL at build time. This included the most sensitive environment variables: credentials, passwords and computer names, all sent to the malicious server in plain text.

The OP added fuel to the fire by pointing out that most of the merges of the dangerous code appeared to come from real GitHub users with good reputations, some of them quite well-known in the open source community.

GitHub doesn’t mess around where security is concerned; it quickly swooped in and removed the malicious code and the PRs in which the alleged attack occurred, along with the user accounts involved. After rapid triage and further investigation, GitHub released a notice that this was, in fact, a false alarm.

So, what actually happened? What confused a bona fide security researcher so much that they believed a massive-scale zero-day security attack was underway?! This is even more confusing when you realize that the essence of the ‘attack’ has been a perennial risk to all developers and cloud providers since the advent of modern (agile) software development.

Let’s take a look at the details.

Digging Deeper

Following GitHub’s quick assessment of the situation, it turned out that the repositories “under attack” were all either forks or clones of real, popular repositories, albeit with very similar names. That was an intentional move by the would-be hackers, which we’ll unpack shortly. Thankfully, no malicious code was actually merged into the original, popular repositories.

OP then came back with some follow-up data. Lacy said that he might have been too quick to announce the vulnerability and explained that the figure of 35,000 referred not to repositories but to code occurrences. This is an important distinction, as GitHub’s search function is not flawless. It’s therefore more likely that this was closer to a few thousand occurrences, none of which, it seems, were in the original repositories.

Later on, both Aqua Security and Checkmarx released reports that also stated it was a false alarm, and recommended we all calm down.  

But the real question is: Can we calm down? Is there truly nothing to worry about?

The answer, like everything in engineering, is: It depends.

There are actually quite a few interesting takeaways from this false alarm that developers should all be aware of and work toward protecting against.

Phishing is Not Just for Email

While we usually associate phishing campaigns with email or text messages, the truth is that phishing campaigns can occur anywhere at any time and in the most unexpected places. In this case, a classic phishing attack happened in open source code. One of the infected repositories was a fork of a popular ‘rancher-compose’ tool—a tool from the Docker world used by many developers while building and running code.

The fork was opened under a new organization called `rancherio`, which demonstrates the sophistication of today’s hackers, as Rancher’s website is rancher.io. Users can easily land on the fake, cloned `rancherio/rancher-compose` repository and mistake it for the original `rancher/rancher-compose` repository.

In the case of this specific attack, when someone uses the cloned `rancherio/rancher-compose` tool in their environment, all their sensitive data will be sent directly to the malicious URL. And that could lead to a real large-scale attack that, for some organizations and their users, could be catastrophic.

In this repo’s case, this was a particularly evil tactic, as `rancher-compose` is an archived repo used by people who want older Rancher versions. This can easily confuse users about the real creator of the code.
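One way to guard against this kind of lookalike is to compare every repository a build pulls against an explicit allowlist and flag near-misses. The following is only a minimal sketch: the allowlist, the `check_repo` helper and the similarity threshold are illustrative assumptions, not part of any GitHub or Rancher tooling.

```python
from difflib import SequenceMatcher

# Hypothetical allowlist of repositories the team actually intends to use.
TRUSTED_REPOS = {"rancher/rancher-compose"}


def check_repo(slug: str, trusted: set[str] = TRUSTED_REPOS, threshold: float = 0.8) -> str:
    """Classify an "owner/name" slug as trusted, lookalike, or unknown."""
    slug = slug.lower()
    if slug in trusted:
        return "trusted"
    owner, _, name = slug.partition("/")
    for known in trusted:
        known_owner, _, known_name = known.partition("/")
        # Same repository name under a near-identical owner is the pattern
        # described above (rancherio vs. rancher).
        similarity = SequenceMatcher(None, owner, known_owner).ratio()
        if name == known_name and similarity >= threshold:
            return "lookalike"
    return "unknown"


print(check_repo("rancher/rancher-compose"))    # trusted
print(check_repo("rancherio/rancher-compose"))  # lookalike
```

A check like this would most naturally run wherever dependencies are declared or checked out, so that a lookalike is rejected before any of its code executes. The threshold is a judgment call and would need tuning against real organization names.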

SCM Attacks Grow More Sophisticated and Complex

The part that should scare us (if we’re not scared already, that is) is the actual essence of the attack. This attack leveraged a technique known as commit metadata spoofing. In this method, a bot identifies repositories that do not require commit signing (which is actually the default on GitHub) and then exploits this security gap by spoofing the author name and other metadata of the commits it pushes to the repository.

When it comes to Git, there really is no way to prevent editing of the commit message and metadata when committing code in your local environment. Anyone can attribute any commit to any name they want and, ultimately, the commit is just a string that includes all of that data.
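To make the "commit is a string" point concrete, here is a rough sketch that builds a raw Git commit object by hand and hashes it the way Git does for SHA-1 repositories. The `make_commit` helper, the names, the email addresses and the timestamp are all made up for illustration; the only real value is the ID of Git's empty tree.

```python
import hashlib

# The well-known ID of Git's empty tree, used so the example needs no files.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def make_commit(author: str, email: str, message: str, when: int = 1659507240) -> tuple[bytes, str]:
    """Build a raw Git commit object and return (body, commit_id)."""
    # `when` is an arbitrary Unix timestamp; the author line is free text.
    ident = f"{author} <{email}> {when} +0000"
    body = (
        f"tree {EMPTY_TREE}\n"
        f"author {ident}\n"
        f"committer {ident}\n"
        f"\n"
        f"{message}\n"
    ).encode()
    # Git hashes a small header plus the body; nothing here verifies identity.
    header = f"commit {len(body)}\0".encode()
    return body, hashlib.sha1(header + body).hexdigest()


# Hypothetical identities: either one is accepted, because it is only text.
body, commit_id = make_commit("Jane Maintainer", "jane@example.com", "Fix build")
print(body.decode())
print(commit_id)
```

Nothing in this object proves who wrote it. Changing the author only changes the resulting hash, which is why an unsigned commit's author field cannot be trusted on its own.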

The way to prevent commit spoofing is to require that every commit pushed to your remote repo be signed with the committer’s GPG key. This adds significant friction to dev processes, which is why, for the most part, users leave it disabled. With this security measure enabled, if the “committer” does not hold the private GPG key registered to the relevant GitHub user, the commit will fail to push to the remote Git repository.
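Signature status can also be audited after the fact. The sketch below assumes a local clone with `git` and GnuPG installed, and relies on the `%G?` placeholder of `git log`, which prints a one-letter signature status per commit (for example `G` for a good signature and `N` for none). The helper names and the sample log lines are illustrative assumptions.

```python
import subprocess

# Status letters printed by git's %G? placeholder (see `git help log`).
TRUSTED_STATUSES = {"G"}  # good signature from a key with valid trust


def unsigned_commits(log_output: str) -> list[str]:
    """Return commit IDs whose signature status is not a good signature."""
    flagged = []
    for line in log_output.splitlines():
        commit_id, _, status = line.partition(" ")
        if commit_id and status.strip() not in TRUSTED_STATUSES:
            flagged.append(commit_id)
    return flagged


def audit(rev_range: str = "HEAD") -> list[str]:
    """Run git in the current repository and flag commits lacking a good signature."""
    out = subprocess.run(
        ["git", "log", "--format=%H %G?", rev_range],
        check=True, capture_output=True, text=True,
    ).stdout
    return unsigned_commits(out)


# Made-up log output, in the "<commit id> <status>" shape produced above.
sample = "aaa111 G\nbbb222 N\nccc333 B\n"
print(unsigned_commits(sample))
```

Note that a commit only reports `G` locally if the signer's public key is in the local keyring and trusted, so a fresh machine may flag commits that GitHub shows as verified. This kind of audit complements, rather than replaces, enforcing signed commits on the remote.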

In the ‘Git Scare’ attack, all the users that committed the malicious code were impersonating real (well-known) users from the real repositories. In the Rancher example, the phishing method added a social engineering element to make it look very genuine, as the last commits to the cloned and malicious repo were made by someone pretending to be Darren Shepherd, who is the original maintainer of the correct code repo.

Signing commits is a very important practice in your repo, and finding a way to enforce it that developers won’t bypass needs to be a priority for SCM security.

Security Hygiene and Culture

We usually hear developers say things like ‘I don’t mind security, but my code is running internally, and never touches anything outside our local environments.’ With modern engineering practices, this is never the case.

Any time the automation folks check out a compose library from GitHub, they are interacting with the outside world and its environments (everything is your supply chain!). If they happen to check out one of the fake repos, this could lead to exactly the scenario in which sensitive internal data is sent out of your organization.

Your CI/CD, source code management (SCM) and code repositories have become the most vulnerable assets in the organization. This is compounded by the human error factor, which makes security culture a critical weakness that’s often targeted by new attack methods. CI/CD, SCM and code repositories are critical assets for software development companies (often likened to an organization’s crown jewels by security engineers today), and require appropriate security controls just like other important assets such as customer data. This is particularly true when it comes to open source code and resources that essentially belong to everyone; you can never know if there are also bad actors trying to ride the collaborative wave and inject malicious code. That is why SCM repository config and supply chain security best practices, at the code level, are becoming increasingly important.

Everybody Panic!?

False alarms are everywhere in the security world. We don’t have to look much farther than some of the most popular static code analysis (SCA) tools to find many false alarms and false positives. In fact, security tools themselves are often considered quite noisy: they cause a lot of alert fatigue, which leads to another negative side effect, namely developers bypassing these tools.

The only really good current solution to dealing with false alarms in security is leveraging orchestration for your tooling. For example, if we are using an SCA solution that constantly alerts us to many issues, we can add a workflow security tool to our CI/CD platform that, via an agent on the CI/CD runners, checks all external URLs called during the build process and would catch that malicious URL being called at build time. Those two alarms, taken together, could then be treated as a high likelihood of one true alarm.
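The idea of correlating two noisy signals can be sketched in a few lines. This is only an illustration under stated assumptions: the allowlist of hosts, the `unexpected_hosts` and `triage` helpers and the example URLs are all hypothetical, and a real agent would collect the URLs from the runner's network activity rather than from a list.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of hosts a build is expected to contact.
ALLOWED_HOSTS = {"github.com", "pypi.org", "registry.npmjs.org"}


def unexpected_hosts(urls: list[str], allowed: set[str] = ALLOWED_HOSTS) -> set[str]:
    """Return hosts contacted during the build that are not on the allowlist."""
    hosts = {(urlsplit(u).hostname or "").lower() for u in urls}
    return {h for h in hosts if h and h not in allowed}


def triage(sca_flagged: bool, build_urls: list[str]) -> str:
    """Combine two independent signals into a single priority."""
    network_flagged = bool(unexpected_hosts(build_urls))
    if sca_flagged and network_flagged:
        return "high"
    if sca_flagged or network_flagged:
        return "review"
    return "none"


# Example: the scanner flagged the repo AND the build called an unknown host.
urls = ["https://github.com/rancher/rancher-compose", "http://collector.example.net/env"]
print(unexpected_hosts(urls))
print(triage(True, urls))
```

Either signal alone lands in a review queue, while both together escalate. That is the sense in which two alarms can add up to one likely true alarm.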

In the traditional security world, overcoming false alarms is often managed with tools like SIEMs and SOAR, but these do not apply to CI/CD. In the DevSec/shift left world, you need to cover many different layers of the application development process, from the moment you commit code, through builds and third-party packages, to deployments to production.

This requires controls at every single layer of your stack: code, infrastructure and runtime, third-party and imported components and even things related to process (like how you commit and sign code). This level of coverage is especially important for modern cloud-native stacks; there are organizations that are both working to orchestrate the many excellent open source security tools available and writing their own where there aren’t tools available. By combining best-of-breed security tooling, false alarms will occur less often and even sophisticated and unorthodox attack methods (like the one seen on GitHub) will be swiftly detected and prevented.


Gabriel Liechtman-Manor

Gabriel Liechtman-Manor is a senior full-stack developer with a favorite kid named Frontend. For over ten years now, Gabriel's enjoyed writing clean code, simplifying complex problems, leading feature development and influencing innovation every day. When he's not busy with code, you’ll find him talking about application performance, building confidence in codebases, product architecture, developing organizational culture and other nerdy dev stuff. Besides all that, Gabriel is a father of two, a hobbyist photographer, restless traveler and food creator. Today, Gabriel is a tech lead at Jit, focusing on solving security through code.
