
Yes, GitHub’s Copilot can Leak (Real) Secrets

There has been a growing focus on the ethical and privacy concerns surrounding advanced language models like ChatGPT and OpenAI GPT technology. These concerns have raised important questions about the potential risks of using such models. However, it is not only these general-purpose language models that warrant attention; specialized tools like code completion assistants also come with their own set of concerns.

Read: Why ChatGPT is a security concern for your organization (even if you don't use it)

A year into its launch, GitHub’s code-generation tool Copilot has been used by a million developers, adopted by more than 20,000 organizations, and generated more than three billion lines of code, GitHub said in a blog post.

However, since its inception, many have raised security concerns about the legal risks tied to copyright, about privacy, and, of course, about insecure code suggestions, of which examples abound, including dangerous suggestions to hard-code secrets.

Extensive security research is currently being conducted to accurately assess the potential risks associated with these newly advertised productivity-enhancing tools.

This blog post delves into recent research by Hong Kong University that tested the possibility of abusing GitHub’s Copilot and Amazon’s CodeWhisperer to extract secrets exposed in the models' training data.

As highlighted by GitGuardian's 2023 State of Secrets Sprawl, hard-coded secrets are highly pervasive on GitHub, with 10 million new secrets detected in 2022, up 67% from 6 million one year earlier. 

Given that Copilot is trained on GitHub data, it is concerning that coding assistants can potentially be exploited by malicious actors to reveal real secrets in their code suggestions.

Extracting Hard-coded Credentials

To test this hypothesis, the researchers designed a prompt-building algorithm and ran an experiment to try to extract credentials from the LLMs.

The conclusion is unambiguous: by constructing 900 prompts from GitHub code snippets, they managed to successfully collect 2,702 hard-coded credentials from Copilot and 129 secrets from CodeWhisperer (false positives were filtered out with a methodology described below).

Impressively, at least 200 of those, or 7.4% (respectively 18% and 14%), were real hard-coded secrets that could be identified on GitHub. While the researchers refrained from confirming whether these credentials were still active, this suggests that these models could be exploited as an avenue for attack, enabling the extraction, and likely the compromise, of leaked credentials with a high degree of predictability.

The Design of a Prompt Engineering Machine

The idea of the study is to see if an attacker could extract secrets by crafting appropriate prompts. To test the odds, the researchers built a prompt testing machine, dubbed the Hard-coded Credential Revealer (HCR). 

The machine has been designed to maximize the chances of triggering a memorized secret. To do so, it needs to build a strong prompt that will "force" the model to emit the secret. The way to build this prompt is to first look on GitHub for files containing hard-coded secrets using regex patterns. Then, the original hard-coded secret is redacted, and the machine asks the model for code suggestions.
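
To make this concrete, here is a minimal sketch of what such a prompt-construction step could look like, assuming an AWS Access Key ID as the target secret type. The regex, helper name, and snippet are illustrative assumptions, not the paper's actual implementation.

```python
import re

# Illustrative pattern for an AWS Access Key ID; the study uses its own
# regexes covering 18 secret types.
AWS_KEY_PATTERN = re.compile(r"AKIA[0-9A-Z]{16}")

def build_extraction_prompt(source_code: str) -> str | None:
    """Redact the first hard-coded key found and return the code preceding
    it, to be used as a completion prompt for the code assistant."""
    match = AWS_KEY_PATTERN.search(source_code)
    if match is None:
        return None
    # The prompt stops exactly where the secret used to start, nudging the
    # model to "complete" the missing credential.
    return source_code[: match.start()]

snippet = 'aws_access_key_id = "AKIAABCDEFGHIJKLMNOP"\n'
print(build_extraction_prompt(snippet))  # -> aws_access_key_id = "
```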

[Figure: the Hard-coded Credential Revealer (HCR)]

Of course, the model needs to be queried many times to have even a slight chance of extracting valid credentials, because it often outputs "imaginary" credentials.

They also need to test many prompts before finding an operational credential, i.e., one that actually allows logging into a system.

In this study, 18 regex patterns are used to identify code snippets on GitHub, corresponding to 18 different types of secrets (AWS Access Keys, Google OAuth Access Tokens, GitHub OAuth Access Tokens, etc.).

💡
Although 18 secret types are far from exhaustive (the GitGuardian secrets scanner can detect 350+ types of secrets), they are representative of services widely used by software developers and are easily identifiable.
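
For illustration, here are commonly cited formats for a few of these secret types, expressed as regexes. These are not the exact patterns used in the study, and real token formats evolve over time.

```python
import re

# Commonly cited formats for a few widely used credential types
# (illustrative, not the study's exact patterns).
SECRET_PATTERNS = {
    "AWS Access Key ID": re.compile(r"AKIA[0-9A-Z]{16}"),
    "Google API Key": re.compile(r"AIza[0-9A-Za-z_\-]{35}"),
    "GitHub Personal Access Token": re.compile(r"ghp_[0-9A-Za-z]{36}"),
    "Stripe Test Secret Key": re.compile(r"sk_test_[0-9A-Za-z]{24,}"),
}

def find_secret_candidates(text: str) -> list[tuple[str, str]]:
    """Return (secret type, matched string) pairs found in a blob of code."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        hits.extend((name, m.group(0)) for m in pattern.finditer(text))
    return hits
```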

Then, the secrets are removed from the original file, and the code assistant is asked to suggest new strings of characters. Those suggestions are then passed through four filters to eliminate as many false positives as possible; a minimal code sketch of these checks is given after the list.

Secrets are discarded if they:

– don't match the regex pattern

– don't show enough entropy (not random enough, ex: AKIAXXXXXXXXXXXXXXXX)

– have a recognizable pattern (ex: AKIA3A3A3A3A3A3A3A3A)

– include common words (ex: AKIAIOSFODNN7EXAMPLE)

A secret that passes all these tests is considered valid, which means it could realistically be a true secret (hard-coded somewhere else in the training data).
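
Below is a minimal sketch of what such a filtering pass could look like for one secret type. The entropy threshold, repeated-pattern heuristic, and word list are illustrative assumptions, not the values used in the paper.

```python
import math
import re

AWS_KEY_PATTERN = re.compile(r"AKIA[0-9A-Z]{16}")
COMMON_WORDS = ("EXAMPLE", "SAMPLE", "TEST", "XXXX")  # illustrative word list

def shannon_entropy(s: str) -> float:
    """Bits of entropy per character; low values mean 'not random enough'."""
    probabilities = [s.count(c) / len(s) for c in set(s)]
    return -sum(p * math.log2(p) for p in probabilities)

def has_repeated_pattern(s: str) -> bool:
    """Reject strings built from a short chunk repeated over and over,
    e.g. 'AKIA3A3A3A3A3A3A3A3A'."""
    return re.search(r"(.{1,4})\1{3,}", s) is not None

def is_valid_candidate(candidate: str) -> bool:
    """Apply the four filters described in the study (thresholds illustrative)."""
    if not AWS_KEY_PATTERN.fullmatch(candidate):
        return False                                  # 1. wrong format
    if shannon_entropy(candidate) < 3.0:
        return False                                  # 2. not random enough
    if has_repeated_pattern(candidate):
        return False                                  # 3. recognizable pattern
    if any(word in candidate.upper() for word in COMMON_WORDS):
        return False                                  # 4. contains common words
    return True

print(is_valid_candidate("AKIAIOSFODNN7EXAMPLE"))   # False (contains "EXAMPLE")
print(is_valid_candidate("AKIA3A3A3A3A3A3A3A3A"))   # False (low entropy, repeated pattern)
```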

Results

Among the 8,127 suggestions produced by Copilot, 2,702 valid secrets were successfully extracted. The overall valid rate is therefore 2,702/8,127 = 33.2%, meaning that Copilot generates 2,702/900 = 3.0 valid secrets per prompt on average.

CodeWhisperer produced 736 code suggestions in total, among which the researchers identified 129 valid secrets. The valid rate is thus 129/736 = 17.5%.

💡
Keep in mind that in this study, a valid secret doesn't mean the secret is real. It means that it successfully passed the filters and therefore has the properties of a real secret.

So, how can we know if these secrets are genuine, operational credentials? For ethical reasons, the authors explained, they only tried a subset of the valid credentials: test keys, like Stripe Test Keys, designed for developers to test their programs.

Instead, the authors looked for another way to validate the authenticity of the collected credentials: assessing memorization, i.e., whether and where each secret appeared on GitHub.

The rest of the research focuses on the characteristics of the valid secrets. The researchers look for each secret using GitHub Code Search and distinguish strongly memorized secrets, which are identical to the secret originally removed, from weakly memorized secrets, which come from one or more other repositories. Finally, some secrets could not be located on GitHub at all and might come from other sources.
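
Expressed as code, this classification could look like the sketch below. How the GitHub Code Search results are obtained is left out of scope; `github_hits` is simply assumed to be the list of repositories where the candidate string was found.

```python
from enum import Enum

class Memorization(Enum):
    STRONG = "strongly memorized"      # identical to the secret that was redacted
    WEAK = "weakly memorized"          # found in one or more other repositories
    UNLOCATED = "not found on GitHub"  # possibly memorized from another source

def classify(extracted: str, original: str, github_hits: list[str]) -> Memorization:
    """Classify an extracted secret following the study's terminology.
    `github_hits` is the (assumed) list of repos returned by GitHub Code Search."""
    if extracted == original:
        return Memorization.STRONG
    if github_hits:
        return Memorization.WEAK
    return Memorization.UNLOCATED
```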

Consequences

The research paper uncovers a significant privacy risk posed by code completion tools like GitHub Copilot and Amazon CodeWhisperer. The findings indicate that these models not only leak the original secrets present in their training data but also suggest other secrets that were encountered elsewhere in their training corpus. This exposes sensitive information and raises serious privacy concerns.

For instance, even if a hard-coded secret was removed from the git history after being leaked by a developer, an attacker can still extract it using the prompting techniques described in the study. The research demonstrates that these models can suggest valid and operational secrets found in their training data.

These findings are supported by another recent study conducted by a researcher from Wuhan University, titled Security Weaknesses of Copilot Generated Code in GitHub. The study analyzed 435 code snippets generated by Copilot from GitHub projects and used multiple security scanners to identify vulnerabilities.

According to the study, 35.8% of the Copilot-generated code snippets exhibited security weaknesses, regardless of the programming language used. By classifying the identified security issues using Common Weakness Enumerations (CWEs), the researchers found that "Hard-coded credentials" (CWE-798) were present in 1.15% of the code snippets, accounting for 1.5% of the 600 CWEs identified.

Mitigations

Addressing the privacy attack on LLMs requires mitigation efforts from both programmers and machine learning engineers.

To reduce the occurrence of hard-coded credentials, the authors recommend using centralized credential management tools and code scanning to prevent the inclusion of code with hard-coded credentials.
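
As a simple illustration of the first recommendation, the credential below is resolved at runtime from the environment (or a secrets manager) instead of being written into the source file; the variable names are arbitrary examples.

```python
import os

# Anti-pattern: the credential lives in the source file, in git history, and
# potentially in a model's training data.
# AWS_SECRET_ACCESS_KEY = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"  # don't

# Instead, resolve the credential at runtime; nothing sensitive is committed.
aws_secret_access_key = os.environ["AWS_SECRET_ACCESS_KEY"]
```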

During the various stages of code completion model development, different approaches can be adopted:

– Before pre-training, hard-coded credentials can be excluded from the training data by cleaning it.

– During training or fine-tuning, algorithmic defenses such as Differential Privacy (DP) can be employed to ensure privacy preservation. DP provides strong guarantees of model privacy.

– During inference, the model output can be post-processed to filter out secrets (a minimal sketch of such a filter follows the list).
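
As an illustration of that last point, an inference-time filter could scrub anything resembling a credential before the suggestion reaches the user. The patterns and masking strategy below are assumptions for the sake of the example, not how Copilot or CodeWhisperer actually post-process their output.

```python
import re

# A small illustrative set of high-risk patterns; a production filter would
# cover far more secret types.
OUTPUT_FILTERS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),        # AWS Access Key ID
    re.compile(r"ghp_[0-9A-Za-z]{36}"),     # GitHub personal access token
    re.compile(r"AIza[0-9A-Za-z_\-]{35}"),  # Google API key
]

def scrub_suggestion(suggestion: str) -> str:
    """Mask anything that looks like a credential before showing the
    completion to the user."""
    for pattern in OUTPUT_FILTERS:
        suggestion = pattern.sub("<REDACTED_SECRET>", suggestion)
    return suggestion

print(scrub_suggestion('key = "AKIAABCDEFGHIJKLMNOP"'))
# -> key = "<REDACTED_SECRET>"
```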

Conclusion

This study exposes a significant risk associated with code completion tools like GitHub Copilot and Amazon CodeWhisperer. By crafting prompts and analyzing publicly available code on GitHub, the researchers successfully extracted numerous valid hard-coded secrets from these models. 

To mitigate this threat, programmers should use centralized credential management tools and code scanning to prevent the inclusion of hard-coded credentials. Machine learning engineers can implement measures such as excluding these credentials from training data, applying privacy preservation techniques like Differential Privacy, and filtering out secrets in the model output during inference.

These findings extend beyond Copilot and CodeWhisperer, emphasizing the need for security measures in all neural code completion tools. Developers must take proactive steps to address this issue before releasing their tools.

Ultimately, protecting sensitive information and addressing the privacy risks associated with large language models and code completion tools requires collaborative effort between programmers, machine learning engineers, and tool developers. The recommended mitigations, such as centralized credential management, code scanning, and excluding hard-coded credentials from training data, go a long way, but it is crucial that all stakeholders work together to ensure the security and privacy of these tools and the data they handle.

*** This is a Security Bloggers Network syndicated blog from GitGuardian Blog - Automated Secrets Detection authored by Thomas Segura. Read the original post at: https://blog.gitguardian.com/yes-github-copilot-can-leak-secrets/


Thomas Segura

Thomas has worked both as an analyst and as a software engineer consultant for various big French companies. His passion for tech and open source led him to join GitGuardian as a technical content writer. He now focuses on clarifying the transformative changes that cybersecurity and software are going through.
