Open Source Package Management: Balancing Power and Security

by Dustin Ingram on June 25, 2021

There is a wide ecosystem of open source software, and distributing it has always been a challenge. There is often a central location or index where a publisher or an individual can put their software for others to access it. Finding and consuming it, though, is another matter—where do you look for new software?

In the past, for example, if you were using a language like Python within a certain ecosystem, you’d go to certain websites you knew about and download the software along with its installation instructions. Today, package management provides a more standardized method of software distribution. Package management implements a process for encapsulating, building, publishing and consuming software in a standard, reproducible way.

Today, most applications are actually made up of someone else’s code. From a reproducibility, testability, reliability and auditability standpoint, you don’t want your dependency tree to be made up of a bunch of software that you’re using without understanding what’s inside.

A good example of such an abstraction in the npm ecosystem is “npm audit,” which runs every time you run “npm install.” With it, you can look at the whole dependency tree, cross-reference every single package against a vulnerability database and get notified if there’s an issue. Dependabot can run that scan proactively for you and automatically open a pull request to update the affected dependency. Combine that with GitHub Actions, which can automatically review and land the pull request and push to production. We go from a world where an app has a whole bunch of components checked into a repo, into which the developer has no visibility, to a semantic graph managed intelligently and automatically by software.
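The tree-wide cross-referencing described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how npm itself implements it: the package names, versions, advisory ID and the `audit` function are all hypothetical, and a real tool would read the graph from a lockfile and the advisories from a maintained vulnerability database.

```python
from typing import Dict, List, Set, Tuple

Package = Tuple[str, str]  # (name, version)

# Hypothetical dependency graph: each package maps to its direct dependencies.
DEPENDENCY_GRAPH: Dict[Package, List[Package]] = {
    ("my-app", "1.0.0"): [("web-framework", "2.3.1"), ("logger", "0.9.0")],
    ("web-framework", "2.3.1"): [("http-parser", "1.1.0")],
    ("logger", "0.9.0"): [("http-parser", "1.1.0")],
    ("http-parser", "1.1.0"): [],
}

# Hypothetical advisory database keyed by exact (name, version).
ADVISORIES: Dict[Package, List[str]] = {
    ("http-parser", "1.1.0"): ["EXAMPLE-0001: request smuggling"],
}


def audit(
    root: Package,
    graph: Dict[Package, List[Package]],
    advisories: Dict[Package, List[str]],
) -> Dict[Package, List[str]]:
    """Walk the whole tree from root and collect advisories for every package."""
    findings: Dict[Package, List[str]] = {}
    seen: Set[Package] = set()
    stack = [root]
    while stack:
        package = stack.pop()
        if package in seen:
            continue  # shared dependencies are checked only once
        seen.add(package)
        if package in advisories:
            findings[package] = advisories[package]
        stack.extend(graph.get(package, []))
    return findings


if __name__ == "__main__":
    for (name, version), issues in audit(
        ("my-app", "1.0.0"), DEPENDENCY_GRAPH, ADVISORIES
    ).items():
        print(f"{name}@{version}: {', '.join(issues)}")
```

The point of the sketch is that the vulnerable package is two levels down and never named by the application itself, which is exactly the kind of issue a developer would miss without tooling that sees the whole tree.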

These abstractions make things safer for developers as a whole. The more they are honed, the better package managers get, the more signal we have and the more we can automate.

So, What’s the Problem?

Both open source and package managers are very popular, and with mass adoption of both, we are starting to see some cracks appear around the issue of trust.

For example, the increasing popularity of open source and open source package management has been a massive boon to the JavaScript ecosystem. It is so simple and welcoming that developers who might never have gotten involved in software distribution can easily get going. There are no worries about certificates, no requirements for a paid account, no meritocratic process to be deemed worthy. Literally, the only thing you need to publish something is the motivation to run two or three commands on the command line.

Now, the downside of this ease of use is that some developers who are publishing things don’t necessarily have the sophistication or the experience to do so in a safe way. Given how quickly the landscape is changing, even those with years of experience do not necessarily know best practices. There is a supply chain risk between the final commit on a release and the package being published to the registry of record. Code can be injected or malware can be introduced; consumers could end up not getting the code they are expecting. There are numerous points of entry for injecting code into the supply chain, and given some of the more extravagant hacks that have happened recently, people have become painfully aware of the issue.
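One common way to narrow the gap between what was released and what a consumer receives is to compare a cryptographic digest of the downloaded artifact against a digest recorded at release time. The following is a minimal sketch of that check using only the Python standard library. The function names and the sample bytes are hypothetical, and it assumes the expected digest reached the consumer through a channel they already trust, such as a lockfile committed to their own repository.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the artifact bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def matches_expected(artifact: bytes, expected_hex: str) -> bool:
    """Check that the artifact hashes to the digest recorded at release time."""
    # compare_digest avoids leaking where the two strings differ via timing.
    return hmac.compare_digest(sha256_hex(artifact), expected_hex.lower())


if __name__ == "__main__":
    released = b"example package contents"        # hypothetical artifact bytes
    recorded = sha256_hex(released)               # digest stored at release time
    tampered = released + b" plus injected code"  # what an attacker might serve

    print(matches_expected(released, recorded))
    print(matches_expected(tampered, recorded))
```

A check like this only shows that the bytes are the ones the publisher hashed. It says nothing about whether the code was safe to begin with, or whether it matches the final commit in the source repository.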

Tools used to consume and distribute software are incredibly powerful and are designed for maximum user freedom: they get out of your way. They don’t have a lot of safeguards and, in the past, that was good, as it allowed the ecosystem to be as flexible as possible.

Now, however, we need to build better tooling for consuming and understanding security problems. In the past, developers have built open source software on top of an open source pipeline without really thinking about the need to trust everyone underneath them as part of that process. Now, they are starting to understand the importance of securing the software supply chain.

Registry of Record

Another important concept alongside package management is the registry of record. In addition to the package manager, which is the tool that developers use for installation, there are also registries, and the registry of record stores all the code and makes it available. For example, npm (JavaScript) and PyPI (Python) are centralized language ecosystem registries of record that serve as a single source of truth. However, these registries are not audited by a central authority, which means there’s no quality control over what gets published.

There are various ways to improve software supply chain reliability via the registry of record. Investing in analysis tools for identifying common traits of security issues is one approach. Registries are only as good as the trust of the people who use them. If people do not trust the registry or its stewards, it’s useless.

The Point of Diminishing Returns

Open source package management is a superpower that allows developers to do a lot of things, but there’s a point of diminishing returns. All that power also comes with a ton of risk. Application development is at an inflection point; developers need to mitigate risk in the development process without removing the magic that makes these tools so useful. The goal is finding that perfect combination of automation, education and infrastructure that allows people to maintain velocity and keep collaborating, but in a way that’s even safer than before. How can we make great developer workflows that will allow us to think of the provenance of code from commit to package in a way that makes guarantees never before possible?

There are no actual solutions, only tradeoffs between ease of use and security (and trust). It’s pretty easy to build open source software and its repositories. How can we make these tools a little safer to use and harder to misconfigure without compromising their power? It’s time to think about how we can make the process more secure, and give end users more confidence in what they’re installing.

Myles Borins co-authored this article.


Dustin Ingram

Dustin Ingram is a Developer Advocate at Google, focused on supporting the Python community on Google Cloud. He's also a director of the Python Software Foundation, and maintainer of PyPI.
