- Fact: 89% of organizations rely on open source software. (Source: Red Hat)
- Fact: 98% of applications incorporate open source software as dependencies in their codebase. (Source: Synopsys)
- Fact: When a single, key open source package is deleted, corrupted or otherwise becomes unusable, it can have a disproportionate impact on the software industry as a whole.
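Case in point: in March 2016, the author of the tiny “left-pad” package unpublished it from npm, and builds that depended on it began failing with:
npm ERR! 404 ‘left-pad’ is no longer in the npm registry.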
The effect was as if one person had suddenly broken the internet. While the outage was short-lived (npm restored left-pad), it exposed the house-of-cards precariousness of modern software development.
Unfortunately, the left-pad incident is far from an isolated example. Other key instances include:
- Open-source developer corrupted his own popular libraries – An open source author intentionally corrupted his “colors” and “faker” libraries used by thousands of projects when he decided he was “no longer going to support Fortune 500s (and other smaller sized companies) with my free work.”
- An attack on publisher freedom – npm disputed the name of an open source package and deleted it from their registry when the author refused to rename it.
- Code added to popular npm package wiped files in Russia and Belarus – One open source author was caught adding malicious code to his popular package that targeted Russian and Belarusian computers as part of a personal protest.
- Compromised npm package – An author who no longer had the time to maintain their popular package unwittingly turned it over to a bad actor, who promptly inserted attack code and republished it to the public repository.
- Popular open source packages hacked to harvest AWS credentials – Public repository author accounts were compromised, allowing bad actors to publish new versions of Python’s eight-year-old “ctx” and PHP’s popular “phpass” packages with malware.
All these cases illustrate the need for better backup and preservation of open source dependencies.
Python’s atomicwrites Package Goes Missing
The latest version of this story occurred last week when the atomicwrites package on the Python Package Index (PyPI) suddenly went missing. Atomicwrites is a Python package for atomic file writes, which are useful when a file must appear consistent to readers even while you’re modifying it. It’s a key dependency for commonly used projects like pytest, Home Assistant, and many more.
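To make “atomic file write” concrete, here is a minimal sketch of the general technique using only the Python standard library. It is not the atomicwrites implementation; the function name `atomic_write` is hypothetical. It assumes the temporary file and the target live on the same filesystem, which is what lets the final rename swap the file in a single step.

```python
import os
import tempfile


def atomic_write(path: str, data: str) -> None:
    """Write data so readers see either the old file or the new one, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so the rename below
    # does not cross a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the swap
        # Replace the target in one step; an existing file is overwritten.
        os.replace(tmp_path, path)
    except BaseException:
        # The swap did not happen, so remove the leftover temporary file.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

A caller would simply use `atomic_write("settings.json", new_contents)` in place of opening the file for writing directly. If the process dies mid-write, the original file is left untouched.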
The incident occurred as a direct result of PyPI’s supply chain security initiative, which was in the process of rolling out two-factor authentication (2FA) for authors of what they deemed “critical packages.” Given the daily volume of downloads for atomicwrites, its author was included in the 2FA rollout.
Unfortunately, the author took exception to PyPI’s initiative. They viewed it as an extra time requirement being placed on maintainers of open source libraries who have already dedicated copious amounts of their free time. As a result, the author deleted atomicwrites from PyPI and published a new version of it, thinking this would remove the “critical” status. While that worked, it also had the unintended consequence of removing all previously published versions of atomicwrites, which the author was unable to re-upload.
The result? In the author’s own words:
“I decided to deprecate this package. While I do regret to have deleted the package and did end up enabling 2FA, I think PyPI’s sudden change in rules and bizarre behavior wrt package deletion doesn’t make it worth my time to maintain Python software of this popularity for free.”
While the folks at PyPI have since restored all versions of atomicwrites, the incident serves as yet another example of just how vulnerable to deletions our “shared dependencies” approach to modern coding has become.
A Wayback Machine for Open Source Dependencies
All these incidents highlight the need for better persistence and preservation of open source dependencies. What’s needed is something like a “wayback machine” for public repositories. While every software development organization could take this on individually by implementing dependency vendoring (whereby all the dependencies that a project requires are checked into the organization’s code repository and never deleted), it’s a big ask. Not only does it mean duplicated effort that quickly adds up to a massive time and resource requirement worldwide, but it also introduces the kind of complexity many organizations may not be prepared to deal with. (Read: Everything you need to know about dependency vendoring)
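For teams that do vendor, one small piece of that extra complexity is confirming that the checked-in archives have not gone missing or been altered. The following is a rough sketch under stated assumptions: the `vendor/` directory layout, the manifest file, and the function names are all hypothetical, and the archives themselves might have been fetched with something like `pip download -d vendor/ -r requirements.txt`. It records a SHA-256 hash for each vendored file and later reports any file that is absent or whose contents have changed.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large archives are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(vendor_dir: Path, manifest: Path) -> dict[str, str]:
    """Record the hash of every file under vendor_dir (hypothetical layout)."""
    hashes = {
        p.relative_to(vendor_dir).as_posix(): sha256_of(p)
        for p in sorted(vendor_dir.rglob("*"))
        if p.is_file()
    }
    # Keep the manifest outside vendor_dir so it never hashes itself.
    manifest.write_text(json.dumps(hashes, indent=2, sort_keys=True), encoding="utf-8")
    return hashes


def verify_manifest(vendor_dir: Path, manifest: Path) -> list[str]:
    """Return the vendored files that are missing or no longer match their hash."""
    expected = json.loads(manifest.read_text(encoding="utf-8"))
    problems = []
    for name, digest in expected.items():
        candidate = vendor_dir / name
        if not candidate.is_file() or sha256_of(candidate) != digest:
            problems.append(name)
    return problems
```

Running `verify_manifest(Path("vendor"), Path("vendor.sha256.json"))` in CI would return an empty list while the vendored copies are intact. This only checks the files an organization already holds; it does nothing to recover a package that was never vendored in the first place.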
Instead, consider using the ActiveState Platform, which pulls in the source code for open source language dependencies from multiple public repositories, including PyPI (Python), CPAN (Perl), RubyGems (Ruby), Packagist (PHP), and more. And because we never delete any dependency, you can always count on their availability, even if:
- A dependency becomes unusable or goes missing from its public repository
- A transitive or OS-level dependency shifts
This is because the ActiveState Platform not only builds dependencies from source code, but also packages them into self-contained runtime environments for Windows, Mac and Linux operating systems.
In fact, we even retain packages that have been found to contain malware, exploits and other malicious code. Public repositories typically delete such packages as soon as they’re identified, which often leaves malware researchers hunting for copies of deleted packages to perform their forensic analyses. We currently mark compromised packages as “unavailable” to ensure users can’t accidentally include them in their runtime environment builds, but you can still access them using our Malware Archivist tool.
How to Ensure Software Dependency Availability
While it’s unlikely that ActiveState will ever capture all of the world’s open source code, what we do capture has made our customers’ software far more resilient to change, and far more cost-effectively than if they had built a solution themselves. And if there’s one constant in technology in general and open source in particular, it is change.
Additionally, the ActiveState Platform catalog offers better classification and categorization (based on metadata) than a general internet search, making open source packages far more discoverable. This is valuable not only for independent software vendors (ISVs), but also for scientific researchers, who are in the midst of a reproducibility crisis: the results of many scientific studies over the past decade have proven difficult or even impossible to reproduce, at least partly because the software used to run the experiments is itself unreproducible.
If you’ve ever experienced a temporary “package not found” moment of panic, or even a permanent “can’t reproduce the build” situation, the ActiveState Platform may be the solution you’ve been looking for to avoid both in the future.