Published: December 9, 2021
Last Updated: December 20, 2021

PyPI security pitfalls and steps towards a secure Python ecosystem

As Python developers, we depend on packages (and their dependencies) pulled from public repositories like the Python Package Index (PyPI) to build our applications. Combined with other third-party code and our own proprietary code, PyPI-sourced components form a critical part of our software supply chain, a chain that is only as strong as its weakest link. To build robust software, we need to ensure all links in the chain are secure and free of vulnerabilities.

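Fortunately, the Python Software Foundation recognizes the critical importance of software supply chain security. They’re actively working to be a good steward of the PyPI ecosystem by undertaking several important initiatives, including:

  • Two-factor authentication (2FA) and API tokens for PyPI
  • A new dependency solver for pip
  • A PyPI packaging project manager
  • Python vulnerability databases
  • Automated client-side updates with The Update Framework (TUF)
  • The pip-audit vulnerability scanner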

All of these initiatives are welcome additions and provide organizations with more ways to help ensure the security and integrity of the Python packages they work with. However, a number of common exploits remain unaddressed, including:

  • Typosquatting – bad actors upload compromised packages with names similar to those of popular, existing packages (e.g., urlib3 vs urllib3)
  • Malware – bad actors can upload prebuilt binary wheels that contain malicious code
  • Dependency Confusion – occurs when developers mistakenly install a compromised package from PyPI that has the same name as that of a custom package internal to their organization.
  • Unsigned Packages – to be fair, most public open source repositories do not enforce package signing, but signing is the only way to ensure packages haven’t been tampered with.

In this article, we’ll discuss the initiatives the Python Software Foundation is undertaking to address the security and integrity of Python packages, as well as some of the issues PyPI has yet to solve. Finally, we’ll offer some tools and suggestions that can help you work around them.

PyPI Security Improvements

PyPI is certainly not the only open source repository subject to attacks by bad actors looking to compromise its user base. Unfortunately, with popularity comes attention, and Python is one of the most popular programming languages out there right now. The result is a growing number of hacks, compromises, and exploits.

The good news about having an active community around a popular language is that issues get discovered, reported and remediated quickly. The bad news is it also means a large number of organizations are affected, and potentially infected, as well.

Let’s explore some of the Python Software Foundation’s efforts to improve PyPI’s security.

PyPI 2FA and API Tokens

The Python Software Foundation recently added support for two-factor authentication (2FA) and API tokens in order to make the process of uploading packages to existing PyPI projects more secure:

  • 2FA adds an extra level of validation when package admins sign in to the PyPI website. In addition to signing in with a password, users who enable 2FA need to either provide an access code from an authenticator app or use a physical security device like a YubiKey.
  • API tokens help prevent project takeover because they can only be used for uploading new package releases. Even if a PyPI API token is compromised, it can’t be used to sign in to the PyPI website, edit or delete the project, or delete old releases.

Pip’s New Dependency Solver

The Python Software Foundation is in the process of rolling out a new dependency solver for pip, Python’s default package installer, in order to improve dependency management. Some improvements include: 

  • Strictness – The new solver is much stricter than the old one: it refuses to install packages with conflicting requirements together in an environment.
  • Conflicts – Pip now tells you when a conflict occurs, and provides information to help you manually work around it, where possible.

But with these changes come new behaviours that may inconvenience users, including:

  • Solving can take some time, as pip will now backtrack when it finds an incompatibility. This can mean downloading and examining several versions of a package as pip progressively works toward a solution.
  • Sometimes pip simply can’t work out the conflict, leaving you to manually come up with your own solution. If you were hoping for a solution to dependency hell, pip is not there yet.
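To make "backtracking" concrete, here is a toy sketch of the idea. It is not pip's actual algorithm or data model: the index, the package names (web, orm, http) and the integer versions are all hypothetical, chosen only to show a solver trying the newest version first and falling back when a conflict appears further down.

```python
def resolve(requirements, index, pinned=None):
    """Return {package: version} satisfying every requirement, or None.

    requirements: list of (name, allowed_versions) pairs still to satisfy.
    index: {name: {version: {dependency_name: allowed_versions}}}.
    """
    pinned = pinned or {}
    if not requirements:
        return pinned  # nothing left to satisfy: a complete solution
    (name, allowed), rest = requirements[0], requirements[1:]
    if name in pinned:
        # Already chosen: either it fits this requirement or this branch fails.
        return resolve(rest, index, pinned) if pinned[name] in allowed else None
    # Try the newest candidate first.
    for version in sorted(allowed & set(index[name]), reverse=True):
        dependencies = list(index[name][version].items())
        solution = resolve(rest + dependencies, index, {**pinned, name: version})
        if solution is not None:
            return solution
        # Conflict further down: backtrack and try the next older version.
    return None


# Hypothetical index: web 2 needs http 2, but orm 1 only works with http 1.
TOY_INDEX = {
    "web": {1: {"http": {1, 2}}, 2: {"http": {2}}},
    "orm": {1: {"http": {1}}},
    "http": {1: {}, 2: {}},
}

print(resolve([("web", {1, 2}), ("orm", {1})], TOY_INDEX))
# prints {'web': 1, 'orm': 1, 'http': 1}
```

The solver first picks web 2, discovers that orm 1 cannot live with http 2, and only then falls back to web 1. When no combination works, the function returns None, which corresponds to the case where you have to resolve the conflict by hand.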

PyPI Packaging Project Manager

The Python Software Foundation has hired Shamika Mohanan as the project manager for PyPI. The project manager’s mandate is to make the service more functional and sustainable.

Shamika also serves as a community manager, receiving feedback from stakeholders in the Python community. Bloomberg, the sponsor of this position, has a focus on “shifting left,” which will undoubtedly be a large influence on Shamika’s initiatives.

Python Vulnerability Databases

While the US National Vulnerability Database (NVD) is a repository for open source vulnerabilities in general, there are two databases focused more on Python that record and track vulnerabilities in the Python ecosystem:

  • The Python Package Advisory Database is a GitHub repository of vulnerability advisories. Unlike the NVD, which requires vulnerability-reporting organizations to register as a CVE Numbering Authority (CNA), the Python Package Advisory Database is community-owned. Anyone can create a pull request and submit vulnerability updates.
  • The Open Source Vulnerabilities (OSV) database from Google tracks vulnerabilities in languages like Python, Go, Rust and others. It provides an API so users can programmatically query whether a package version they’re using is affected by a vulnerability. 

These databases can help organizations track vulnerabilities associated with Python dependencies, identify risky packages, and make appropriate choices when building their software.
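As a rough sketch of what a programmatic OSV lookup can look like, the snippet below builds a query for one PyPI package version and extracts the advisory IDs from the reply. It assumes the public OSV query endpoint at api.osv.dev and its JSON request shape; the helper names and the jinja2 version in the usage comment are only illustrative.

```python
import json
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_osv_query(name, version):
    """Build the JSON body asking OSV about one PyPI package version."""
    return {"package": {"name": name, "ecosystem": "PyPI"}, "version": version}


def vulnerability_ids(response):
    """Extract advisory IDs from an OSV reply; an empty reply means none known."""
    return [vuln["id"] for vuln in response.get("vulns", [])]


def query_osv(name, version, timeout=10):
    """Ask OSV which known advisories affect name==version (needs network access)."""
    body = json.dumps(build_osv_query(name, version)).encode("utf-8")
    request = urllib.request.Request(
        OSV_QUERY_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return vulnerability_ids(json.load(reply))


# Example usage (performs a real HTTP request):
# print(query_osv("jinja2", "2.4.1"))
```

Keeping the request building and response parsing separate from the network call makes the logic easy to test offline and to reuse over a whole requirements list.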

Automated Client-side Python Updates

The Update Framework (TUF) is a framework for securing software update systems, with a reference implementation written in Python. A TUF-enabled client can automatically connect to a repository such as PyPI, then download and install package updates. TUF is also designed to protect against attackers who compromise the repository or its signing keys.

The Python implementation is currently being updated to modern Python. Once it is integrated into PyPI, software developers will be able to incorporate it into their software in order to provide a fast, secure and convenient update mechanism.

Python Pip-audit

pip-audit is a new vulnerability scanner that lets developers quickly scan for vulnerabilities in any package installed in a Python environment. Once vulnerabilities are identified, developers can make an informed decision about whether they want to continue using the package, switch to a different version, or replace the package entirely. 

pip-audit can help organizations shift security left by raising awareness of vulnerabilities in dev environments, and providing options that can help developers quickly remediate them. If you’ve ever used npm audit on a JavaScript project, then you already have a good idea of what pip-audit provides for Python projects.
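One way to shift this check left is to run pip-audit as a gate in a build script. The sketch below assumes pip-audit is installed in the current environment and relies on its documented behaviour of exiting with a non-zero status when vulnerabilities are found. The helper names are hypothetical.

```python
import subprocess
import sys


def pip_audit_command(requirements_file=None):
    """Build the pip-audit invocation for the current interpreter."""
    command = [sys.executable, "-m", "pip_audit"]
    if requirements_file:
        # Audit a requirements file instead of the installed environment.
        command += ["-r", requirements_file]
    return command


def audit_passes(requirements_file=None):
    """Run pip-audit; True only if it exits with status 0."""
    completed = subprocess.run(pip_audit_command(requirements_file))
    return completed.returncode == 0


# Example usage in a build script (requires pip-audit to be installed):
# if not audit_passes("requirements.txt"):
#     sys.exit("pip-audit reported problems; see the output above")
```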

Current Python Supply Chain Challenges

Despite all these efforts, the Python supply chain remains insecure. To be fair, Python is no better or worse when it comes to security than most other open source ecosystems. The problem is that open source is based on public repositories that feature millions of lines of unsigned code uploaded by hundreds of thousands of developers with little to no guarantee of the security or integrity of those software assets. 

It’s unfair to think that any non-profit organization run by volunteers would be able to rein in this kind of wild west approach. But that doesn’t mean you can ignore the loopholes in PyPI’s security. Let’s take a look at some of them.

PyPI Vulnerabilities

Despite the Python Software Foundation’s best efforts, PyPI still has vulnerabilities that come to light periodically. These vulnerabilities potentially give hackers access to confidential user data, or else let them sabotage systems.

Typosquatting

Typosquatting is a social engineering attack that targets users who make typographic errors while accessing content on the internet.

In Python, this issue surfaces when users make mistakes typing an install command. For example, users might type “pip install matplatlib” instead of “pip install matplotlib.” Bad actors have repeatedly uploaded typosquatted packages that contain malware to PyPI in order to take advantage of poor typists. 

One way to counter this problem is to replace PyPI with a private repository that contains only the packages your developers require. That way, a mistyped install command simply fails rather than downloading a compromised package.
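If a private repository is not an option, a lightweight complementary check is to compare requested names against a list of approved packages and flag near misses before anything is installed. The following is a minimal sketch using the standard library's difflib. The approved list and the 0.8 similarity cutoff are assumptions you would tune for your own organization.

```python
import difflib

# Hypothetical allowlist of packages your developers are expected to use.
APPROVED = ["matplotlib", "requests", "urllib3"]


def classify_package_name(name, approved=APPROVED, cutoff=0.8):
    """Classify a requested name as approved, a possible typo, or unknown."""
    normalized = name.lower().replace("_", "-")
    if normalized in approved:
        return ("approved", normalized)
    # Find the approved name most similar to the one requested, if any.
    close = difflib.get_close_matches(normalized, approved, n=1, cutoff=cutoff)
    if close:
        return ("possible-typo", close[0])
    return ("unknown", None)


print(classify_package_name("matplatlib"))  # prints ('possible-typo', 'matplotlib')
```

A name flagged as a possible typo is exactly the kind of request worth stopping before pip ever contacts an index.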

Dependency Confusion

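In the above scenario involving an internal repository, you should be aware that pip still consults PyPI unless it is told otherwise. When a private repository is merely added alongside PyPI (for example, with the --extra-index-url option), pip gives neither source priority and installs the highest version it finds in either. This is where dependency confusion can occur: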

  • A bad actor can guess at the name of an internal package (say, shopify_requests) and upload a compromised package of the same name to PyPI. Because Python package names are not namespaced by organization, when you type “pip install shopify_requests” pip can mistakenly install shopify_requests from PyPI rather than from your private repository.

One way to combat dependency confusion is to set pip to install only from your local repository.
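As a sketch of what that can look like in a build script, the helper below constructs a pip command that uses --index-url, which replaces PyPI rather than adding to it, together with --isolated, which makes pip ignore environment variables and user configuration that might reintroduce another index. The index URL is a placeholder, not a real service.

```python
import subprocess
import sys

# Placeholder URL: substitute the address of your own private repository.
PRIVATE_INDEX_URL = "https://pypi.internal.example/simple"


def private_install_command(packages, index_url=PRIVATE_INDEX_URL):
    """Build a pip command that can only resolve packages from index_url."""
    return [
        sys.executable, "-m", "pip", "install",
        "--isolated",             # ignore env vars and user-level pip config
        "--index-url", index_url,  # replaces PyPI instead of supplementing it
        *packages,
    ]


def install_from_private_index(packages):
    """Run the install and raise CalledProcessError if pip fails."""
    subprocess.run(private_install_command(packages), check=True)


# Example usage:
# install_from_private_index(["shopify_requests"])
```

The same effect can be achieved by setting index-url in pip's configuration file. The important point is to avoid --extra-index-url for internal packages.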

Python Wheel-Jacking

Python wheels are smaller than source distributions and install faster, but they’re potentially less secure because the compiled code they can contain is opaque to developers. You can’t easily see what’s going to be executed when you import a binary wheel into your application, making you vulnerable to wheel-jacking. Wheel-jacking occurs when developers are tricked into installing a compromised wheel that executes malicious code upon import.

Wheel-jacking can be used in conjunction with dependency confusion whenever bad actors identify/guess at a software package that an organization uses internally. They can then create an identically-named package on PyPI that, once imported, can access any systems and data visible to the application under development.

The best way to avoid wheel-jacking is to always build binary packages from source code.
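When deciding which dependencies to rebuild first, it can help to know which wheels actually ship compiled code. A wheel is a zip archive, so a short script can list the native binaries inside it. The sketch below is a heuristic based on common file suffixes, not an exhaustive or authoritative check.

```python
import zipfile

# Common suffixes of compiled extension modules and shared libraries.
NATIVE_SUFFIXES = (".so", ".pyd", ".dll", ".dylib")


def native_files_in_wheel(wheel):
    """Return the sorted names of compiled binaries inside a wheel.

    wheel may be a path or a binary file-like object. Matching is by
    suffix only, so files such as libfoo.so.1 are not detected.
    """
    with zipfile.ZipFile(wheel) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.lower().endswith(NATIVE_SUFFIXES)
        )


# Example usage (the file name is hypothetical):
# print(native_files_in_wheel("example-1.0-cp39-cp39-manylinux1_x86_64.whl"))
```

An empty result suggests a pure-Python wheel whose contents can be read directly. Anything listed is a candidate for building from source instead. pip's --no-binary option can be used to request source builds for selected packages.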

Conclusions – A More Secure Python Ecosystem

Although the Python Software Foundation has identified security vulnerabilities in PyPI and is constantly working to improve the package repository’s security, some issues will likely always remain. As a volunteer-run organization with limited resources, it’s likely they will always be lagging behind supply chain threats that are continually evolving. After all, a bad actor only needs a single weak link to exploit, but PyPI maintainers need to plug all of them.

Instead of implementing multiple workarounds, best-in-class solutions, and custom code to make up for the security deficiencies in the Python ecosystem, consider using a third-party Python ecosystem, such as that provided by the ActiveState Platform.

The ActiveState Platform implements a turnkey, secure supply chain for Python that includes:

  • Import Controls – an open source catalog contains indemnified Python packages, which have been checked to ensure they are well maintained and suitably licensed for commercial use.
  • Build Controls – a secure build service automatically builds Python packages (including linked C and Fortran libraries) from source code for Windows, Linux and macOS. Developers no longer need to install potentially compromised binaries.
  • Run Controls – checksum verification of all build artifacts throughout each build step ensures that the final built package hasn’t been compromised. 

The ActiveState Platform is an easy-to-implement and simple-to-adopt service that can help you address many of the common exploits discussed here, including typosquatting, dependency confusion and wheel-jacking. 

Dana Crane

Experienced Product Marketer and Product Manager with a demonstrated history of success in the computer software industry. Strong skills in Product Lifecycle Management, Pragmatic Marketing methods, Enterprise Software, Software as a Service (SaaS), Agile Methodologies, Customer Relationship Management (CRM), and Go-to-market Strategy.