As Python developers, we depend on packages (and their dependencies) pulled from public repositories like the Python Package Index (PyPI) to build our applications. Combined with other third-party code and our own proprietary code, PyPI-sourced components form a critical part of our software supply chain, and that chain is only as strong as its weakest link. To build robust software, we need to ensure all links in the chain are secure and free of vulnerabilities.
Fortunately, the Python Software Foundation recognizes the critical importance of software supply chain security. They’re actively working to be a good steward of the PyPI ecosystem by undertaking several important initiatives, including:
- Adding 2-Factor Authentication (2FA) and API Tokens
- Building a new dependency resolver for pip
- Hiring a packaging project manager
- Creating the Python Packaging Advisory Database
- Integrating The Update Framework into PyPI
- Building pip-audit
All of these initiatives are welcome additions and provide organizations with more ways to help ensure the security and integrity of the Python packages they work with. However, a number of common exploits remain unaddressed, including:
- Typosquatting – bad actors upload compromised packages with names similar to those of popular, existing packages (e.g., urlib3 vs urllib3)
- Malware – bad actors can upload prebuilt binary wheels that contain malicious code
- Dependency Confusion – developers mistakenly install a compromised package from PyPI that has the same name as a custom package internal to their organization
- Unsigned Packages – to be fair, most open source public repositories do not require package signing, but signatures are the main way to verify that packages haven’t been tampered with
In this article, we’ll discuss the initiatives the Python Software Foundation is undertaking to address the security and integrity of Python packages, as well as some of the issues PyPI has yet to solve. Finally, we’ll offer some tools and suggestions that can help you work around them.
PyPI Security Improvements
PyPI is certainly not the only open source repository subject to attacks by bad actors looking to compromise their user base. Unfortunately, with popularity comes attention, and Python is one of the most popular programming languages out there right now. The result is a growing number of hacks, compromises, and exploits, including:
- Experts found 11 malicious Python packages in the PyPI repository (November 2021)
- Software downloaded 30,000 times from PyPI ransacked developers’ machines (July 2021)
- Python team fixes bug that allowed takeover of PyPI repository (July 2021)
- Dependency confusion attack mounted via PyPi repo exposes flawed package installer behavior (February 2021)
The good news about having an active community around a popular language is that issues get discovered, reported and remediated quickly. The bad news is it also means a large number of organizations are affected, and potentially infected, as well.
Let’s explore some of the Python Software Foundation’s efforts to improve PyPI’s security.
PyPI 2FA and API Tokens
The Python Software Foundation recently added support for two-factor authentication (2FA) and API tokens in order to make the process of uploading packages to existing PyPI projects more secure:
- 2FA adds an extra level of validation when package admins sign into the PyPI website. In addition to signing in with a password, users who enable 2FA will either need to provide an access code from an authenticator app or possess a physical security device like a YubiKey.
- Adding API tokens to PyPI helps prevent project takeover because API tokens can only be used for uploading new package releases. Even if a PyPI API token is compromised, it can’t be used to sign in to the PyPI website, edit or delete the project, or delete old releases.
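As a minimal sketch of how a token is used at upload time, the snippet below prepares an environment and command line for the twine upload tool. It assumes twine is installed and relies on twine reading the TWINE_USERNAME and TWINE_PASSWORD environment variables, with the literal username `__token__` signalling token authentication. The helper names and the example token value are hypothetical.

```python
import os
import sys


def twine_upload_env(api_token: str) -> dict:
    """Return a copy of the environment set up for token-based upload."""
    if not api_token.startswith("pypi-"):
        raise ValueError("PyPI API tokens are expected to start with 'pypi-'")
    env = dict(os.environ)
    env["TWINE_USERNAME"] = "__token__"  # literal value, not an account name
    env["TWINE_PASSWORD"] = api_token    # the token itself acts as the password
    return env


def twine_upload_command(files: list) -> list:
    """Build the command that uploads the given distribution files."""
    return [sys.executable, "-m", "twine", "upload", *files]
```

A release script could pass both values to `subprocess.run(twine_upload_command(files), env=twine_upload_env(token), check=True)`, loading the token from a secret store rather than from source code.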
Pip’s New Dependency Resolver
The Python Software Foundation is in the process of rolling out a new dependency resolver for pip, Python’s default package installer, in order to improve dependency management. Some improvements include:
- Strictness – The current version of pip is much stricter than earlier versions: it refuses to install packages whose requirements are incompatible with one another, rather than leaving the environment in an inconsistent state.
- Conflicts – Pip now tells you when a conflict occurs, and provides information to help you manually work around it, where possible.
But with these changes come new behaviours that may inconvenience users, including:
- Solving can take some time as pip will now backtrack when it finds an incompatibility. This can mean downloading and trying several versions of a package as pip progressively works toward a solution.
- Sometimes pip simply can’t work out the conflict, leaving you to manually come up with your own solution. If you were hoping for a solution to dependency hell, pip is not there yet.
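To illustrate why backtracking can be slow and why some conflicts have no solution, here is a toy backtracking resolver. It is not pip's actual algorithm: the index, the package names, and the use of integer versions with sets of allowed versions are all simplifying assumptions made for this sketch.

```python
# Hypothetical index: package -> version -> {dependency: allowed versions}
INDEX = {
    "app": {1: {"web": {1, 2}, "db": {2}}},
    "web": {2: {"util": {2}}, 1: {"util": {1}}},
    "db": {2: {"util": {1}}},
    "util": {1: {}, 2: {}},
}


def resolve(requirements, pinned=None):
    """Return {package: version} satisfying every requirement, or None.

    requirements: list of (package, allowed_versions) still to satisfy.
    pinned: versions chosen so far on the current search path.
    """
    pinned = dict(pinned or {})
    if not requirements:
        return pinned
    (name, allowed), rest = requirements[0], requirements[1:]
    if name in pinned:
        # An earlier choice must also satisfy this requirement.
        return resolve(rest, pinned) if pinned[name] in allowed else None
    # Prefer the newest version; fall back to older ones on conflict.
    for version in sorted(INDEX[name], reverse=True):
        if version not in allowed:
            continue
        deps = list(INDEX[name][version].items())
        result = resolve(rest + deps, {**pinned, name: version})
        if result is not None:
            return result
    return None  # every candidate failed: the caller must backtrack
```

Calling `resolve([("app", {1})])` first tries web 2, discovers that db 2 needs util 1 while web 2 needs util 2, and backtracks to web 1. Calling `resolve([("web", {2}), ("db", {2})])` returns `None` because no combination works, which mirrors the case where pip leaves the conflict for you to resolve by hand.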
PyPI Packaging Project Manager
The Python Software Foundation has hired Shamika Mohanan as the project manager for PyPI. The project manager’s mandate is to make the service more functional and sustainable.
Shamika also serves as a community manager, receiving feedback from stakeholders in the Python community. Bloomberg, the sponsor of this position, has a focus on “shifting left,” which will likely be a large influence on Shamika’s initiatives.
Python Vulnerability Databases
While the US National Vulnerability Database (NVD) is a repository for software vulnerabilities in general, there are two other databases that record and track vulnerabilities in the Python ecosystem:
- The Python Packaging Advisory Database is a GitHub repository of vulnerability advisories. Unlike the NVD, which requires vulnerability-reporting organizations to register as a CVE Numbering Authority (CNA), the Python Packaging Advisory Database is community owned. Anyone can create a pull request and submit vulnerability updates.
- The Open Source Vulnerabilities (OSV) database from Google tracks vulnerabilities in languages like Python, Go, Rust and others. It provides an API so users can programmatically query whether a package version they’re using is affected by a vulnerability.
These databases can help organizations track vulnerabilities associated with Python dependencies, identify risky packages, and make appropriate choices when building their software.
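As a rough sketch of a programmatic lookup, the snippet below queries the OSV API for a single PyPI package version using only the standard library. It assumes the public `https://api.osv.dev/v1/query` endpoint and its JSON request and response shape; the function names are hypothetical, and the package and version you pass are up to you.

```python
import json
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_osv_query(name: str, version: str) -> dict:
    """Build the JSON body asking OSV about one PyPI package version."""
    return {"package": {"name": name, "ecosystem": "PyPI"}, "version": version}


def query_osv(name: str, version: str, timeout: float = 10.0) -> list:
    """Return the IDs of vulnerabilities OSV reports for this version."""
    body = json.dumps(build_osv_query(name, version)).encode("utf-8")
    request = urllib.request.Request(
        OSV_QUERY_URL,
        data=body,  # supplying data makes this a POST request
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # The "vulns" key is absent when nothing is reported.
    return [vuln["id"] for vuln in payload.get("vulns", [])]
```

An empty list from `query_osv` means OSV has no advisory on record for that version, which is not the same as a guarantee that the package is safe.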
Automated Client-side Python Updates
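The Update Framework (TUF) is a framework for securing software update systems, with a reference implementation written in Python. It lets update software running on a client system connect to a repository such as PyPI, then download and install package updates while verifying that they are authentic. TUF is also designed to limit the damage attackers can do when they compromise the repository or signing keys.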
The software is currently being updated to modern Python. Once it is integrated into PyPI, software developers will be able to incorporate it into their own software in order to provide a fast, secure and convenient update mechanism.
pip-audit
pip-audit is a new vulnerability scanner that lets developers quickly scan for vulnerabilities in any package installed in a Python environment. Once vulnerabilities are identified, developers can make an informed decision about whether they want to continue using the package, switch to a different version, or replace the package entirely.
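The scan can be wired into a script or CI job. The sketch below assumes pip-audit is installed in the current environment, that it can be run as `python -m pip_audit`, and that it exits with a non-zero status when it reports vulnerabilities. The wrapper function names are hypothetical.

```python
import subprocess
import sys


def pip_audit_command(requirements_file=None) -> list:
    """Build a pip-audit invocation for the current interpreter."""
    cmd = [sys.executable, "-m", "pip_audit"]
    if requirements_file:
        # Audit a requirements file instead of the installed environment.
        cmd += ["-r", requirements_file]
    return cmd


def audit(requirements_file=None) -> bool:
    """Run pip-audit, print its report, and return True on exit status 0."""
    completed = subprocess.run(
        pip_audit_command(requirements_file),
        capture_output=True,
        text=True,
    )
    print(completed.stdout or completed.stderr)
    return completed.returncode == 0
```

A non-zero exit status can also indicate that the tool itself failed, so a CI job should surface the printed report rather than rely on the boolean alone.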
Current Python Supply Chain Challenges
Despite all these efforts, the Python supply chain remains insecure. To be fair, Python is no better or worse when it comes to security than most other open source ecosystems. The problem is that open source is based on public repositories that feature millions of lines of unsigned code uploaded by hundreds of thousands of developers with little to no guarantee of the security or integrity of those software assets.
It’s unfair to think that any non-profit organization run by volunteers would be able to rein in this kind of wild west approach. But that doesn’t mean you can ignore the loopholes in PyPI’s security. Let’s take a look at some of them.
Despite the Python Software Foundation’s best efforts, PyPI still has vulnerabilities that come to light periodically. These vulnerabilities potentially give hackers access to confidential user data, or else let them sabotage systems.
Typosquatting
Typosquatting is a social engineering attack that targets users who make typographic errors while accessing content on the internet.
In Python, this issue surfaces when users make mistakes typing an install command. For example, users might type “pip install matplatlib” instead of “pip install matplotlib.” Bad actors have repeatedly uploaded typosquatted packages that contain malware to PyPI in order to take advantage of poor typists.
One way to counter this problem is to install from a private repository that contains only the packages your developers require, rather than from PyPI. In this way, a mistyped install command will just fail, rather than download a compromised package.
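A lightweight complement to a private repository is to check requested names against an approved list before installing. The sketch below uses the standard library's difflib to flag near misses; the allowlist contents, the 0.8 similarity cutoff, and the function name are assumptions chosen for illustration.

```python
import difflib

# Hypothetical allowlist of packages approved for a project.
APPROVED = {"matplotlib", "numpy", "requests", "urllib3"}


def check_name(requested: str, approved=APPROVED):
    """Classify a requested package name against the allowlist.

    Returns ("ok", None), ("possible-typo", suggestion) or ("unknown", None).
    """
    name = requested.lower()
    if name in approved:
        return ("ok", None)
    # A close but inexact match suggests a typo, or a typosquatted name.
    close = difflib.get_close_matches(name, sorted(approved), n=1, cutoff=0.8)
    if close:
        return ("possible-typo", close[0])
    return ("unknown", None)
```

With this allowlist, `check_name("matplatlib")` returns `("possible-typo", "matplotlib")`, so a wrapper script could refuse to run the install and show the suggested name instead. The check only lowercases names, so a fuller version would also treat hyphens, underscores and dots as equivalent.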
Dependency Confusion
In the above scenario involving an internal repository, you should be aware that pip gives your repository no priority over PyPI when both are configured (for example, through the --extra-index-url option). Pip considers candidates from every configured index and installs the one it judges best, typically the highest version. This is where dependency confusion can occur:
- A bad actor can guess at the name of an internal package (say, shopify_requests) and upload a compromised package of the same name to PyPI. Because pip cannot tie a package name to a particular repository, when you type “pip install shopify_requests” it can mistakenly install shopify_requests from PyPI rather than your private repository.
One way to combat dependency confusion is to set pip to install only from your private repository, so that PyPI is never consulted.
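As a minimal sketch, the helper below builds a pip command that uses pip's --index-url option, which replaces the default index instead of adding to it the way --extra-index-url does. The repository URL and the function name are hypothetical placeholders.

```python
import sys

# Hypothetical URL of an organization's private package index.
PRIVATE_INDEX = "https://pypi.internal.example/simple"


def install_command(package: str, index_url: str = PRIVATE_INDEX) -> list:
    """Build a pip command that resolves packages from one index only."""
    return [
        sys.executable, "-m", "pip", "install",
        "--index-url", index_url,  # replaces PyPI as the package source
        package,
    ]
```

The same setting can be made permanent in pip's configuration file or through the PIP_INDEX_URL environment variable, which avoids relying on every developer remembering the flag.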
Wheel-Jacking
Python wheels are smaller than source distributions and install faster, but they’re potentially less secure because any compiled code they contain is opaque to developers. You can’t easily see what’s going to be executed when you import a binary wheel into your application, making you vulnerable to wheel-jacking. Wheel-jacking occurs when developers are tricked into installing a compromised wheel that executes malicious code upon import.
Wheel-jacking can be used in conjunction with dependency confusion whenever bad actors identify or guess the name of a software package that an organization uses internally. They can then create an identically-named package on PyPI that, once imported, can access any systems and data visible to the application under development.
The best way to avoid wheel-jacking is to always build binary packages from source code.
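With pip, source builds can be requested through the `--no-binary :all:` option. Before trusting a wheel you already have, it can also help to see which of its members are compiled. The sketch below relies on the fact that a wheel is a zip archive; the suffix list and the function name are assumptions, and the check only finds compiled files, it does not judge whether they are malicious.

```python
import zipfile

# File suffixes that usually indicate compiled, hard-to-review code.
NATIVE_SUFFIXES = (".so", ".pyd", ".dll", ".dylib")


def native_members(wheel_path) -> list:
    """List the members of a wheel that look like compiled binaries.

    wheel_path may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(wheel_path) as wheel:
        return [
            name for name in wheel.namelist()
            if name.lower().endswith(NATIVE_SUFFIXES)
        ]
```

An empty result means the wheel contains no files with those suffixes, so its Python sources can be read directly. Building from source is not a complete defence either, since a source build runs the package's own build scripts.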
Conclusions – A More Secure Python Ecosystem
Although the Python Software Foundation has identified security vulnerabilities in PyPI and is constantly working to improve the package repository’s security, some issues will likely always remain. As a volunteer-run organization with limited resources, it’s likely they will always be lagging behind supply chain threats that are continually evolving. After all, a bad actor only needs a single weak link to exploit, but PyPI maintainers need to plug all of them.
Instead of implementing multiple workarounds, best-in-class solutions, and custom code to make up for the security deficiencies in the Python ecosystem, consider using a third-party Python ecosystem, such as that provided by the ActiveState Platform.
The ActiveState Platform implements a turnkey, secure supply chain for Python that includes:
- Import Controls – an open source catalog contains indemnified Python packages, which have been checked to ensure they are well maintained and suitably licensed for commercial use.
- Build Controls – a secure build service automatically builds Python packages (including linked C and Fortran libraries) from source code for Windows, Linux and macOS. Developers no longer need to install potentially compromised binaries.
- Run Controls – checksum verification of all build artifacts throughout each build step ensures that the final built package hasn’t been compromised.
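Checksum verification in general can be sketched with the standard library. The snippet below is a generic illustration, not ActiveState's implementation: it hashes a file with SHA-256 and compares the result to a digest pinned in advance. The function names are hypothetical.

```python
import hashlib
import hmac


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pinned(path: str, expected_hex: str) -> bool:
    """Return True when the file's digest equals the pinned digest."""
    return hmac.compare_digest(sha256_of(path), expected_hex.lower())
```

Pip offers a related safeguard of its own: when a requirements file pins hashes and pip is run with `--require-hashes`, it refuses to install any artifact whose digest does not match.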
The ActiveState Platform is an easy-to-implement and simple-to-adopt service that can help you address many of the common exploits discussed here, including typosquatting, dependency confusion and wheel-jacking.
- Try the ActiveState Platform for yourself for free.
- Understand how the ActiveState Platform securely builds Python packages from source
- Learn how the ActiveState Platform can automatically remediate vulnerabilities
- View ActiveState’s 2021 report that surveys the state of software supply chain security