How to Mitigate the 3 Most Common Python Supply Chain Threats


Python, like JavaScript, is one of the most popular programming languages on the planet. JavaScript is used in 98% of all websites worldwide; Python, for its part, is at the heart of today’s AI and machine learning explosion, driven by ChatGPT, Stable Diffusion, and others. It’s no wonder, then, that Python has become a prime target for bad actors looking to exploit weak links in vendors’ supply chains in order to compromise their software.

For example:

August 2022 – the “secretslib” package on PyPI was found to covertly hijack Linux machines in order to run cryptominers.
October 2022 – dozens of Python packages were found to steal data from any development environment they were imported into, using the W4SP stealer.
December 2022 – PyTorch’s nightly builds were hit by a dependency confusion attack: installing them pulled in a malicious torchtriton package from PyPI rather than the legitimate dependency from PyTorch’s own package index.
March 2023 – dozens of Python packages were found whose setup.py, when run during installation, connected to an external URL in order to download malicious code.

Because supply chain attacks can target any stage of the software development lifecycle, finding (and fixing) weak links can seem like a never-ending task. This post shows you how to address three of the most commonly exploited supply chain attack vectors:

On Import – malicious code hidden in Python packages imported into the organization
On Install – setup.py can be abused to run or download malicious code at install time
On Build – remote resources pulled in during a build can compromise locally built Python artifacts

Importing Python Securely

Python code is generally imported into an organization in one of two ways:

As source code
As a prebuilt package

To ensure security, the latter method should always be avoided, especially if the prebuilt package contains binary code, since compiled binaries can’t simply be read to confirm they’re clean.
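Because a wheel is just a ZIP archive, you can at least check whether a prebuilt package ships compiled code before deciding whether to trust it. The sketch below is a heuristic, assuming that compiled extensions show up as files with `.so`, `.pyd`, `.dll`, or `.dylib` suffixes, and treating the `Root-Is-Purelib` field in the wheel’s `*.dist-info/WHEEL` file as a secondary signal. The function name `find_binary_members` is hypothetical.

```python
import zipfile
from pathlib import PurePosixPath

# Hypothetical heuristic: suffixes that usually indicate compiled code.
BINARY_SUFFIXES = {".so", ".pyd", ".dll", ".dylib"}


def find_binary_members(wheel_path):
    """Return (binary_members, root_is_purelib) for a wheel file.

    binary_members lists archive entries whose suffixes suggest compiled code.
    root_is_purelib is the Root-Is-Purelib value from *.dist-info/WHEEL,
    or None if that field was not found.
    """
    binary_members = []
    root_is_purelib = None
    with zipfile.ZipFile(wheel_path) as wheel:
        for name in wheel.namelist():
            path = PurePosixPath(name)
            # Check every suffix: extension modules look like
            # _ext.cpython-311-x86_64-linux-gnu.so, libraries like libfoo.so.1
            if BINARY_SUFFIXES.intersection(path.suffixes):
                binary_members.append(name)
            if path.name == "WHEEL" and path.parent.name.endswith(".dist-info"):
                for line in wheel.read(name).decode("utf-8").splitlines():
                    key, _, value = line.partition(":")
                    if key.strip() == "Root-Is-Purelib":
                        root_is_purelib = value.strip().lower() == "true"
    return binary_members, root_is_purelib
```

A non-empty list of binary members means the wheel contains code that can’t be reviewed as source, so it should be rebuilt from source rather than imported as-is.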

Following best practices, source code should be staged in a repository of some kind and scanned for malicious code. As an additional precaution, consider quarantining any newly released code that generates a warning. Because the Python Package Index (PyPI) is generally quick to remove compromised packages, holding new releases back for a while means you may be able to let PyPI do some of this work for you.
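A time-based quarantine can be sketched with PyPI’s JSON API, which serves release metadata at `https://pypi.org/pypi/<project>/<version>/json`, including an `upload_time_iso_8601` timestamp for each uploaded file. The 14-day window and the helper names below are hypothetical policy choices, not a standard.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

# Hypothetical policy: hold any release newer than this for review.
QUARANTINE_PERIOD = timedelta(days=14)


def parse_pypi_timestamp(value):
    """Parse a timestamp such as '2023-03-01T12:00:00.000000Z' into an aware datetime."""
    # datetime.fromisoformat() only accepts a trailing 'Z' on Python 3.11+.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def latest_upload(project, version):
    """Fetch the most recent file upload time for a release from PyPI's JSON API."""
    url = f"https://pypi.org/pypi/{project}/{version}/json"
    with urllib.request.urlopen(url, timeout=10) as response:
        release = json.load(response)
    times = [parse_pypi_timestamp(f["upload_time_iso_8601"]) for f in release["urls"]]
    if not times:
        raise ValueError(f"{project} {version} has no uploaded files")
    # Use the newest file: a wheel added later to an old release is still new code.
    return max(times)


def is_quarantined(uploaded_at, now=None, period=QUARANTINE_PERIOD):
    """Return True while a release is still inside the quarantine window."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - uploaded_at < period
```

Calling `is_quarantined(latest_upload("requests", "2.31.0"))` checks a pinned release; using the newest file upload rather than the oldest keeps the check conservative.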

If you lack a staging repository, you could do worse than a GitHub repository and its automated scanning capabilities. Alternatively, the ActiveState Platform incorporates an ingestion pipeline that vets all Python source code for security and integrity before it is considered for inclusion in our catalog or placed in our quarantine zone for follow-up.

Installing Python Securely

As noted above, setup.py is a common attack vector for Python packages because it executes arbitrary code during installation: it can run malicious code directly or fetch malicious resources from a remote server. In other words, simply running pip install or pip download can be enough to compromise your system.
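Because setup.py is ordinary Python, it can be inspected with the standard `ast` module without executing it. The sketch below flags imports and calls that are common in install-time attacks (shelling out, network access, dynamic code execution). The watch-lists and function names are hypothetical and deliberately small; obfuscated code can evade this kind of check, so treat it as a first-pass filter, not a verdict.

```python
import ast

# Hypothetical watch-lists; real scanners use far richer rules.
RISKY_MODULES = {"subprocess", "socket", "urllib", "requests", "base64", "ctypes"}
RISKY_CALLS = {"exec", "eval", "compile", "os.system", "os.popen", "urllib.request.urlopen"}


def dotted_name(node):
    """Turn a Name/Attribute chain such as os.path.join into 'os.path.join'."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None  # e.g. a call on a subscript or another call's result


def scan_setup_py(source):
    """Parse (never execute) setup.py source and return sorted findings."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules = [node.module]
        elif isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in RISKY_CALLS:
                findings.append((node.lineno, f"call {name}"))
            continue
        else:
            continue
        for module in modules:
            # Match on the top-level package, so urllib.request counts as urllib.
            if module.split(".")[0] in RISKY_MODULES:
                findings.append((node.lineno, f"import {module}"))
    return [f"line {line}: {what}" for line, what in sorted(findings)]
```

Running it over a setup.py that imports `subprocess` and calls `os.system(...)` reports both lines, while a plain `setup(...)` call produces no findings.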

Fortunately, pip prefers wheels when a compatible one is available, and installing a wheel doesn’t run setup.py. The threat arises when no wheel is available for the target platform and pip falls back to building the source distribution. Of course, prebuilt wheels pose their own problems, as discussed in the previous section.
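If you’d rather have an installation fail than silently fall back to a source distribution, pip’s `--only-binary=:all:` option refuses sdists entirely, so no package’s setup.py runs on the installing machine. A minimal sketch, assuming pip is invoked from Python; the helper names are hypothetical.

```python
import subprocess
import sys


def wheels_only_install_cmd(requirements_file):
    """Build a pip command that installs a requirements file using wheels only.

    --only-binary=:all: makes pip error out instead of falling back to a
    source distribution, so no setup.py is executed during the install.
    """
    return [
        sys.executable, "-m", "pip", "install",
        "--only-binary=:all:",
        "-r", requirements_file,
    ]


def install_wheels_only(requirements_file):
    # check=True raises CalledProcessError if pip fails (e.g. no wheel exists).
    subprocess.run(wheels_only_install_cmd(requirements_file), check=True)
```

Any package that then fails to install is exactly the case described above: it has no suitable wheel and needs to be built from source somewhere safer than a developer machine.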

To install Python securely, you’ll need to build all of your dependencies from source in an environment that has no connection to a public network. But this kind of dependency vendoring can be complex and expensive in time and resources, especially for smaller organizations.
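With standard pip options, one way to approximate this is to place vetted source distributions (plus build tools such as setuptools and wheel) in a local directory and build wheels with `--no-index`, which stops pip from querying PyPI or any other index and therefore also rules out dependency confusion. Note that `--no-index` only constrains pip itself; setup.py can still open network connections, so the sketch assumes it runs inside a sandbox with networking disabled, such as a container started without network access. The directory names and helper names are hypothetical.

```python
import subprocess
import sys


def offline_build_cmd(requirements_file, source_dir, wheel_dir):
    """Build a pip command that produces wheels only from local, vetted sources.

    --no-index   : never contact PyPI or any other package index
    --find-links : look for distributions in source_dir instead
    --wheel-dir  : write the resulting wheels to wheel_dir
    """
    return [
        sys.executable, "-m", "pip", "wheel",
        "--no-index",
        "--find-links", source_dir,
        "--wheel-dir", wheel_dir,
        "-r", requirements_file,
    ]


def build_offline(requirements_file, source_dir="vendor/sdists", wheel_dir="dist"):
    # Run this inside a network-isolated sandbox; check=True surfaces build failures.
    subprocess.run(offline_build_cmd(requirements_file, source_dir, wheel_dir), check=True)
```

Since pip installs each package’s build requirements from the same locations, the build backends must be vendored into `source_dir` as well, or the build will fail rather than reach out to the network.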

Alternatively, you can use the ActiveState Platform to automatically build your Python dependencies from source code. During the build process, setup.py is run in a hermetically sealed container that has no access to an external network. Installation on your local machine is done using our CLI tool, the State Tool, which does not run setup.py. In this way, you can completely avoid setup.py threat vectors.

Building Python Securely

Building Python securely means ensuring (at least) two key things:

Vetted Source Code – builds can only be as secure as the code that goes into them.
Reproducibility – if the same input bits don’t always produce the same output bits, there’s no guarantee the artifacts you’re working with haven’t changed from build to build.
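A simple way to test for reproducibility is to run the same build twice, in separate clean environments, and compare the outputs bit for bit. The sketch below hashes every file in two output directories with SHA-256 and reports any that differ; the function names and directory layout are hypothetical.

```python
import hashlib
from pathlib import Path


def digest_tree(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digests[path.relative_to(root).as_posix()] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_builds(first, second):
    """Return relative paths whose contents differ or that exist in only one build."""
    a, b = digest_tree(first), digest_tree(second)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

An empty result means the two builds produced identical bits; any listed path points to a source of nondeterminism worth investigating.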

Unfortunately, reproducibility is rarely implemented because of the complexity of creating deterministic builds. For example, ActiveState’s State of Supply Chain Security survey of more than 1,500 organizations of all sizes around the globe showed that only ~22% of respondents could claim build reproducibility.

To create reproducible builds you’ll need to:

Ensure all code required for a build has been vetted and is present locally to avoid the threat of dependency confusion.
Ensure all build environments are ephemeral, isolated and hermetically sealed to avoid inclusion of potentially malicious remote resources.
Discard the build (and all interim artifacts) if the hash of any artifact does not match the expected result.
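The last step can be sketched as a check against a manifest of expected SHA-256 digests, for instance one recorded from a previous, trusted build. On any mismatch or missing artifact, the whole build directory, including interim artifacts, is deleted and an error is raised. The manifest format, exception class, and function names are hypothetical; for downloaded dependencies, pip’s `--require-hashes` mode applies the same idea to requirements files.

```python
import hashlib
import shutil
from pathlib import Path


class BuildIntegrityError(Exception):
    """Raised when a build artifact does not match its expected hash."""


def sha256_of(path):
    """Hash a file in chunks so large artifacts don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_or_discard(build_dir, expected):
    """Check each artifact in `expected` (relative path -> SHA-256 hex digest).

    If any artifact is missing or mismatched, delete the entire build
    directory and raise BuildIntegrityError; otherwise return None.
    """
    build_dir = Path(build_dir)
    mismatched = []
    for rel_path, want in sorted(expected.items()):
        artifact = build_dir / rel_path
        if not artifact.is_file() or sha256_of(artifact) != want:
            mismatched.append(rel_path)
    if mismatched:
        shutil.rmtree(build_dir)  # fail safe: nothing from this build survives
        raise BuildIntegrityError(f"hash mismatch for: {', '.join(mismatched)}")
```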

Alternatively, the ActiveState Platform always creates reproducible builds, or else fails safe if it discovers that build integrity has been compromised.

Conclusions – Mitigating Python Supply Chain Vulnerabilities

Securing the Python software supply chain is more achievable than ever, since new tools (such as SBOMs and software attestations) and services like the ActiveState Platform can help organizations ensure the integrity and security of the Python code they import, build and install.
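As a starting point for an SBOM, the standard library’s `importlib.metadata` can list the distributions installed in an environment. The sketch below produces only a name and version inventory; it is not a standards-compliant SBOM in a format such as CycloneDX or SPDX, which also records hashes, licenses, and dependency relationships. The function name is hypothetical.

```python
import json
from importlib.metadata import distributions


def package_inventory():
    """List installed distributions as {'name', 'version'} records, sorted by name."""
    records = {}
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            # Keep the first occurrence, matching sys.path precedence.
            records.setdefault(name.lower(), {"name": name, "version": dist.version})
    return [records[key] for key in sorted(records)]


if __name__ == "__main__":
    print(json.dumps(package_inventory(), indent=2))
```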

Yet most organizations remain focused on Python vulnerabilities, which are arguably the strongest link in the Python supply chain since, according to Snyk, 87% of Python vulnerabilities have a known fix.

Additionally, according to the latest Veracode State of Open Source Software Report: