Recent investigations into the Python ecosystem by Checkmarx have uncovered a long-standing design issue that can result in malicious code being executed not only when running pip install, but also when running pip download.
The problem lies in the way setup.py was originally designed. The setup.py file is commonly bundled in Python packages as the setup script, and is meant to provide metadata such as the list of the package’s requirements/dependencies. Because it is an ordinary Python script, however, it is also where bad actors plant their malicious code. If a malicious setup.py installs malware alongside your package, simply running pip install is enough to compromise your machine.
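Because setup.py is ordinary Python, every statement at its top level runs as soon as pip executes the script. One rough way to look for red flags without running anything is to parse the file with the standard-library ast module and list calls that a metadata-only script rarely needs. The sketch below is a heuristic, not a malware scanner: the SUSPICIOUS_CALLS set and the function names are illustrative assumptions, and obfuscated code (for example `from os import system`, or a payload decoded from base64 at runtime) will slip past it.

```python
from __future__ import annotations

import ast

# Illustrative (assumed) list of calls a metadata-only setup.py has little
# reason to make. Not exhaustive, and trivially evaded by renaming imports.
SUSPICIOUS_CALLS = {
    "os.system", "os.popen", "exec", "eval", "__import__",
    "subprocess.run", "subprocess.call", "subprocess.Popen",
    "subprocess.check_output", "urllib.request.urlopen",
}


def dotted_name(node: ast.AST) -> str | None:
    """Turn a call target such as os.system or subprocess.run into a dotted string."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = dotted_name(node.value)
        return f"{base}.{node.attr}" if base else None
    return None  # e.g. calls on the result of another call


def flag_suspicious_calls(source: str) -> list[tuple[int, str]]:
    """Parse setup.py source WITHOUT executing it; return sorted (line, call) pairs."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in SUSPICIOUS_CALLS:
                findings.append((node.lineno, name))
    return sorted(findings)
```

For example, `flag_suspicious_calls("import os\nos.system('echo hi')\n")` reports `[(2, 'os.system')]`, while a plain `setup(name=...)` call produces no findings.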
More worrying, though, is that running pip download also invokes setup.py, which can likewise result in executing malicious code. Most users will be surprised to learn that merely downloading a package executes code, but this is by design: to work with a package you also need its requirements/dependencies, so pip download fetches those too, and for a package distributed as source code, pip obtains that list of dependencies by running setup.py.
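Why pip has to execute setup.py becomes clearer if you try to read the dependency list statically instead. The sketch below (a hypothetical helper, not part of pip) uses ast.literal_eval to recover `install_requires` when it is written as a literal list, and gives up when the value is computed at runtime, for example read from a requirements file. In that case only running the script reveals the answer.

```python
from __future__ import annotations

import ast


def _call_name(func: ast.expr) -> str | None:
    """Name of the called function: setup(...) or setuptools.setup(...)."""
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return func.attr
    return None


def static_install_requires(source: str) -> list[str] | None:
    """Return install_requires if it is a literal list/tuple of strings.

    Returns None when no setup() call is found, or when the value is built
    at runtime: then executing setup.py is the only way to learn it.
    """
    for node in ast.walk(ast.parse(source)):  # parsing does not execute anything
        if not (isinstance(node, ast.Call) and _call_name(node.func) == "setup"):
            continue
        for kw in node.keywords:
            if kw.arg != "install_requires":
                continue
            try:
                value = ast.literal_eval(kw.value)
            except (ValueError, TypeError, SyntaxError):
                return None  # a variable, function call, etc.: not knowable statically
            if isinstance(value, (list, tuple)) and all(isinstance(v, str) for v in value):
                return list(value)
            return None
        return []  # setup() is called but declares no install_requires
    return None
```

A literal such as `install_requires=['requests>=2', 'click']` is recovered directly, whereas `install_requires=reqs`, where `reqs` comes from `open('requirements.txt')`, returns None.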
The good news is that most packages are installed as wheels (.whl), the format pip looks for first by default. Installing from a wheel doesn’t require executing setup.py. Unfortunately, wheels are not a panacea, because:
- Not all packages are available as wheels.
- Wheels are not signed, are not necessarily built securely, and may themselves contain malware. While the Python Package Index (PyPI) does scan for malware, its scanning does not catch everything.
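If you want pip download to refuse source distributions altogether, pip’s `--only-binary=:all:` option restricts it to wheels for the requested package and all of its dependencies, so no setup.py is run. The trade-off is the first point above: the command fails for any package that has no wheel. The sketch below builds that command and sorts already-downloaded files into wheels and sdists; the function names are hypothetical, and, per the second point, a wheel can still carry malware.

```python
import sys

# Wheels end in .whl; source distributions (sdists) are usually .tar.gz,
# occasionally .zip or other archive formats.
SDIST_SUFFIXES = (".tar.gz", ".zip", ".tar.bz2")


def split_artifacts(filenames: list[str]) -> tuple[list[str], list[str]]:
    """Separate downloaded files into wheels and sdists (sdists may need setup.py)."""
    wheels = [f for f in filenames if f.endswith(".whl")]
    sdists = [f for f in filenames if f.endswith(SDIST_SUFFIXES)]
    return wheels, sdists


def wheel_only_download_cmd(requirement: str, dest: str) -> list[str]:
    """argv for a 'pip download' that accepts wheels only, for every package."""
    return [
        sys.executable, "-m", "pip", "download",
        "--only-binary=:all:",  # never fall back to building an sdist
        "--dest", dest,
        requirement,
    ]
```

To run it, pass the list to `subprocess.run(..., check=True)`; a failure then tells you that some package in the dependency tree is only available as source.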
Securing the Software Supply Chain
The Python ecosystem is not alone in being susceptible to these kinds of supply chain attacks. Most public repositories do not offer signed code, nor do they provide any guarantees as to the security and integrity of the third-party packages they offer.
This leaves it up to each security-conscious organization to implement its own checks and balances to ensure that the third-party code it imports, builds and works with won’t compromise its systems and software. But those controls can be extremely costly to implement, and to maintain, upgrade and audit over time.
This is a problem that ActiveState has been addressing with our ActiveState Platform. The ActiveState Platform maintains its own catalog of open source assets and uses it to automatically and securely build open source packages from source code (including linked C libraries). As such, it offers an automated, turnkey solution that can help secure your Python, Perl, Ruby and Tcl supply chain by providing:
- Secure Import Process: our automated system regularly checks upstream open source repositories for newly released packages and new versions of existing packages. When it finds one, it will:
  - Download both the source code and associated metadata.
  - Perform static analysis on the source code to ensure our data about it is complete before inserting it into our catalog.
  - Check popular vulnerability databases so we can augment our metadata with CVE information.
  - Regularly monitor the source code we ingest, and remove it from the catalog/pipeline if it is found to contain a trojan, malware, etc.
- Secure Build Service: our automated build service runs in the cloud on a minimal set of predefined, locked-down resources to minimize the attack surface. The service incorporates:
  - Build scripts that cannot be accessed or modified from within the build service, preventing them from being tampered with or exploited.
  - Ephemeral, isolated build steps that each execute in their own container, which is discarded when the step completes. In other words, each container is purpose-built to perform a single function, reducing the potential for compromise.
  - Hermetically sealed environments with no internet access, preventing, for example, dynamic packages from pulling in remote resources.
  - Reproducible builds in which the build process fails safe: the build is terminated if any component generated during a build step fails checksum verification.
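The fail-safe checksum rule described in the last item can be sketched in a few lines of Python. This assumes, hypothetically, that each build step’s expected outputs are recorded as a manifest mapping file names to SHA-256 digests from an earlier trusted run; how the ActiveState Platform actually stores and compares them is not described here. Any missing file or mismatched digest raises an exception, which stops the build instead of letting it continue with an unverified artifact.

```python
import hashlib
from pathlib import Path


class ChecksumMismatch(RuntimeError):
    """Raised to abort the build when an artifact does not match its expected hash."""


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large artifacts don't need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_step_outputs(outdir: Path, expected: dict[str, str]) -> None:
    """Fail safe: every expected file must exist and match its recorded digest."""
    for rel_name, want in expected.items():
        path = outdir / rel_name
        if not path.is_file():
            raise ChecksumMismatch(f"{rel_name}: missing from step output")
        got = sha256_of(path)
        if got != want:
            raise ChecksumMismatch(f"{rel_name}: expected {want}, got {got}")
```

Raising rather than logging a warning is the design choice that makes this "fail safe": the caller has to handle the mismatch explicitly, and an unhandled one ends the build.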
With respect to the specific threat posed by setup.py, the ActiveState Platform provides protection by running setup.py in an isolated, ephemeral build step, so setup.py never runs on your machine when you install the package with ActiveState’s package manager, the State Tool.
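The "ephemeral" half of that idea can be approximated locally: run each step in a throwaway working directory with a scrubbed environment, and delete the directory as soon as the step ends. The sketch below (hypothetical helper name, POSIX-style PATH assumed) does only that. It is not a sandbox: the process can still reach the network and the rest of your filesystem, so isolation of the kind described above needs containers or VMs with networking disabled.

```python
import subprocess
import sys
import tempfile


def run_ephemeral_step(argv: list[str], timeout: float = 60.0) -> subprocess.CompletedProcess:
    """Run argv in a fresh temporary directory that is deleted afterwards.

    The child sees only a minimal PATH, so tokens or credentials stored in
    the parent's environment variables are not passed along.
    """
    with tempfile.TemporaryDirectory(prefix="build-step-") as workdir:
        return subprocess.run(
            argv,
            cwd=workdir,
            env={"PATH": "/usr/bin:/bin"},  # assumed POSIX system
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,
        )  # leaving the with-block removes workdir and anything the step wrote


if __name__ == "__main__":
    result = run_ephemeral_step([sys.executable, "-c", "import os; print(os.getcwd())"])
    print(result.returncode, result.stdout.strip())
```

Anything the step writes, such as a file dropped by a malicious setup.py, disappears with the directory, but only if it was written inside that directory.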