C (and less commonly, Fortran) libraries are a necessary evil in Python since they significantly speed up compute-intensive routines, such as math calculations. This makes them indispensable to data analysis and machine learning packages, from TensorFlow to NumPy to Scikit-Learn.
Luckily, most Python packages are “pure Python,” which means they don’t rely on external libraries written in a different language. You can install them by running pip install <package name>. However, when you try to install a package such as Pillow, you might get an error like:
x86_64-linux-gnu-gcc: error: build/temp.linux-x86_64-3.4/libImaging/Jpeg2KDecode.o: No such file or directory
This is because Pillow is one of those Python packages that requires C libraries to run, and your system is attempting to build the missing C files from source code. But creating a local build environment isn’t usually part of the standard process when installing Python, so what’s going on?
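If you’re not sure whether a module you already have installed is pure Python or compiled, one rough check is to see what kind of file it loads from. The sketch below uses only the standard library; the helper name is_extension_module is made up for this illustration, and the check only inspects the top-level module you name, so a package whose __init__.py is Python but which bundles compiled submodules will still report False.

```python
import importlib.machinery
import importlib.util


def is_extension_module(module_name: str) -> bool:
    """Return True if the named module loads from a compiled shared library."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        # Not installed, or a namespace package with no single file behind it.
        return False
    # EXTENSION_SUFFIXES lists the file endings this interpreter accepts for
    # compiled modules (for example ".so" on Linux or ".pyd" on Windows).
    return spec.origin.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES))


# The standard library's json package is plain Python source.
print(is_extension_module("json"))
```

Point it at a specific submodule of a package you suspect is compiled to see where the C code actually lives.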
These days, developers of Python packages that incorporate C code typically build pre-compiled binary wheels for the most popular operating systems (OSes) and publish them on the Python Package Index (PyPI). This way, when you run pip install <package name>, pip can select the appropriate pre-compiled wheel for your platform and install both the Python code and the required C libraries.
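The platform a wheel targets is encoded in its filename, which follows the pattern name-version-python tag-abi tag-platform tag.whl (with an optional build tag after the version). The sketch below splits a filename into those parts; it is a simplified illustration rather than pip’s actual selection logic, and the example_pkg filenames are hypothetical.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its name, version and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    # Five parts normally; six when an optional build tag follows the version.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    return WheelTags(parts[0], parts[1], parts[-3], parts[-2], parts[-1])


def is_pure_python(filename: str) -> bool:
    """By convention, an ABI tag of 'none' plus platform 'any' means no compiled code."""
    tags = parse_wheel_filename(filename)
    return tags.abi_tag == "none" and tags.platform_tag == "any"


# Hypothetical filenames: one pure-Python wheel, one built for 64-bit Linux.
for name in (
    "example_pkg-1.0-py3-none-any.whl",
    "example_pkg-1.0-cp311-cp311-manylinux_2_17_x86_64.whl",
):
    print(name, parse_wheel_filename(name).platform_tag, is_pure_python(name))
```

If none of the wheels published for a release carries a platform tag that matches your machine, pip falls back to the source archive, which is when a local compiler becomes necessary.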
While installing pre-built wheels is a convenient shortcut, they raise two concerns:
- Availability – Unfortunately, not all packages are pre-compiled for all platforms. If your platform is unsupported, you’ll need to build the C libraries yourself.
- Security – PyPI does not provide guarantees as to the security of the packages it hosts. With pure Python packages, you at least have the option to read the code and see what it’s doing, but precompiled binaries require other methods to determine whether they contain malware. User beware.
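One basic safeguard for the security concern is to confirm that the file you downloaded is byte-for-byte the one you expected, by comparing its SHA-256 digest against a value you obtained from a source you trust. pip offers this as a hash-checking mode (pip install --require-hashes -r requirements.txt). The sketch below shows the underlying idea with the standard library; the function names are made up for this illustration. Note that a matching digest only proves the file wasn’t altered, not that its contents are free of malware.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected_sha256: str) -> bool:
    """Return True if the file's digest equals the digest you trust."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

You would call matches_expected with the path to a downloaded wheel and the digest you recorded earlier, and refuse to install on a mismatch.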
This blog post is meant to help you understand your options when it comes to managing C libraries associated with your Python deployments, as well as introducing options if you need to build Python packages yourself.
Creating Build Environments for Python
The C and/or Fortran code associated with a Python package is typically distributed in the package’s source code archive, which you’ll need to separately download and install. But you’ll also need to build the source code for your OS before you can use the package, which means you’ll need to create an appropriate build environment on your local system that includes:
- Python development headers
- A C and/or Fortran compiler for your OS
- Build automation tools
- Build scripts for each package to be built
- Installers/packagers for your OS
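Before attempting a source build, it can help to check which of these pieces are already on your machine. The sketch below is a rough, standard-library-only probe: it looks for Python.h in the interpreter’s include directory and searches PATH for a few common tool names. The lists of tool names are assumptions chosen for illustration, not an exhaustive inventory, and finding a tool on PATH doesn’t guarantee it’s the right version for a given package.

```python
import shutil
import sysconfig
from pathlib import Path


def check_build_environment() -> dict[str, bool]:
    """Report which common build prerequisites can be found on this machine."""
    include_dir = Path(sysconfig.get_paths()["include"])
    # Illustrative tool names only; your platform may use others.
    c_compilers = ("cc", "gcc", "clang", "cl")
    fortran_compilers = ("gfortran",)
    build_tools = ("make", "cmake", "ninja")
    return {
        "python_headers": (include_dir / "Python.h").is_file(),
        "c_compiler": any(shutil.which(name) for name in c_compilers),
        "fortran_compiler": any(shutil.which(name) for name in fortran_compilers),
        "build_tool": any(shutil.which(name) for name in build_tools),
    }


for component, found in check_build_environment().items():
    print(f"{component}: {'found' if found else 'missing'}")
```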
While this kind of setup is expedient, it has a number of drawbacks. For example:
- The process tends to create one-off, non-repeatable builds, which means it’s extremely difficult to verify that the package was built securely.
- The environment will need to be maintained and updated so you can rebuild the package as new updates are released.
- The package is built only for your OS. If the rest of your team works on different OSes, they’ll need to go through the same process on their machines, leading to repetitive work.
Because of the last point, organizations often centralize the build process in a single team that builds the Python artifacts required by all teams. This method comes with its own set of tasks and requirements, including:
- Creating three different VM or container-based build environments – one each for Windows, macOS and Linux – in order to support teams that work on their OS of choice (assuming your build engineers have OS expertise in all three).
- Creating build scripts for both 32-bit and 64-bit builds, as required.
- Implementing a CI/CD process to build and verify the artifacts in a reproducible way to ensure secure builds.
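One way to approach the “reproducible” part of that last requirement is to build the same source twice in independent environments and compare the results. The sketch below assumes each build writes its artifacts to its own directory and reports any file whose contents differ or that exists in only one build; the function names are hypothetical. In practice, timestamps embedded in archives can make otherwise identical builds differ, so treat a mismatch as a prompt to investigate rather than proof of tampering.

```python
import hashlib
from pathlib import Path


def build_manifest(artifact_dir: Path) -> dict[str, str]:
    """Map each file's relative path to the SHA-256 digest of its contents."""
    return {
        path.relative_to(artifact_dir).as_posix(): hashlib.sha256(
            path.read_bytes()
        ).hexdigest()
        for path in sorted(artifact_dir.rglob("*"))
        if path.is_file()
    }


def differing_artifacts(first_build: Path, second_build: Path) -> list[str]:
    """List files that differ between two builds or appear in only one of them."""
    first = build_manifest(first_build)
    second = build_manifest(second_build)
    return sorted(
        name
        for name in first.keys() | second.keys()
        if first.get(name) != second.get(name)
    )
```

An empty list means the two builds produced identical artifacts, which a CI job could enforce as a gate before publishing.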
If you think all this sounds like a lot of work to set up and maintain, not to mention rebuilding the packages required for each project as vulnerabilities are found and new versions are released, you’d be correct. Luckily, there’s a better way.
Automatically Build Python from Source Code
The ActiveState Platform provides a cloud-based secure build service that automatically builds Python packages from source code, including any linked C and Fortran libraries. As such, the ActiveState Platform:
- Maintains its own catalog of source code for hundreds of thousands of Python packages, and updates the catalog every 24-48 hours.
- Maintains its own catalog of build scripts, so it understands how to build each package along with its dependencies, and in which order.
- Automatically spins up a container-based build environment that has all the necessary components to build the package for the target OS, be it Windows, Mac or Linux, or all three.
- Automatically packages the resulting runtime environment for the target OSes.
- Optionally makes the built Python wheels available via the ActiveState artifact repository from where developers can “pip install” them.
And it does it all in a secure, repeatable manner so you can be sure that the artifacts it produces are trustworthy. You don’t need OS expertise, or even Python experience, to securely build all the packages your teams require from source.
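To make the idea of building dependencies “in which order” concrete, here is a generic sketch of dependency ordering using the standard library’s graphlib module (Python 3.9+). This is not the ActiveState Platform’s implementation, which this post doesn’t describe; the package names and the edges between them are invented for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each key must be built after everything
# in its set. Names and edges are invented for this illustration.
dependencies = {
    "imaging-package": {"libjpeg", "libtiff", "zlib"},
    "libtiff": {"libjpeg", "zlib"},
    "libjpeg": set(),
    "zlib": set(),
}


def build_order(graph: dict[str, set[str]]) -> list[str]:
    """Return an order in which every dependency precedes its dependents."""
    return list(TopologicalSorter(graph).static_order())


print(build_order(dependencies))
```

TopologicalSorter raises graphlib.CycleError if the graph contains a circular dependency, which is a useful early signal that a set of packages can’t be built as declared.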
These capabilities are also available as a managed service, freeing up your developers to focus on coding and getting your product to market faster. Learn more about our Managed Builds service.
- Read our blog post: How to Build Perl without a Compiler
- Read our Data Sheet: ActiveState Platform’s Secure Build Service
- Read our Data Sheet: ActiveState Platform’s Artifact Repository