How to Manage Dependencies in Python

One of the biggest headaches that arise when working with multiple projects in Python is managing the dependencies that differ between them. Project A requires Python 2.7 and depends on older versions of a package, while Project B requires Python 3.6 with the most recent version of the same package, plus a host of others. 

What is the easiest way to ensure you can switch between each project without wasting time installing new or different packages? This problem is equally applicable when collaborating with colleagues on the same project: what is the easiest way to manage the project dependencies across multiple machines? 

There are several different approaches to dealing with dependencies in Python. In this article, I’ll go over the pros and cons of some of the more common methods, and suggest a few alternatives.

 

Installing And Managing Packages With Pip

Before I get into the different dependency management solutions, a few comments on installing packages in Python are in order. Most people who use Python already know how to install packages, and the recommended way to do so is with pip. Depending on how Python was installed on your machine, pip may or may not already be installed. To check, you can run the following in the command line:

pip --version

For further information on how to install pip, see the documentation here. To install a specific package with pip from the Python Package Index, run:

pip install "SomePackage"

Or for a specific package version:

pip install "SomePackage==1.0"

It’s worth noting that pip is not limited to just installing packages maintained on the Python Package Index. You can also install from a number of different sources. For more information, see the documentation here. It’s also possible to install multiple packages at one time using a requirements.txt file:

pip install -r requirements.txt

In principle, the requirements.txt file is simply a list of pip install arguments, one requirement per line.
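When pip has to be driven from a script rather than typed by hand, one common pattern is to call it through the running interpreter, so that packages land in the environment that interpreter belongs to. The sketch below is illustrative only: pip_command and pip_is_available are hypothetical helper names, not part of pip.

```python
import subprocess
import sys


def pip_command(*args):
    """Build a pip command line bound to the current interpreter."""
    return [sys.executable, "-m", "pip", *args]


def pip_is_available():
    """Return True if `python -m pip --version` exits successfully."""
    result = subprocess.run(
        pip_command("--version"),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    if pip_is_available():
        # Same arguments as: pip install -r requirements.txt
        command = pip_command("install", "-r", "requirements.txt")
        print("would run:", " ".join(command))
    else:
        print("pip is not installed for", sys.executable)
```

The script only prints the install command instead of running it, so it is safe to try outside a virtual environment.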

 

Installing And Managing Dependencies Using Virtualenv

Virtualenv is a tool used to create multiple isolated Python environments. It does so by creating an environment with its own installation directory and corresponding libraries. Within this environment, globally installed libraries are not accessible by default, nor are libraries installed within other virtual environments.

In this sense, each virtual environment is an isolated platform upon which unique dependencies can be installed. To resolve the problem of multiple projects with different dependencies, all one needs to do is create an environment for each project. Switching between projects is as simple as switching between each virtual environment. For instructions on installing virtualenv, see the documentation here.

The advantages and disadvantages of using virtualenv include: 

Ease of Use: Creating and using a virtualenv is simple and easy. To create an environment from the command line:

virtualenv ENV

ENV is the directory where the new virtual environment will live.

Pip Integration & requirements.txt: As I mentioned, pip is still the standard method to install packages within each virtualenv environment. This makes virtualenv easy to adopt, especially since requirements.txt can be used to make the initialization of each environment relatively quick and straightforward. Using a requirements.txt file can also help ensure dependencies are maintained across multiple environments or machines, although it doesn’t always produce the exact same result. For example:

  • If no version is explicitly specified for each package in the requirements.txt file, pip simply takes the most recent version. 
  • Dependencies are not always resolved for you. For example, if two packages in a requirements.txt file have conflicting sub-dependencies (e.g. each requires a different version of the same package), older versions of pip (before the dependency resolver that became the default in pip 20.3) simply let the requirement encountered first win, while newer versions stop with an error. Either way, this can cause headaches, and will often result in having to maintain the correct version of each package (as well as all the sub-dependencies) manually.
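Both pitfalls can be spotted before installing anything. The following is a minimal sketch, assuming a requirements.txt that uses only plain names and == pins; audit_requirements is a hypothetical helper, its pattern is deliberately simplified (extras, environment markers and URLs are not handled), and it only inspects the lines in the file itself, not sub-dependencies.

```python
import re

# Simplified pattern: a package name with an optional "==version" pin.
# Real requirement lines can be far richer than this sketch handles.
_LINE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:==\s*([^\s;#]+))?")


def audit_requirements(text):
    """Return (unpinned, conflicts) for the given requirements.txt content."""
    pins = {}       # normalized package name -> set of pinned versions
    unpinned = []   # names that carry no exact "==" pin
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()    # drop comments
        if not line or line.startswith("-"):   # skip blanks and pip options
            continue
        match = _LINE.match(line)
        if match is None:
            continue
        name = re.sub(r"[-_.]+", "-", match.group(1)).lower()
        version = match.group(2)
        if version is None:
            unpinned.append(name)
        else:
            pins.setdefault(name, set()).add(version)
    conflicts = {n: sorted(v) for n, v in pins.items() if len(v) > 1}
    return unpinned, conflicts


if __name__ == "__main__":
    sample = "SomePackage==1.0\nOtherPackage\nsomepackage==2.0\n"
    unpinned, conflicts = audit_requirements(sample)
    print("unpinned:", unpinned)
    print("conflicting pins:", conflicts)
```

In the sample, OtherPackage is flagged as unpinned and SomePackage is flagged because it is pinned to two different versions.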

Environment Management: If only a few environments are needed, the virtualenv + requirements.txt file approach works well. If many are needed, creating several different environments with their own dependencies can not only be time consuming, but also take up a lot of space on your machine (and more than likely be redundant). In addition, it becomes increasingly difficult to manage all the requirements.txt files within each environment.

Virtualenv is needed for Python versions older than 3.3. Newer versions of Python can use venv instead, which is included in the standard Python library and is invoked as python -m venv ENV. The functionality is very similar, although the two tools do not share every option.
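Because venv ships with the standard library, an environment can also be created from Python code rather than from the command line. Here is a minimal sketch, assuming Python 3.3 or later; create_environment is a hypothetical helper name, and the environment is written to a temporary directory so nothing is left behind.

```python
import tempfile
import venv
from pathlib import Path


def create_environment(env_dir):
    """Create a virtual environment in env_dir and return its config file path."""
    # with_pip=False keeps the sketch fast; pass with_pip=True to have
    # pip installed inside the new environment.
    builder = venv.EnvBuilder(with_pip=False)
    builder.create(env_dir)
    return Path(env_dir) / "pyvenv.cfg"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_environment(Path(tmp) / "ENV")
        # pyvenv.cfg records which base interpreter the environment uses.
        print(cfg.read_text())
```

As on the command line, ENV is simply the directory where the new environment lives.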

 

Installing And Managing Dependencies With Pipenv

Pipenv takes dependency management a step further than virtualenv. While the functionality is very similar to virtualenv, a few key features have been added to help limit some of the disadvantages of using virtualenv. The installation instructions can be found here.

The advantages and disadvantages of using pipenv include:

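Ease of Use: pip and virtualenv have been simplified into a single operation. To create a virtual environment with pipenv, run the following from the project directory:

pipenv --python 3.6

Unlike virtualenv, pipenv does not take an ENV directory. It ties the environment to the project directory and, by default, stores it in a central location outside the project. To install a package in the environment, pipenv is used instead of pip: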

pipenv install "SomePackage"

Pipenv uses a Pipfile to keep track of dependencies. If a Pipfile exists, pipenv will add “SomePackage” to its list of dependencies. If it does not exist, pipenv will create one. Similar to requirements.txt, you can also install directly from a Pipfile. To do this, do not specify a package, and pipenv will look for a Pipfile instead:

pipenv install

Pipfile & Pipfile.lock: In addition to the Pipfile, pipenv generates a Pipfile.lock that contains the exact version and source file hash of each package installed. This not only includes “SomePackage,” but also all the dependencies of “SomePackage.”

Pipfile.lock is an alternative to requirements.txt that improves upon the concept by building in dependency resolution. When pipenv generates Pipfile.lock, it attempts to resolve any conflicts between packages – including their sub-dependencies – by selecting package versions that satisfy all requirements.

If no solution exists, Pipfile.lock cannot be created, and an error is output. This prevents the kinds of problems that arise from using requirements.txt, such as manually keeping versions consistent across dependencies and sub-dependencies.
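Since Pipfile.lock is a JSON file, the pinned versions it records are easy to inspect. The sketch below assumes the usual layout, with a "default" section mapping each package to its version and hashes; the sample content is abbreviated and hypothetical, and locked_versions is an illustrative helper, not part of pipenv.

```python
import json

# Abbreviated, hypothetical lock file content; a real Pipfile.lock
# carries more metadata and full sha256 hashes.
SAMPLE_LOCK = """
{
  "_meta": {"pipfile-spec": 6},
  "default": {
    "somepackage": {"hashes": ["sha256:<hash elided>"], "version": "==1.0"},
    "subdependency": {"hashes": ["sha256:<hash elided>"], "version": "==0.3"}
  },
  "develop": {}
}
"""


def locked_versions(lock_text, section="default"):
    """Map each package in a Pipfile.lock section to its pinned version."""
    lock = json.loads(lock_text)
    return {
        name: entry.get("version", "")
        for name, entry in lock.get(section, {}).items()
    }


if __name__ == "__main__":
    for name, version in sorted(locked_versions(SAMPLE_LOCK).items()):
        print(name + version)
```

Note that the sub-dependency appears alongside the package that was requested directly, which is what makes the lock file reproducible.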

Dependency Graph: When dealing with complex dependencies, it’s extremely helpful to be able to visualize how they relate. To that end, pipenv includes a method to visualize your dependencies:

pipenv graph

This produces a clean and easy-to-interpret output that lists all dependencies. 
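To give a feel for what such a listing conveys, here is a small sketch that renders a dependency tree from a hand-written mapping. It does not reproduce pipenv's actual output format; the package names and the render_tree helper are hypothetical.

```python
def render_tree(graph, name, depth=0, path=()):
    """Return indented lines for name and everything it depends on."""
    prefix = "  " * depth + ("- " if depth else "")
    lines = [prefix + name]
    if name in path:  # stop if a dependency cycle is encountered
        return lines
    for child in sorted(graph.get(name, ())):
        lines.extend(render_tree(graph, child, depth + 1, path + (name,)))
    return lines


if __name__ == "__main__":
    # Hypothetical dependencies: package -> packages it requires.
    graph = {
        "somepackage": ["subdependency-a", "subdependency-b"],
        "subdependency-a": ["subdependency-b"],
    }
    print("\n".join(render_tree(graph, "somepackage")))
```

A shared sub-dependency shows up under every package that requires it, which is exactly where version conflicts tend to hide.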

Environment Management: Unfortunately, pipenv doesn’t help with the management of proliferating virtual environments. Nor does it reduce the disk space that virtual environments take up, which can plague users who need many of them on a single machine.

 

Alternative Python Dependency Management Solutions

Venv and pipenv are two methods of managing dependencies in Python. They are simple to implement and, for most users, adequate solutions for handling multiple projects with different dependencies. However, they are not the only solutions. Other services can complement their use.

If you are using venv or pipenv, it may be worth using GitHub as well. GitHub provides automated vulnerability alerts for dependencies in your repository. When you upload a requirements.txt or Pipfile.lock with your code, GitHub checks the listed dependencies for known vulnerabilities, sends an alert to the administrator if it detects any, and can even propose fixes automatically. It also generates a dependency graph under the Insights tab of your repository.

Another solution for managing dependencies is the ActiveState Platform, which can automatically resolve all the dependencies for your project, and compile them (packages, dependencies and sub-dependencies) into a runtime for the operating system you use. Each time a package or dependency is updated, the ActiveState Platform can rebuild the runtime while ensuring compatibility between all dependencies and sub-dependencies. 

 

Next Steps

  • Sign up for a free ActiveState Platform account and see its dependency management capabilities for yourself by building your own runtime environment.
  • To learn more about dependency management in virtual environments, read the “Why pipenv > venv” blog post.

Dante Sblendorio


Guest blogger: Dante is a physicist currently pursuing a PhD in Physics at École polytechnique fédérale de Lausanne. He has a Masters in Data Science, and continues to experiment with and find novel applications for machine learning algorithms. He lives in Lausanne, Switzerland.