Dependency Resolution Optimization – ActiveState’s Approach

Dependencies. They’re the best, but also the worst. You can count on PyPI (the Python Package Index) or CPAN (the Comprehensive Perl Archive Network) to provide packages for so many things, from date and time handling to logging to web service clients. This can be a huge time saver, but dependency resolution can also be a huge time sink.

Sometimes you try to install package A, only to find out its dependency tree includes package B, which you’re already using, but at a higher major version. Does the tool you’re using warn you about this? Some tools just upgrade package B automatically. Many ecosystems have tools for pinning package versions and only doing controlled upgrades, but those tools have their own downsides.

Wouldn’t it be nice to have a user-friendly system for managing your dependencies? It would be great if you could add a new package and see what your dependency tree would look like once it’s added, before committing to it. It’d be even better if that dependency management system tracked conflicts between packages, knew about platform-specific dependencies, and even tracked system-level dependencies like C and C++ libraries.

Dependency resolution is at the core of the ActiveState Platform. When you create a project and start adding requirements, we tell you what dependencies those requirements have. Sometimes we need to tell you that your requirements are impossible because of dependency conflicts.

Every language ecosystem needs dependency management. Perl has one of the oldest dependency management toolchains around, starting with the creation of CPAN in 1995. Python followed with PyPI in 2003.

Dependency resolution for these ecosystems is done with package management tools that you run locally. Historically, these tools operated on a system-wide language installation, though now we have tools like perlbrew and virtualenv to allow per-project language installations.

But these tools simply punt on the handling of C/C++ libraries and external tools. When a package requires a library like OpenSSL or a tool such as a Fortran compiler, the package author has to write code that runs during the setup phase (via a Makefile.PL or setup.py). If the dependency isn’t present, then the install fails. If you’re lucky, you get a helpful message telling you what’s missing, at which point you’re expected to install the library or tool yourself. If you’re on Linux you can usually rely on your OS vendor’s package manager. On macOS you have Homebrew. And on Windows you have pain.
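
To make the "setup phase" check concrete, here is a minimal Python sketch of the kind of code a package author might write by hand. The function name require_c_library and the wording of the message are hypothetical; ctypes.util.find_library is a real standard-library call, but real packages often probe for libraries in other ways.

```python
import ctypes.util
from typing import Callable, Optional


def require_c_library(
    name: str,
    finder: Callable[[str], Optional[str]] = ctypes.util.find_library,
) -> str:
    """Return where the library was found, or abort with a helpful message.

    `finder` is injectable so the check can be exercised without the
    library actually being installed (an assumption made for this sketch).
    """
    located = finder(name)
    if located is None:
        # This is the "if you're lucky" case: the author tells you what is missing.
        raise SystemExit(
            f"Missing C library '{name}'. Install it with your OS package "
            "manager (or Homebrew on macOS) and run the install again."
        )
    return located
```

Note that the check can only report the problem. Installing the library is still left to you, which is the gap described above.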

There is a similar problem with conditional dependencies, such as a dependency that is only needed on a certain platform or CPU architecture. Language ecosystems typically provide limited support for these at best, so again it’s up to the package author to add the appropriate checks and error messages by hand.

The ActiveState Platform aims to handle every dependency for every language. That means handling libraries down to the C/C++ level, external tools, and all the conditional dependencies that exist. To take things even further, our ultimate goal is to support multi-language projects. That means that you can create a project using both Python and Perl packages, and we’ll make sure that both languages are using the same (up-to-date) OpenSSL version.

Dependency Metadata Concepts

We store all of the metadata about the packages/modules/libraries, etc – everything you can build a language runtime with – in our Inventory Database. 

Any inventory item can have dependencies. For example, version 3.8.2 of the Python core depends on OpenSSL and several other C libraries, while the Python requests inventory item depends on urllib3, the Python core, and several other Python libraries.

However, inventory items do not depend directly on other items. Instead, they depend on “features”. Each inventory item, along with other things like operating systems or CPU architectures, provides one or more features. A feature consists of:

  • Name
  • Namespace
  • Version

For example, Python 3.8.2 provides a feature named python in the language namespace at version 3.8.2. In Perl, each distribution provides features for all of the Perl packages contained in the distribution. So version 1.52 of the Perl DateTime inventory item provides a number of features in the language/perl namespace, including DateTime 1.52, DateTime::Duration 1.52, and others.
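
As a rough illustration, the examples above could be modeled like this in Python. The class and field names are assumptions made for this sketch and are not the actual Inventory Database schema.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Feature:
    """A (namespace, name, version) triple, as described above."""
    namespace: str
    name: str
    version: str


@dataclass(frozen=True)
class InventoryItem:
    """Hypothetical shape of an inventory item and the features it provides."""
    namespace: str
    name: str
    version: str
    provides: Tuple[Feature, ...]


# Python 3.8.2 provides a single feature named "python".
python_core = InventoryItem(
    "language", "python", "3.8.2",
    (Feature("language", "python", "3.8.2"),),
)

# A Perl distribution provides one feature per Perl package it contains
# (only two of DateTime's features are listed here).
datetime_dist = InventoryItem(
    "language/perl", "DateTime", "1.52",
    (
        Feature("language/perl", "DateTime", "1.52"),
        Feature("language/perl", "DateTime::Duration", "1.52"),
    ),
)
```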

This added layer of indirection gives us a lot of flexibility in how we represent and resolve dependencies. For example, in the future we may have multiple inventory items that provide a feature like “python-core”, allowing us to build projects using CPython, Jython, or PyPy with Python packages that do not need the CPython API. Or we could have a feature named “openssl-api” that both OpenSSL and LibreSSL provide, allowing you to swap between them in a project.

We’d like to feel clever about this idea, but we just stole it straight from Debian’s virtual package system.
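
Here is a small sketch of what that indirection buys us, using the hypothetical “openssl-api” feature from above. The "shared" namespace, the version numbers, and the function name are all made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Item = Tuple[str, str]             # (item name, item version)
FeatureKey = Tuple[str, str]       # (namespace, feature name)

# Hypothetical data: which features each inventory item provides,
# as (namespace, name, version) triples.
PROVIDES: Dict[Item, List[Tuple[str, str, str]]] = {
    ("OpenSSL", "1.1.1"): [("shared", "openssl-api", "1.1")],
    ("LibreSSL", "3.1.0"): [("shared", "openssl-api", "1.1")],
}


def build_provider_index(
    provides: Dict[Item, List[Tuple[str, str, str]]],
) -> Dict[FeatureKey, List[Item]]:
    """Invert item -> features into feature -> candidate provider items."""
    index: Dict[FeatureKey, List[Item]] = defaultdict(list)
    for item, features in provides.items():
        for namespace, name, _version in features:
            index[(namespace, name)].append(item)
    return index


index = build_provider_index(PROVIDES)
candidates = index[("shared", "openssl-api")]
```

A requirement on the feature, rather than on a specific item, leaves the solver free to pick either candidate.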

The PubGrub Algorithm for Dependency Resolution

Our first dependency solver (which we call version 0, or “V0”) was very, shall we say, “ad hoc”. While it worked, more or less, it wasn’t nearly as efficient or predictable as we’d want. Sometimes it would fail in surprising ways, and its error messages could be quite rambly. In addition, it didn’t support all the things we wanted, like some complex conditional dependency cases. But it got the job done well enough for us to start getting some traction, sometimes with heroic efforts to tweak our data to work around the solver’s quirks or missing features.

It was obvious that we needed something better, but designing that “something better” from scratch would be a huge amount of work. Fortunately, we didn’t have to.

Natalie Weizenbaum, while working on the library tooling for the Dart language, created a version-solving algorithm she called PubGrub, which borrows conflict-driven techniques from SAT solvers. Natalie has written a great introductory article on PubGrub, which we highly recommend. There is also a detailed technical specification in the pub repository.

PubGrub has a few properties that make it an excellent choice for ActiveState:

  • Adaptable – We were easily able to extend it to work with our Inventory Item -> Feature -> Feature Provider system because the core of its implementation is abstract enough that we can have dependencies from an Inventory Item to a Feature, rather than directly between Inventory Items.
  • Efficient – When PubGrub finds a conflict, it determines the root cause, even if the cause is earlier in the dependency chain. The algorithm “remembers” this and will not attempt to include the dependency at the root of the chain again.

    This is important for us because we often have a very large set of requirements to solve. For example, our ActivePython CE distributions require a few hundred Python packages along with the Python core. When the dependencies are fully resolved this ends up pulling in over 400 packages.
  • Concise – PubGrub makes it easy to generate better error messages when solving fails. Whenever it detects a conflict, it’s able to trace that conflict back to its root cause. The resulting error message is a minimal, clear explanation of the conflict.
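
To give a feel for the “remembering” idea, here is a toy backtracking solver in Python. To be clear, this is not PubGrub and not our solver: it only records one very simple kind of root cause (a package version whose dependency no catalog version can satisfy) and skips it on later attempts. The catalog, package names, and function names are all hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy catalog: package -> version -> {dependency: acceptable versions}.
CATALOG: Dict[str, Dict[int, Dict[str, Set[int]]]] = {
    "db": {2: {"tls": {2}}, 1: {}},
    "web": {2: {"tls": {3}}, 1: {"tls": {1}}},  # tls 3 does not exist
    "tls": {2: {}, 1: {}},
}

Requirement = Tuple[str, Set[int]]


def solve(pending: List[Requirement], chosen: Dict[str, int],
          nogoods: Set[Tuple[str, int]], stats: Dict[str, int],
          learn: bool) -> Optional[Dict[str, int]]:
    """Depth-first search, newest versions first; returns a selection or None."""
    if not pending:
        return chosen
    (name, allowed), rest = pending[0], pending[1:]
    if name in chosen:
        # Already decided: the earlier choice must satisfy this requirement too.
        if chosen[name] in allowed:
            return solve(rest, chosen, nogoods, stats, learn)
        return None
    for version in sorted(CATALOG.get(name, {}), reverse=True):
        if version not in allowed:
            continue
        if learn and (name, version) in nogoods:
            continue  # known root cause, do not try it again
        stats["tried"] += 1
        deps = CATALOG[name][version]
        # Root cause check: some dependency has no candidate version at all.
        if any(not (set(CATALOG.get(dep, {})) & wanted) for dep, wanted in deps.items()):
            nogoods.add((name, version))
            continue
        result = solve(rest + list(deps.items()), {**chosen, name: version},
                       nogoods, stats, learn)
        if result is not None:
            return result
    return None


def run(learn: bool) -> Tuple[Optional[Dict[str, int]], int]:
    """Solve the same root requirements with or without remembering nogoods."""
    stats = {"tried": 0}
    root: List[Requirement] = [("db", {1, 2}), ("web", {1, 2})]
    return solve(root, {}, set(), stats, learn), stats["tried"]
```

Calling run(True) and run(False) yields the same selection, but the learning variant tries one fewer candidate because it never revisits web 2 after backtracking. PubGrub derives far more general incompatibilities than this, so the savings on a real catalog come from a much richer mechanism.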

Understanding Dependency Resolution Errors

Here’s an example error from our shiny new V1 solver:

Because Feature|language/perl|Test2-Harness (0.001099) requires 
Item|language/perl|Test2-Harness (0.001099) which depends on 
Feature|language/perl|Test2::Bundle::Extended (>=0.000126), 
Feature|language/perl|Test2-Harness (0.001099) requires 
Feature|language/perl|Test2::Bundle::Extended (>=0.000126). 
So, because no versions of 
Feature|language/perl|Test2::Bundle::Extended match >=0.000126 
and root depends on 
Feature|language/perl|Test2-Harness (0.001099), 
version solving failed.

That’s still a mouthful, and we have plans for making it even easier to understand, but let’s break this one down piece by piece:

Because Feature|language/perl|Test2-Harness (0.001099) requires Item|language/perl|Test2-Harness (0.001099) ...

The reference to “Feature|language/perl|Test2-Harness” comes from our order requirements, which asked for a Feature in the language/perl namespace named Test2-Harness at exactly version 0.001099. So this is where the conflict started. In turn, this Feature requires an Inventory Item that provides this feature. That’s the second half of the message.

Next we have:

… which depends on Feature|language/perl|Test2::Bundle::Extended (>=0.000126) ...

This is telling us that version 0.001099 of Test2-Harness requires a Feature named Test2::Bundle::Extended in the language/perl namespace at any version greater than or equal to 0.000126.

Now we have:

… Feature|language/perl|Test2-Harness (0.001099) requires Feature|language/perl|Test2::Bundle::Extended (>=0.000126).

Because our requirements asked for the language/perl|Test2-Harness Feature at exactly version 0.001099, we also require language/perl|Test2::Bundle::Extended >=0.000126. In other words, the solver has worked out how one requirement implies another.

Wrapping up:

So, because no versions of Feature|language/perl|Test2::Bundle::Extended match >=0.000126 ...

This is simple. It’s telling us that there are no providers of the language/perl|Test2::Bundle::Extended Feature at a version >=0.000126 in our ActiveState Platform catalog. This is correct, as we only have versions 0.000120 and 0.000097 of the Test2::Bundle::Extended Feature at present.
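
The check behind that statement can be sketched in a few lines of Python. This assumes the versions are plain decimal Perl version numbers that can be compared numerically; the function name is hypothetical, and the real solver’s version handling is more involved.

```python
from decimal import Decimal
from typing import List

# The two versions of the Test2::Bundle::Extended Feature in the catalog.
available = ["0.000120", "0.000097"]
minimum = "0.000126"


def matching_versions(versions: List[str], at_least: str) -> List[str]:
    """Return the versions that satisfy a '>= at_least' constraint."""
    return [v for v in versions if Decimal(v) >= Decimal(at_least)]


matches = matching_versions(available, minimum)  # empty, so solving fails
```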

Finally:

... and root depends on Feature|language/perl|Test2-Harness (0.001099), version solving failed.

The solver concludes that “root” depends on language/perl|Test2-Harness == 0.001099. “Root” is simply the name the PubGrub algorithm uses for an artificial “first requirement” that depends on all the real requirements in the project. We plan to change this to say something a little friendlier like “your project” or “the project’s requirements” in the future.

Overall, while the error message is still somewhat verbose, it’s as concise as it can be and makes it easy for us to help our users resolve these sorts of problems. By contrast, here’s the old solver’s error message:

No inventory item version satisfies runtime dependency language/perl Test2::Bundle::Extended >= 0.000126 (of requirement language/perl Test2-Harness == 0.001099)
No inventory item version satisfies runtime dependency language/perl Test2::Plugin::MemUsage >= 0.002002 (of requirement language/perl Test2-Harness == 0.001099)
No inventory item version satisfies runtime dependency language/perl Test2::Plugin::UUID >= 0.002001 (of requirement language/perl Test2-Harness == 0.001099)
No inventory item version satisfies runtime dependency language/perl Test2::Require::Module >= 0.000126 (of requirement language/perl Test2-Harness == 0.001099)
No inventory item version satisfies runtime dependency language/perl Test2::Tools::AsyncSubtest >= 0.000126 (of requirement language/perl Test2-Harness == 0.001099)
No inventory item version satisfies runtime dependency language/perl Test2::Tools::Subtest >= 0.000126 (of requirement language/perl Test2-Harness == 0.001099)
No inventory item version satisfies runtime dependency language/perl Test2::Util::Term >= 0.000126 (of requirement language/perl Test2-Harness == 0.001099)
No inventory item version satisfies runtime dependency language/perl Test2::V0 >= 0.000126 (of requirement language/perl Test2-Harness == 0.001099)
No inventory item version satisfies runtime dependency language/perl goto::file >= 0.005 (of requirement language/perl Test2-Harness == 0.001099)

On the plus side, it tells us about every missing dependency. On the minus side, it’s quite long. And this is actually the best-case scenario. In some cases, we will see the same error repeated over and over, and the errors become incredibly hard to understand.

As an aside, this particular error could have several possible causes:

  1. We have not imported a recent enough release of the Test2-Suite Perl distribution, and none of the versions we have provide the feature that’s needed.
  2. There was a bug in how we imported the dependencies for Test2-Harness.
  3. There was a bug in how we imported the provided features for something that should provide the Test2::Bundle::Extended feature.
  4. There was a bug in the metadata the author provided in the distro uploaded to CPAN.

(spoiler: it’s #1).

Cool Things ActiveState’s New Dependency Solver Can Do

Our new solver also does a lot of other things that are core to our product vision. It supports:

  • Dependencies on things besides language libraries. So for example a particular language package can depend on a specific OS kernel or CPU architecture. 
  • Conflict-style dependencies. So we can say that a given language package works on any OS except Windows, or any libc except musl. We can also resolve conflicts between two language packages.
  • Conditional dependencies, where the dependency can be on another package (“only include this if the Python core is <= 2.7”), the build platform’s kernel (“only include this if the Linux kernel is >= 4.4.0”), the CPU architecture, and more.
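
As a rough sketch of how conditional and conflict-style dependencies might be evaluated against a build context, consider the following Python. The package names, context keys, and function names are hypothetical, and the conditions simply mirror the examples in the list above rather than our actual data model.

```python
from typing import Any, Callable, Dict, List, Set, Tuple

Context = Dict[str, Any]

# Hypothetical conditional dependencies: (dependency name, condition on the context).
CONDITIONAL_DEPS: List[Tuple[str, Callable[[Context], bool]]] = [
    # "only include this if the Python core is <= 2.7"
    ("legacy-backport", lambda ctx: ctx["python"][:2] <= (2, 7)),
    # "only include this if the Linux kernel is >= 4.4.0"
    ("modern-kernel-helper",
     lambda ctx: ctx["os"] == "linux" and ctx["kernel"] >= (4, 4, 0)),
]

# Hypothetical conflicts: works on any OS except Windows, any libc except musl.
CONFLICTS: Dict[str, Set[str]] = {"os": {"windows"}, "libc": {"musl"}}


def active_dependencies(ctx: Context) -> List[str]:
    """Dependencies whose condition holds for this build context."""
    return [name for name, condition in CONDITIONAL_DEPS if condition(ctx)]


def is_buildable(ctx: Context) -> bool:
    """False if the context contains anything the package conflicts with."""
    return all(ctx.get(key) not in banned for key, banned in CONFLICTS.items())


linux_py3 = {"os": "linux", "libc": "glibc", "kernel": (5, 4, 0), "python": (3, 8, 2)}
windows_py2 = {"os": "windows", "libc": "msvcrt", "kernel": (10, 0, 0), "python": (2, 7, 18)}
```

With these contexts, linux_py3 activates only the kernel-conditional dependency and is buildable, while windows_py2 activates only the Python-conditional dependency and is rejected by the OS conflict.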

These sorts of features are key to realizing our vision of truly comprehensive dependency management on the ActiveState Platform. While we’ve had the database design to store this data for quite some time, the V0 solver has mostly ignored that data.

We’re really excited about our new and improved solver, and are looking forward to having our users try it when we roll it out in Q3 of this year.

Dependency Management in Action

If you want to get a hands-on appreciation for how the ActiveState Platform currently tries to resolve all Perl language dependencies for you – down to the C library, external tooling and OS/CPU dependencies – you can:

  • Fork a version of our ActivePerl 5.28 (requires a free account). Once forked, you can just scroll down to see the list of dependencies automatically pulled in for the 213 modules in ActivePerl.
  • To see dependency resolution in action:
    1. Click the Add Packages button and search for, then add the Test2-Harness module.
    2. Click Done and scroll down to see the dependency conflicts.

Feel free to add other modules and see how the Platform imports its dependencies for you. Can’t find a module you need? Let us know in the ActiveState Community forum.

 

Dave Rolsky

Dave Rolsky has been a Perl developer since 1999, and has created or contributed to dozens of CPAN modules, including DateTime, Log::Dispatch, Params::Validate, and more. He is also a member of the Moose core development team, and in early 2009 completed a TPF grant to substantially rewrite and expand the Moose documentation. Way back when, he co-wrote Embedding Perl in HTML with Mason and RT Essentials, both published by O'Reilly. He spends a lot of his free time on animal advocacy, and otherwise vegetates with video games, books, and TV shows, like any proper nerd.
