How to Solve Reproducible Environments the Easy Way

Reproducible environments are critical to ensuring your development, testing and ops teams are collaborating effectively. Even today, it’s far too common for developers to waste their time trying to reproduce a bug only to find out it’s actually related to a misconfigured environment.

Without reproducible environments:

  • Code that works for one user fails to run for another, even though the actual code is identical.
  • Your project becomes extremely brittle, because you’re never sure when upgrading or installing a new package will break the build. A culture of “Don’t touch it!” prevails.
  • Developers spend more time managing and troubleshooting environments than coding.

If you’ve experienced any of these symptoms, you’re not alone. Complexity arises because:

  • The project is typically being developed by a team of developers, not just a single coder.
  • Your program eventually needs to run in multiple environments (dev, test, CI/CD, staging and production), some of which will require different configurations.
  • Dependencies can shift over time, making the environment unbuildable when trying to reproduce a bug reported on an older version of the program.

The simplest way to ensure environment reproducibility is to use the exact same set of dependencies everywhere, each pinned to a version you know will work. In practice, however, compliance requirements, security guidelines, and the need to support multiple deployment environments introduce complexity.
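As a minimal sketch of what "the exact same version" means in practice, the Python snippet below compares a set of pinned versions against what is actually installed. The function names and the idea of passing pins as a dictionary are assumptions made for illustration; only importlib.metadata comes from the standard library.

```python
from importlib import metadata


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_drift(pins, lookup=installed_version):
    """Compare pinned versions against what is actually installed.

    pins:   dict mapping package name -> expected version string.
    lookup: callable returning the installed version (or None) for a name;
            injectable so the check can be exercised without touching
            the real environment.
    Returns a dict of name -> (expected, found) for every mismatch.
    """
    drift = {}
    for name, expected in pins.items():
        found = lookup(name)
        if found != expected:
            drift[name] = (expected, found)
    return drift
```

An empty result means the environment matches the pins; anything else lists the packages that have drifted or are missing.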

This post is intended to help you understand the pros and cons of typical reproducibility strategies, and introduce you to the tools that can help simplify the way you create and work with reproducible environments despite complex requirements.

What are Reproducible Environments?

Simply put, a reproducible environment is one in which the components required to run your program have been defined in such a way that others can replicate it without error. Components can include:

Environment Components

  • Operating System – environments can include different libraries depending on whether the program is intended to run on Windows, macOS or Linux.
  • System Libraries – some programs may require additional software to be installed on the local system. 
  • Open Source Environment – the runtime environment, which consists of the packages and libraries your program needs in order to run.
  • Virtual Environments – whenever possible, you’ll want to ensure the runtime environment is deployed in a virtual environment to prevent conflicts with existing deployments on the target system.
  • Project Library – the proprietary code for your program.
  • User/System – the target user/system that needs to deploy your project. 
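To make a few of the components listed above concrete, here is a small Python sketch that records the operating system, the interpreter version, and whether the code is running inside a virtual environment. The function name and the choice of fields are assumptions for illustration. It does not capture system libraries or package versions.

```python
import platform
import sys


def describe_environment():
    """Collect a few facts about the environment the program is running in."""
    return {
        "os": platform.system(),          # e.g. "Linux", "Darwin", "Windows"
        "os_release": platform.release(),
        "machine": platform.machine(),    # CPU architecture
        "python": platform.python_version(),
        # In a venv-style virtual environment, sys.prefix points at the
        # virtual environment while sys.base_prefix points at the base install.
        "in_virtualenv": sys.prefix != sys.base_prefix,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Attaching output like this to a bug report is one lightweight way to tell whether two users are really running the same environment.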

Each of these components must be taken into consideration in order to ensure that reproducibility is addressed at all levels of the project. After all, unless you’re working on a passion project by yourself, reproducibility is a must in order to ensure you can:

  • Safely upgrade packages.
  • Collaborate with other developers using a common source of truth.
  • Allow key stakeholders (dev managers, InfoSec, compliance officers, etc.) to validate and control the packages in the project.

Strategies for Environment Reproducibility

There are a number of strategies you can use to implement reproducible environments, depending on the needs of your organization. The following strategies range from the simple to the more complex:

  1. Shared Baseline – involves establishing a common set of open source packages that can be used across multiple projects. The implementation of an artifact repository may be sufficient for this strategy.
  2. Snapshot & Restore – involves creating a git-like commit for your environment (including all open source packages and dependencies) so that you can revert to any previous commit at any point in time.
  3. Validatable – allows other constituents to approve or audit the third-party packages and dependencies in the runtime environment. For example, your organization may require periodic security/vulnerability audits, or that the licenses of open source packages be audited to ensure compliance with corporate guidelines.
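The Snapshot & Restore idea can be sketched in Python as follows: hash the full package set to get a commit-like identifier, and diff two snapshots to see what changed. This is only an illustration under stated assumptions (function names and the dict-based representation are hypothetical) and is not how any particular tool stores its history.

```python
import hashlib


def snapshot_id(packages):
    """Return a stable identifier for a set of pinned packages.

    packages: dict mapping package name -> version string.
    The same set of name/version pairs always yields the same id,
    regardless of the order in which the packages were listed.
    """
    lines = sorted(f"{name.lower()}=={version}" for name, version in packages.items())
    payload = "\n".join(lines).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def diff_snapshots(old, new):
    """Describe how one snapshot differs from another."""
    return {
        "added": {n: v for n, v in new.items() if n not in old},
        "removed": {n: v for n, v in old.items() if n not in new},
        "changed": {
            n: (old[n], new[n]) for n in old if n in new and old[n] != new[n]
        },
    }
```

Storing each snapshot alongside its identifier gives you something to revert to, and the diff gives reviewers a concrete list of packages to approve or audit.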

None of these strategies are mutually exclusive. Rather, you can start with a simpler version and layer on additional capabilities as required. Each strategy can also be supported by a range of tactics, including:

  • Locking Config Files – in Python, for example, a requirements.txt file with exact (==) versions or a Pipfile.lock file will allow you to pin your dependencies at a specific version, which provides a good starting point for environment reproducibility. 
    • Make sure you also pin transitive dependencies (i.e., dependencies of dependencies) and OS-level dependencies, which have a habit of shifting over time. 
  • Non-Prod vs Prod – typically, Non-Production environments differ from Production environments. For example, Test or CI/CD environments may contain testing frameworks and debuggers that are unnecessary in a Production environment. As a result, you’ll need a tactic to separate your Non-Prod from Prod dependencies while ensuring consistency.
  • Containers – Docker-like containers can provide a repeatable way to create a consistent environment. That said, you still need to ensure the runtime environment you’re building the container with is up to date and configured appropriately for Prod vs Non-Prod environments.
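The first two tactics above can be illustrated with a short Python sketch: one function flags entries in requirements-style text that are not pinned to an exact version, and another reports packages whose pins disagree between Prod and Non-Prod. The function names are hypothetical, and the parser is deliberately simplified: it only understands plain name==version lines and does not handle options, URLs, or environment markers.

```python
def parse_requirements(text):
    """Split requirements-style text into exact pins and unpinned entries.

    Returns (pins, unpinned) where pins maps name -> version and
    unpinned lists every entry that is not a plain `name==version` line.
    """
    pins, unpinned = {}, []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        name, sep, version = line.partition("==")
        name, version = name.strip(), version.strip()
        # Wildcards such as 1.* match many versions, so they are not pins.
        if sep and name and version and "*" not in version:
            pins[name.lower()] = version
        else:
            unpinned.append(line)
    return pins, unpinned


def conflicting_pins(prod, nonprod):
    """Return packages pinned to different versions in the two sets."""
    return {
        name: (prod[name], nonprod[name])
        for name in prod
        if name in nonprod and prod[name] != nonprod[name]
    }
```

Non-Prod can legitimately contain extra packages such as test frameworks; the second check only flags shared packages whose versions differ, which is the kind of inconsistency that makes a bug reproduce in one environment but not another.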

How the ActiveState Platform Simplifies Reproducible Environments

The ActiveState Platform is a cloud-based service that lets you create a single source of truth for your Python, Perl, Ruby and Tcl runtime environments. All of the strategies listed above are supported, as well as the tactics of dependency pinning and inheritable branches for different deployment environments.

While the ActiveState Platform primarily focuses on the open source runtime environment, you can also link your project’s GitHub repository. Users can then deploy both your code and the required runtime environment with a single command.

This lets you use a single command to deploy the centralized source of truth appropriate to each of your dev, test, CI/CD and production environments, ensuring reproducibility.

Next steps:

The capabilities shown here are also available as a managed service, freeing up your developers to focus on coding and getting your product to market faster. Learn more about our Managed Builds service.
