Reproducibility: How to Ensure Your Code Works on Every Machine

Reproducibility starts with consistent, reproducible builds, which lead to consistent, reproducible environments. When reproducibility is assured, fewer issues slow down software delivery and deployment.

That’s the concept at the heart of the ActiveState Platform, which provides:

  • Consistent, reproducible builds of Python, Perl and Tcl runtime environments from source code
  • A single, central “source of truth” for all open source runtime environments across development, test and production

That means:

  • All developers use the same development environment, built from verifiable source code.
  • All servers in the CI/CD pipeline pull the same runtime environment as that being used by developers.
  • The production environment is also based on the same runtime environment (usually minus the test harnesses).

The result is consistent, reproducible environments that eliminate “works on my machine” issues, making every coder’s job (from developers to QA to support) easier.

Build Reproducibility

These days, we can’t really discuss build reproducibility without accounting for the integrity of the open source supply chain artifacts in the build. Consider the source control systems, build platforms and package repositories involved in our build processes. Each is a potential point of compromise where bad actors can introduce malicious commits and artifacts, and even trusted developers can unwittingly add typosquatted packages.
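To make the typosquatting risk concrete, here is a minimal sketch of a name check that could run before a dependency is added. It assumes your team maintains a list of trusted package names; the function name, the similarity cutoff and the misspelled example are illustrative assumptions, not part of any ActiveState or PyPI tooling.

```python
import difflib


def flag_possible_typosquats(requested, trusted, cutoff=0.8):
    """Map each unrecognized name to the trusted names it closely resembles.

    `trusted` is an assumed, team-maintained list of known-good package names.
    """
    trusted_set = set(trusted)
    suspects = {}
    for name in requested:
        if name in trusted_set:
            continue  # exact match with a trusted name: nothing to flag
        # Names that are near, but not equal to, a trusted name deserve a second look.
        close = difflib.get_close_matches(name, trusted, n=3, cutoff=cutoff)
        if close:
            suspects[name] = close
    return suspects


# "requets" is a hypothetical misspelling used only for illustration.
suspects = flag_possible_typosquats(["requets", "numpy"], ["requests", "numpy"])
```

A check like this only catches near-miss names. It says nothing about whether a correctly named package has itself been compromised.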

A reproducible build is generally defined as “builds with the same inputs result in bit-for-bit identical output.” But if even one of your supply chain artifacts has been compromised, your build is no longer reproducible, and you may not even know it.
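The “bit-for-bit identical” part of that definition can be checked mechanically. The sketch below is a generic illustration, not the ActiveState Platform’s own method: it hashes two build outputs and compares the digests. The function names are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_match(first: Path, second: Path) -> bool:
    """True when the two artifacts have identical SHA-256 digests."""
    return sha256_of(first) == sha256_of(second)
```

Running the same build twice from the same inputs and calling `builds_match` on the two outputs is a simple first test of reproducibility. Note that it compares outputs only; it cannot tell you whether the inputs themselves were trustworthy.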

One solution that has been proposed by Google is Supply-chain Levels for Software Artifacts (SLSA, pronounced salsa), which is “an end-to-end framework for ensuring the integrity of software artifacts throughout the software supply chain.” It imposes a number of requirements on the source, build process and provenance of supply chain artifacts. The goal (at the highest level) is to:

  • Ensure secure, reproducible builds, and
  • Create builds that are verifiably reproducible: builds where the binary provenance of an artifact (information such as what sources it was built from) can be determined in a trustworthy manner.

When you use the ActiveState Platform to build your Python, Perl and Tcl environments, you get secure, reproducible runtime environments every time. While we can’t claim “verifiably reproducible” at the moment (we’re still in Beta), the ActiveState Platform currently provides:

  • Scripted, hermetic builds that take place in ephemeral, isolated environments
  • Source code provenance that is non-falsifiable and complete in terms of identifying all dependencies (including transitive dependencies), the build entity, source and other metadata

The result is a consistent, reproducible runtime environment (a version of the programming language plus all the dependencies required to run the application), which goes a long way to solving the problem of consistent environments. 

Solving Environment Configuration Drift

All projects start with a standard set of open source dependencies (a standard configuration) that every developer on the team uses to build their development environment. What quickly follows has been termed “configuration drift”: different developers install various patches, packages or versions of packages (along with their different dependencies) to try to solve issues that arise during the coding process.

Files designed to control configuration drift, such as the README, cpanfile (for Perl), requirements.txt (for Python), etc., all become out of date sooner or later, since they depend on manual updating, sharing and implementation. Even organizations that standardize on containers still need to ensure that their containers are rebuilt with the latest version of the runtime environment and redeployed to maintain consistency. When these processes are not strictly followed, the result is often a proliferation of bugs.
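Drift of this kind can be detected by comparing what a requirements file pins against what is actually installed. The following is a minimal sketch that assumes a requirements.txt containing only simple `name==version` lines; the helper names are illustrative, and extras, version ranges and `-r` includes are deliberately ignored.

```python
import re
from importlib import metadata


def normalize(name):
    """Normalize a package name so 'My_Pkg' and 'my-pkg' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_pins(requirements_text):
    """Parse simple 'name==version' lines; skip comments and anything else."""
    pins = {}
    for raw in requirements_text.splitlines():
        # Drop trailing comments and environment markers before parsing.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[normalize(name.strip())] = version.strip()
    return pins


def installed_versions():
    """Return {normalized name: version} for the current environment."""
    versions = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            versions[normalize(name)] = dist.version
    return versions


def find_drift(pins, installed):
    """Return {name: (pinned, installed_or_None)} for every mismatch."""
    drift = {}
    for name, wanted in pins.items():
        found = installed.get(name)
        if found != wanted:
            drift[name] = (wanted, found)
    return drift
```

In practice you would call `find_drift(parse_pins(text), installed_versions())` on each machine. An empty result means the environment matches the pins; it does not mean the pins themselves are current.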

According to a study by Stripe, developers can spend over 17 hours per week debugging, refactoring and fixing bad code, all of which adds up to lost productivity on the order of $300B per year. The majority of that time is spent reproducing environments in which bugs have been found. 

The ActiveState Platform provides organizations with a central configuration management console for Python, Perl and Tcl runtime environment configuration:

  • One or more developers on a team can create and update the configuration for their project.
  • Configurations are automatically resolved for dependencies, including: 
    • Transitive dependencies (i.e., dependencies of dependencies)
    • Operating system-level dependencies
    • Shared dependencies (such as OpenSSL)
  • Dependency conflicts are not only flagged, but a manual solution is also offered.
  • Configurations can be branched, merged, saved and restored at any time.
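To illustrate what resolving transitive dependencies involves, here is a small sketch that walks a dependency graph and collects everything a project pulls in. It is a generic graph traversal, not the ActiveState Platform’s resolver, and the package names in the example graph are hypothetical.

```python
def transitive_dependencies(direct, roots):
    """Collect every package reachable from `roots`.

    `direct` maps a package name to the names it depends on directly.
    """
    seen = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        for dep in direct.get(name, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)  # visit this dependency's own dependencies later
    return seen


# Hypothetical graph: "app" needs "web" and "db"; both ultimately share "tls".
graph = {
    "app": ["web", "db"],
    "web": ["http", "tls"],
    "db": ["tls"],
}
needed = transitive_dependencies(graph, ["app"])
```

Here `tls` is a shared dependency reached through two paths. A real resolver must also pick one version of it that satisfies both parents, which is where dependency conflicts arise.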

How ActiveState Fits in the SDLC

As a result, organizations can:

  • Create a single, central source of truth for the configuration of their runtime environment.
  • Use the same configuration across all environments, or else create branches for development, test, UAT, pre-production, production, etc., that programmatically inherit any changes made to the parent.
  • Programmatically update each system to the latest configuration with a single command.

Adopting the ActiveState Platform can significantly reduce configuration drift in your organization.

Conclusions: Use the ActiveState Platform to build environments that work on ‘all’ machines

No coder wants to worry about reproducibility. Software should “just work” so we can all get on with our jobs. Unfortunately, that’s not the reality in far too many organizations, and software reproducibility problems have even formed a large part of the current reproducibility crisis in many scientific research fields.

The ActiveState Platform has been specifically designed to alleviate many of the problems that plague reproducibility in software organizations. By implementing it, organizations can:

  • Guarantee runtime environment build reproducibility
  • Ensure environment consistency, reducing configuration drift
  • Eliminate “works on my machine” issues and thereby increase coder productivity

Ready to see for yourself? You can try the ActiveState Platform by signing up for a free account. Or sign up for a free demo and let us show you how we can eliminate your reproducibility issues.

Need more information about reproducibility? Read these:

The State of Package Management (and How to Make It Better)

Data Sheet for Coders: Advanced Package Management
