These days, we can’t really discuss build reproducibility without accounting for the integrity of the open source supply chain artifacts that go into the build. Consider the source control systems, build platforms and package repositories involved in our build processes: each is a potential point of compromise where bad actors can introduce malicious commits and artifacts, and where even trusted developers can unwittingly add typosquatted packages.
A reproducible build is generally defined as one in which “builds with the same inputs result in bit-for-bit identical output.” But if even one of your supply chain artifacts has been compromised, your inputs are no longer the ones you intended, so your build no longer reproduces what you expect, and you may not even know it.
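In practice, “bit-for-bit identical” is usually checked by building the same inputs twice (ideally on different machines) and comparing cryptographic digests of the outputs. Here is a minimal Python sketch of that check, assuming the two build artifacts are already on disk; the function names are illustrative, not part of any particular tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_bit_for_bit_identical(first: Path, second: Path) -> bool:
    """True if two build outputs have the same SHA-256 digest."""
    return sha256_of(first) == sha256_of(second)
```

For example, `is_bit_for_bit_identical(Path("build-a/app.whl"), Path("build-b/app.whl"))` (hypothetical paths) compares two independent builds. Matching digests only show that two builds agree with each other; they cannot show that the inputs themselves were trustworthy, which is the gap that provenance is meant to close.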
One solution that has been proposed by Google is Supply-chain Levels for Software Artifacts (SLSA, pronounced salsa), which is “an end-to-end framework for ensuring the integrity of software artifacts throughout the software supply chain.” It imposes a number of requirements on the source, build process and provenance of supply chain artifacts. The goal (at the highest level) is to:
- Ensure secure, reproducible builds, and
- Make those builds verifiably reproducible. In other words, it should be possible to determine the binary provenance of an artifact (information such as which sources it was built from) in a trustworthy manner.
When you use the ActiveState Platform to build your Python, Perl and Tcl environments, you get secure, reproducible runtime environments every time. While we can’t claim “verifiably reproducible” at the moment (we’re still in Beta), the ActiveState Platform currently provides:
- Scripted, hermetic builds that take place in ephemeral, isolated environments
- Non-falsifiable source code provenance that completely identifies all dependencies (including transitive dependencies), the build entity, the source and other metadata
The result is a consistent, reproducible runtime environment (a version of the programming language plus all the dependencies required to run the application), which goes a long way to solving the problem of consistent environments.
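At its core, provenance of this kind is a signed record that ties an artifact’s digest to the identity of the builder and to the digests of the source and every dependency. The Python sketch below is a hypothetical, simplified illustration, not the ActiveState Platform’s or SLSA’s actual provenance format: the `Provenance` fields are invented for the example, and an HMAC with a secret held by the build service stands in for the asymmetric signatures real systems use, so that verifiers never need the signing key.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass, field
from typing import Dict


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass
class Provenance:
    # Hypothetical, simplified schema for illustration only.
    artifact_sha256: str
    builder_id: str
    source_sha256: str
    # Every dependency, including transitive ones: name -> SHA-256 digest.
    dependencies: Dict[str, str] = field(default_factory=dict)


def canonical_bytes(record: Provenance) -> bytes:
    # Sorted keys give a stable serialization, so the signature is deterministic.
    return json.dumps(asdict(record), sort_keys=True).encode()


def sign(record: Provenance, key: bytes) -> str:
    # Stand-in for a real signature: an HMAC keyed by a secret only the build service holds.
    return hmac.new(key, canonical_bytes(record), hashlib.sha256).hexdigest()


def verify(artifact: bytes, record: Provenance, signature: str, key: bytes) -> bool:
    # 1. The record must not have been altered since the build service signed it.
    if not hmac.compare_digest(sign(record, key), signature):
        return False
    # 2. The artifact in hand must be the one the record describes.
    return sha256_hex(artifact) == record.artifact_sha256
```

Because any change to the record invalidates its signature, a consumer can trust the listed sources and dependencies only as far as it trusts the signer, which is why real provenance schemes put so much weight on the identity of the build service.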
Solving Environment Configuration Drift
All projects start with a standard set of open source dependencies (a standard configuration) that every developer on the team uses to build their development environment. But what quickly happens is what has been termed “configuration drift”: different developers install various patches, packages or package versions (along with their dependencies) to try to solve issues that arise during coding.
Files designed to control configuration drift, such as the README, cpanfile (for Perl), requirements.txt (for Python) and so on, sooner or later become out of date because they depend on being manually updated, shared and applied. Even organizations that standardize on containers still need to ensure that their containers are rebuilt with the latest version of the runtime environment and redeployed to keep environments consistent. When these processes are not strictly followed, the result is often a proliferation of bugs.
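To see how a pin file and a live environment diverge, it helps to check them against each other. The Python sketch below compares a `requirements.txt`-style list of exact pins with what is installed in the current interpreter, using the standard library’s `importlib.metadata`. It is deliberately simplified: it only understands plain `name==version` lines (no version ranges, extras or environment markers), and it reports only pinned packages that drifted, not extra packages installed outside the pin file.

```python
import re
from importlib import metadata
from typing import Dict, Optional, Tuple


def normalize(name: str) -> str:
    """PEP 503 name normalization: 'Zope.Interface' -> 'zope-interface'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_pins(text: str) -> Dict[str, str]:
    """Parse exact 'name==version' pins, ignoring comments, blank lines and other specifiers."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        name, sep, version = line.partition("==")
        if sep and name.strip():
            pins[normalize(name.strip())] = version.strip()
    return pins


def installed_versions() -> Dict[str, str]:
    """Map each distribution installed in the current interpreter to its version."""
    versions = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken metadata
            versions[normalize(name)] = dist.version
    return versions


def find_drift(
    pinned: Dict[str, str], installed: Dict[str, str]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (pinned_version, installed_version or None)} for every mismatched pin."""
    return {
        name: (want, installed.get(name))
        for name, want in pinned.items()
        if installed.get(name) != want
    }
```

For example, `find_drift(parse_pins(Path("requirements.txt").read_text()), installed_versions())` (with `from pathlib import Path`) lists every pinned package whose installed version no longer matches. The check is only as good as the pin file, which is exactly the weakness described above.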
According to a study by Stripe, developers can spend over 17 hours per week debugging, refactoring and fixing bad code, all of which adds up to lost productivity on the order of $300B per year. The majority of that time is spent reproducing environments in which bugs have been found.
The ActiveState Platform gives organizations a central console for managing the configuration of their Python, Perl and Tcl runtime environments:
- One or more developers on a team can create and update the configuration for their project.
- Configurations are automatically resolved for dependencies, including:
  - Transitive dependencies (i.e., dependencies of dependencies)
  - Operating system-level dependencies
  - Shared dependencies (such as OpenSSL)
- Dependency conflicts are not only flagged; a solution is also offered, which you can apply manually
- Configurations can be branched, merged, saved and restored at any time.
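Resolving transitive dependencies amounts to walking a dependency graph, and a conflict is what you find when two paths through that graph demand different versions of the same package. The Python sketch below illustrates only that idea; it is not how the ActiveState Platform’s resolver works. It assumes a hypothetical, in-memory graph (the package names and edges are invented for the example) in which every requirement is an exact version pin, whereas real resolvers also handle version ranges, operating system-level dependencies and backtracking.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Node = Tuple[str, str]  # (package name, exact version)
Graph = Dict[Node, List[Node]]  # package -> packages it requires


def resolve(roots: List[Node], graph: Graph) -> Tuple[Dict[str, str], Dict[str, Set[str]]]:
    """Collect every direct and transitive requirement reachable from roots.

    Returns (resolved, conflicts): 'resolved' maps each package required at exactly
    one version to that version; 'conflicts' maps each package required at several
    versions to the set of versions demanded.
    """
    wanted: Dict[str, Set[str]] = {}
    seen: Set[Node] = set()
    queue = deque(roots)
    while queue:  # breadth-first walk; 'seen' guards against cycles
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        name, version = node
        wanted.setdefault(name, set()).add(version)
        queue.extend(graph.get(node, []))
    resolved = {n: next(iter(vs)) for n, vs in wanted.items() if len(vs) == 1}
    conflicts = {n: vs for n, vs in wanted.items() if len(vs) > 1}
    return resolved, conflicts


# Hypothetical graph: two paths demand different urllib3 versions.
graph: Graph = {
    ("app", "1.0"): [("requests", "2.31.0"), ("legacy-client", "0.9")],
    ("requests", "2.31.0"): [("urllib3", "2.0.7"), ("idna", "3.4")],
    ("legacy-client", "0.9"): [("urllib3", "1.26.18")],
}
resolved, conflicts = resolve([("app", "1.0")], graph)
print(sorted(conflicts["urllib3"]))  # ['1.26.18', '2.0.7']
```

The transitive `urllib3` requirement never appears in the top-level `app` entry, which is why conflicts like this tend to surface only after something breaks when dependencies are managed by hand.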
As a result, organizations can:
- Create a single, central source of truth for their runtime environment configuration.
- Use the same configuration across all environments, or create branches for development, test, UAT, pre-production, production, etc., that programmatically inherit any changes made to the parent.
- Programmatically update each system to the latest configuration with a single command.
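The branch inheritance described above, where a branch keeps its own overrides and otherwise follows its parent, maps neatly onto Python’s standard `collections.ChainMap`. The sketch below models it with a hypothetical configuration of package pins; `make_branch` is an illustrative helper, and the example shows the semantics only, not how the ActiveState Platform stores or merges configurations.

```python
from collections import ChainMap


def make_branch(parent, overrides=None):
    """Create a branch whose own overrides shadow the parent's settings.

    Lookups fall through to the parent, so later changes to the parent are
    inherited automatically unless the branch has overridden that key. Writes
    to a ChainMap go only to its first mapping, so a branch never modifies its parent.
    """
    return ChainMap(dict(overrides or {}), parent)


# Hypothetical project configuration: package -> pinned version.
main = {"python": "3.10.13", "requests": "2.31.0", "numpy": "1.26.4"}
staging = make_branch(main, {"numpy": "2.0.0"})
production = make_branch(main)

# An update made once on the parent...
main["requests"] = "2.32.3"

# ...is visible in every branch that has not overridden it.
print(dict(staging))  # {'python': '3.10.13', 'requests': '2.32.3', 'numpy': '2.0.0'}
```

Storing only the differences in each branch is what lets a single change to the parent propagate everywhere, instead of being copied into each environment by hand.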
Adopting the ActiveState Platform can significantly reduce configuration drift in your organization.
Conclusions: Use the ActiveState Platform to build environments that work on ‘all’ machines
No coder wants to worry about reproducibility. Software should “just work” so we can all get on with our jobs. Unfortunately, that’s not the reality in far too many organizations, and irreproducible software environments have even become a large part of the current reproducibility crisis in many scientific research fields.
The ActiveState Platform has been specifically designed to alleviate many of the problems that plague reproducibility in software organizations. By implementing it, organizations can:
- Guarantee runtime environment build reproducibility
- Ensure environment consistency, reducing configuration drift
- Eliminate “works on my machine” issues and thereby increase coder productivity
Need more information about reproducibility? Read these: