March 11, 2021
build.pl, cpan, makefile.pl, Perl ecosystem, perl modules, perl versioning

Everything you ever wanted to know about the Perl ecosystem

The Perl ecosystem has been around for more than 25 years, and is continuing to evolve. Whether you’re new to Perl or returning to it, understanding the Perl ecosystem is key to learning how the Perl programming language works.

This blog post covers:

  • PAUSE & CPAN – how Perl modules are uploaded to CPAN
  • cpanminus & CPAN – tools for installing Perl modules
  • Distros vs Modules vs Packages – is everything really just a module?
  • Libraries – and how to load them
  • Naming & Versioning – conventions and exceptions
  • Makefile.PL / Build.PL – the entry points to module installation
  • Dependency Declaration – how to manage dependencies
  • Dual-Life Modules – packages shipped in both the core and their own distribution
  • Unauthorized Uploads – and how to deal with them
  • Dependency-Only Distributions – doc-only distros and recommendation lists
  • Other Tools and Services – including rt.cpan.org and CPAN testers

PAUSE and CPAN

The core of the Perl ecosystem for distributing libraries consists of two services, PAUSE and CPAN.

PAUSE, the [Perl Programming] Authors Upload Server, is the system that processes uploads of new packages. It also holds all the permissions for individual packages.

CPAN, the Comprehensive Perl Archive Network, is the system that makes these uploads available for others to download and install. CPAN uses PAUSE’s permissions data to determine if a given release contains unauthorized packages.

Nowadays, most systems that want data from CPAN use MetaCPAN, which provides both an API (built on Elasticsearch) and a search engine for CPAN, along with a web view of all CPAN data, including distribution releases, packages (including code and docs), authors, etc.

CPAN (the package), CPANPLUS, and cpanminus

The original command line tool for installing CPAN packages was called “CPAN” and invoked from the command line as cpan. This client is still distributed with the Perl core.

Later, a new client was created called CPANPLUS. This client is no longer developed very actively, and the Perl community does not encourage its use. It was distributed with the Perl core starting with the 5.10.0 release and removed starting with 5.20.0.

Finally, the newest and now most widely recommended tool is called cpanminus. This is a simple, no-configuration tool that is very easy to use for those new to Perl.

Distributions, Modules, and Packages

Unlike many other languages, Perl distinguishes between the things that are distributed via CPAN, called distributions, and the things that a Perl program or library can depend on. The latter thing is most correctly referred to as a package, but you will also see people call these modules. And to make things more confusing, people also often call a distribution a module too. Everything’s a module.

A distribution (often shortened to distro) is a named, versioned collection of code uploaded to CPAN. Typically, a distribution will contain one or more modules. These will be uploaded as tarballs or zip files. Some examples include:

  • DateTime-1.54
  • libwww-perl-6.52
  • Type-Tiny-1.012001

A distribution can also contain executable scripts. In fact, a distro can contain only scripts with no explicitly declared packages, but an upload without any packages at all is effectively invisible, so to upload a tool to CPAN authors will typically put most of the code in a package and invoke that package from a script. See Perl-Tidy for an example, where the perltidy executable simply exists to invoke the Perl::Tidy package.

It’s also possible for a distribution to contain only documentation files (ending in .pod). Typically, authors will define at least one module in the distribution as well, otherwise it will also be invisible to the PAUSE indexing system, just like a script-only distribution.

A module is, technically, a single file with a .pm extension. But as noted previously, people in the Perl community will generally use the term “module” to refer to distributions, modules, and packages.

A package is a single namespace in Perl code. These are declared as package Foo::Bar; in code. These packages can be declared in any Perl code, including scripts (files ending in .pl), modules (files ending in .pm), or even in code run via the command line (perl -E 'package Foo::Bar; say "hello"').

Package names are usually constructed of one or more camel-cased words joined by a double-colon (::) separator, so for example DateTime or Type::Tiny. Some packages are lower cased, like strict or local::lib. Lower case names are reserved by convention for pragmas, which are packages that contain code which changes how Perl compiles the code using them. For example, strict enables stricter compiler checks and local::lib changes how external code is loaded.

A module typically contains exactly one package, but it can contain more than one or even none.
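As a small sketch of the package/module distinction, the single file below declares two packages. The names Foo::Bar and Foo::Bar::Child are placeholders, and the hello method is hypothetical, added only to show which namespace each sub lives in:

```perl
use strict;
use warnings;

# One file, two packages. If this file were saved as Foo/Bar.pm,
# only Foo::Bar would match the file name.
package Foo::Bar;

sub hello { return 'hello from ' . __PACKAGE__ }

package Foo::Bar::Child;

sub hello { return 'hello from ' . __PACKAGE__ }

# Switch back to the default namespace for the script's own code.
package main;

print "$_\n" for Foo::Bar->hello(), Foo::Bar::Child->hello();
```

Each package statement stays in effect until the next one (or the end of the enclosing block or file), which is why the script switches back to main before running its own code.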

Loading External Code

In Perl code, when you want to load external code, you provide package names like use Foo::Bar. But under the hood, the perl executable will translate this into a module’s file name, Foo/Bar.pm, and look for it in one of the directories it is configured to use for libraries.

This works as expected for the vast majority of cases, but there are CPAN distros which contain modules that have multiple packages in them. This can be a problem if you want to use a package that does not correspond to the module file name that contains it. So for example, if you want Foo::Bar::Child, which lives in Foo/Bar.pm, you would have to use Foo::Bar to load it.

Directories that Perl expects to find modules in are stored in an interpreter global variable named @INC. The default list of directories is compiled into the perl executable itself, but this can be augmented or replaced by setting the PERL5LIB environment variable, or by using the standard libraries lib (https://metacpan.org/pod/lib) and blib.
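The lookup can be approximated in a few lines. This is only a sketch of the idea, not what the perl executable literally does; the helper names module_path and find_module are made up for this example:

```perl
use strict;
use warnings;

# Translate a package name into the relative file name perl looks for.
sub module_path {
    my ($package) = @_;
    (my $path = $package) =~ s{::}{/}g;
    return "$path.pm";
}

# Return the first matching file under the @INC directories,
# or nothing if no directory contains it.
sub find_module {
    my ($package) = @_;
    my $relative = module_path($package);
    for my $dir (@INC) {
        next if ref $dir;    # @INC may also hold code refs (hooks); skip them
        my $candidate = "$dir/$relative";
        return $candidate if -f $candidate;
    }
    return;
}

print module_path('Foo::Bar'), "\n";

my $found = find_module('strict');
print defined $found ? "strict lives at $found\n" : "strict not found\n";
```

Adding a directory with use lib '/some/dir'; or through PERL5LIB puts it at the front of @INC, so modules there are found before the compiled-in defaults.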

In addition, there are also distributions on CPAN that people use for this purpose, the most popular of which is local::lib, which allows you to create local (usually per-user) sets of Perl library directories.

Installing Distributions

Typically, developers will install a distribution by asking for one of its modules, for example by running cpanm LWP::UserAgent. The tool figures out what distribution to download. The cpanm tool doesn’t support installing via the distro name, for example cpanm libwww-perl, but the CPAN.pm tool does.

Naming and Versioning

Most distributions are named after their “primary module”. This is the module that provides the primary entry point to the distribution’s documentation. It may also contain the main entry points for code, but in some cases there is no one “main” entry point for the code in the distribution.

However, there is no requirement that the distro contain a module matching the distro name. There are some older distros that are still in common use that do not do this. The most notable of these is the libwww-perl distro. Its main module in terms of documentation is called LWP, while it has several main entry points for code, notably LWP::UserAgent and LWP::Simple.

Similarly, there is no requirement that there be a correspondence between the distro’s version and the versions of the package(s) it contains. While most modern distros do make sure the distro and modules share a version, many older releases do not. For example, libwww-perl-5.836 contains:

  • HTTP::Cookies 5.833
  • HTTP::Response 5.836
  • LWP 5.836
  • LWP::UserAgent 5.835

… and so on.

Perl Core Versioning

The Perl core uses a three part major.minor.patch scheme. However, the major version of Perl has been 5 since the 5.000 release in October of 1994. This means that Perl has effectively treated the minor part of the version as major, and the patch as minor. Specifically, backwards incompatible changes are only allowed when the minor version is incremented.

In addition, the Perl core convention is that odd minor release numbers, such as 5.13.x or 5.31.x, are unstable development releases, and even numbers, such as 5.14.x or 5.32.x, are stable releases.
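The odd/even convention is easy to check mechanically. The helper below is a hypothetical sketch that assumes a three-part version string such as 5.32.1:

```perl
use strict;
use warnings;

# True if a Perl core version string such as "5.32.1" has an even
# minor part, which by convention marks a stable release.
sub is_stable_perl {
    my ($version) = @_;
    my (undef, $minor) = split /\./, $version;
    return $minor % 2 == 0 ? 1 : 0;
}

for my $v (qw( 5.13.4 5.14.0 5.31.2 5.32.1 )) {
    printf "%-7s %s\n", $v, is_stable_perl($v) ? 'stable' : 'development';
}
```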

Distribution and Package Versioning

There are no constraints on what a distribution version can look like. So you could upload a distro like libwww-perl-fancy-next-version.tar.gz and that would be processed by PAUSE. But the individual packages in a distribution must contain versions that conform to the types of versions defined in the version package.

There are two types of Perl versions, decimal versions and dotted decimal versions. Decimal versions are simply numbers greater than or equal to zero. These can be written as integers (1) or decimal numbers (1.0). Such versions are always compared numerically, so 1 == 1.0 == 1.0000.

The dotted decimal versions are similar to semver numbers in that they have three parts. They are typically written with a leading v, so you see something like v1.0.0 or v2.3.5. These versions can be converted to decimal versions and vice versa. The algorithm for the conversion is that the first element of the dotted decimal becomes the whole number portion of the version, and the next two numbers are zero-padded to three digits and concatenated for the fractional part of the number.

Here are some example conversions:

  • 1.0.0 -> 1.000000
  • 2.3.5 -> 2.003005
  • 2.30.50 -> 2.030050
  • 2.300.500 -> 2.300500

While dotted decimal versions should have three parts, the Perl tooling will accept any number of parts greater than three as well, though this is fairly uncommon on CPAN.

  • 2.3.5.8 -> 2.003005008
  • 1.2.3.4.5.6 -> 1.002003004005006
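The conversions above can be reproduced with a few lines of Perl. The sketch below does the conversion by hand (dotted_to_decimal is a made-up helper name) and also asks the core version module for its answer, using its parse, numify, and normal methods:

```perl
use strict;
use warnings;
use version;

# Manual conversion: the first part is the whole number, and every
# remaining part is zero-padded to three digits and concatenated.
sub dotted_to_decimal {
    my ($dotted) = @_;
    $dotted =~ s/^v//;
    my ($whole, @rest) = split /\./, $dotted;
    return $whole . '.' . join('', map { sprintf '%03d', $_ } @rest);
}

for my $dotted (qw( v1.0.0 v2.3.5 v2.30.50 v2.300.500 v2.3.5.8 )) {
    my $by_hand   = dotted_to_decimal($dotted);
    my $by_module = version->parse($dotted)->numify;
    print "$dotted -> $by_hand (version.pm says $by_module)\n";
}

# Going the other way: decimal back to dotted decimal.
print version->parse('2.003005')->normal, "\n";

# Decimal versions compare numerically, so trailing zeros do not matter.
print "1 and 1.0000 are equal\n"
    if version->parse('1') == version->parse('1.0000');
```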

There have been some issues related to three(+) part versions in the ecosystem in the past, so for simplicity some distributions have adopted their own pseudo-three part scheme. One of these is Moose, which started doing this with version 2.0000. The convention for Moose is that the whole number is the major version (2), and the four digits in the fractional part can be split into minor and patch releases. So for example, 2.0103 is major version 2, minor version 1, patch version 3, and 2.1015 is minor version 10, patch version 15.

Moose follows the same convention as the Perl core, where an even-numbered minor release is stable and an odd-numbered one is a development release.
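To make the Moose convention concrete, here is a hypothetical helper (split_moose_version is not part of Moose or any CPAN tool) that splits such a version into its three parts and applies the odd/even rule:

```perl
use strict;
use warnings;

# Split a Moose-style version string such as "2.0103" into
# (major, minor, patch). Pass the version as a string so that
# trailing zeros are not lost to numeric formatting.
sub split_moose_version {
    my ($version) = @_;
    my ($major, $minor, $patch) = $version =~ /\A(\d+)\.(\d\d)(\d\d)\z/
        or die "unexpected version format: $version\n";
    return ($major + 0, $minor + 0, $patch + 0);
}

for my $version (qw( 2.0103 2.1015 )) {
    my ($major, $minor, $patch) = split_moose_version($version);
    my $kind = $minor % 2 == 0 ? 'stable' : 'development';
    print "$version: major $major, minor $minor, patch $patch ($kind)\n";
}
```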

Other distributions may follow similar ad-hoc schemes.

Marking Trial/Unstable Releases

PAUSE supports a convention for marking a release as a trial or unstable release. The old way to do this was to put an underscore in the distribution’s version number, like libwww-perl-5.53_97. The Perl ecosystem supports using the same version in packages, since underscores in numbers are simply ignored by Perl.

The modern way to do this is to append -TRIAL to the distribution’s version, like DateTime-1.37-TRIAL.

In either case, the CPAN clients will not install these versions by default. Instead, if you simply ask for DateTime, you will get the most recent stable release.
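Both conventions can be detected from the release name alone. The check below is a rough sketch with a made-up helper name, and it assumes the version is whatever follows the last hyphen; real CPAN tooling may apply additional rules:

```perl
use strict;
use warnings;

# Report whether a release name follows either trial convention:
# a -TRIAL suffix, or an underscore in the version part.
sub is_trial_release {
    my ($release) = @_;
    return 1 if $release =~ /-TRIAL\z/;

    # Assume the version is everything after the last hyphen.
    my ($version) = $release =~ /-([^-]+)\z/;
    return ( defined($version) && $version =~ /_/ ) ? 1 : 0;
}

for my $release (qw( libwww-perl-5.53_97 DateTime-1.37-TRIAL DateTime-1.54 )) {
    printf "%-22s %s\n", $release,
        is_trial_release($release) ? 'trial' : 'stable';
}
```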

This mechanism allows CPAN authors to publish potentially breaking changes in a way that lets others opt into testing them, without risking breaking downstream dependents. In addition, these releases will receive some testing from the CPAN Testers network. See below for more detail on that service.

Distribution Metadata

Starting back in 2003, there was a push to have distributions include metadata about their contents. As a result, most distros on CPAN will include either a META.yml (old format) or META.json file (new format) describing the distro’s contents. In particular, this metadata should include a provides key that details what package versions are provided by the distro, along with the modules that each package is in.

This metadata also includes information about the author, dependencies, licensing, and much more.
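As an illustration, the snippet below reads the provides key from a heavily trimmed META.json fragment using the core JSON::PP module. The distribution Foo-Bar and its contents are invented for this example, and a real META.json carries many more required fields:

```perl
use strict;
use warnings;
use JSON::PP qw( decode_json );

# A trimmed, hypothetical fragment; not a complete, valid META.json.
my $json = <<'JSON';
{
   "name" : "Foo-Bar",
   "version" : "1.002",
   "provides" : {
      "Foo::Bar" : { "file" : "lib/Foo/Bar.pm", "version" : "1.002" },
      "Foo::Bar::Child" : { "file" : "lib/Foo/Bar.pm", "version" : "1.002" }
   },
   "prereqs" : {
      "runtime" : { "requires" : { "HTTP::Request" : "6.00" } }
   }
}
JSON

my $meta = decode_json($json);

# List each package, its version, and the module file that contains it.
for my $package ( sort keys %{ $meta->{provides} } ) {
    my $entry = $meta->{provides}{$package};
    print "$package $entry->{version} is in $entry->{file}\n";
}
```

Because this is static data, a tool can learn which packages a release provides without running any of the distribution’s code.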

Packages Can Move or Go Away

There is nothing in the Perl ecosystem that requires a package to stay in the same distribution over time. A new version of a distribution can contain new packages and remove old ones. It’s also possible for distributions to be split into multiple distros, or for multiple distros to be combined into one.

For example, libwww-perl used to include many packages related to HTTP and HTTP clients, including HTML::Form, HTTP::Request, HTTP::Message, and more. But when libwww-perl-6.00 was released, most of these were split out into their own distros, including HTTP-Message and several others.

Conversely, when Moose-2.0000 was released, it included the entirety of the formerly separate Class-MOP distribution.

Makefile.PL and Build.PL Scripts

Every distro ships with either a Makefile.PL or Build.PL script (or both). These are the entry points used to actually install the distribution. Running a Makefile.PL generates a Makefile, and then running make install installs the distro. With Build.PL, running it generates a Build Perl script. Running ./Build install installs the distro.
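For a sense of the simple case, a minimal Makefile.PL might look something like the sketch below. It uses ExtUtils::MakeMaker’s WriteMakefile function; the distribution name, version, and prerequisite versions are placeholders chosen for this example:

```perl
use strict;
use warnings;
use ExtUtils::MakeMaker;

# Running this script writes a Makefile into the current directory.
WriteMakefile(
    NAME      => 'Foo::Bar',    # hypothetical primary module
    VERSION   => '0.01',        # placeholder version
    PREREQ_PM => {
        # package name => minimum version (illustrative values)
        'HTTP::Request'  => '6.00',
        'LWP::UserAgent' => '6.00',
    },
);
```

A real distribution would usually add more keys, such as an abstract, author, and license. If a listed prerequisite is not installed, MakeMaker prints a warning but still generates the Makefile.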

In the common case, these scripts are quite simple, but in some edge cases they are quite complex. Before the static metadata standards push that led to the META.yml and META.json files, the distro’s metadata was only available from these scripts.

Even today, it is expected that these files will include all of the distro’s relevant metadata, including its dependencies. That said, tooling that wants this metadata is strongly encouraged to use the static META.* files whenever possible, as this is much simpler and safer.

Some of these Makefile.PL or Build.PL scripts do some additional configuration-time logic, similar to what you might see in an autoconf-generated configure script.

These scripts may also check custom environment variables or command line arguments to control this logic. For example, many modules with an optional XS component or dependency will accept a --pureperl_only flag.

Dependency Declaration

Because individual packages can move between distros or go away entirely, Perl developers are strongly encouraged to declare dependencies based on the exact packages they load in their code, instead of picking one package from a distribution and depending on it. So for example, if you have code which uses both HTTP::Request and LWP::UserAgent, you should declare a dependency on both of those packages at the version you want.

The tools in the Perl ecosystem only support “greater than or equal” dependency declarations. The most common representation of the dependency data is as a dictionary (aka map, hash, etc.) of package name to version. Some tools may allow you to omit the version entirely, in which case this is treated as >= 0.

Some parts of the ecosystem, like the cpanfile spec, allow you to specify more complex requirements with multiple operators, for example >= 1.0, < 2.0. However, nothing in the ecosystem actually enforces those more specific requirements. Only the >= portion will actually be respected.
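The package-to-minimum-version dictionary and its “greater than or equal” semantics can be sketched as follows. The satisfies helper, the required versions, and the installed versions are all invented for illustration; real installers gather this data from distribution metadata and from the modules on disk:

```perl
use strict;
use warnings;
use version;

# Dependencies as a hash of package name => minimum version.
# A version of 0 means "any version".
my %requires = (
    'HTTP::Request'  => '6.00',
    'LWP::UserAgent' => '6.05',
    'Try::Tiny'      => 0,
);

# Hypothetical installed versions, hard-coded for this example.
my %installed = (
    'HTTP::Request'  => '6.07',
    'LWP::UserAgent' => '5.836',
    'Try::Tiny'      => '0.30',
);

# A requirement is met when installed >= minimum.
sub satisfies {
    my ($have, $minimum) = @_;
    return version->parse($have) >= version->parse($minimum) ? 1 : 0;
}

for my $package ( sort keys %requires ) {
    my $ok = satisfies( $installed{$package}, $requires{$package} );
    printf "%-15s need >= %-5s have %-6s %s\n",
        $package, $requires{$package}, $installed{$package},
        $ok ? 'ok' : 'too old';
}
```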

Dual-Life Modules

This is yet another use of the word “module” where we mean package.

A “dual-life” module is a package or set of packages that is shipped both in the Perl core and as its own distribution on CPAN. There are many of these, including very common dependencies like ExtUtils::MakeMaker.

Sometimes the version in the Perl core does not exist on CPAN. This may happen when a bug needs to be fixed for a specific Perl release, but the corresponding fix is not safe for older versions of Perl. So the version shipped with the new Perl core is bumped, but later a safer fix is uploaded to CPAN under a new, higher version number. It may also happen if the bug fix is urgent for the Perl core and the maintainer of the CPAN version is not able to participate in the process of fixing the bug. In that case the bug fix released to CPAN may be different from what was done in the Perl core.

Unauthorized Distributions

The first person to upload a package to PAUSE takes ownership of that package. They can share that ownership or give it away to others if they choose. However, nothing prevents someone else from uploading a distribution with a package owned by someone else. For example, someone uploaded libwww-perl 5.837 without permission to any of its packages. In this case, PAUSE processes the upload and it is visible on MetaCPAN, but none of the CPAN clients will recognize this release. Attempting to install LWP::UserAgent will never install this unauthorized release, but you could install it explicitly with the cpanm client by passing it a URL to the release tarball, cpanm https://cpan.metacpan.org/authors/id/O/OL/OLEG/libwww-perl-5.837.tar.gz. You could also download the tarball and install it manually.

It’s possible for a distribution to contain a mix of authorized and unauthorized package releases. This typically happens when a distribution is released by a new author but the former author failed to give all of the needed permissions to the new author. Typically, the authors will notice, work out the permissions issue, and PAUSE can be told to re-index the upload, which will fix the package authorization status.

Dependency-Only Distributions

There is a history of uploading distributions without code that are simply some documentation and a set of dependencies. This is a way to provide a set of recommended packages for others to use. These distribution names either start with Bundle- (old naming) or Task- (new naming), like Task-Kensho. The old Bundle system relied on some magic behaviour hard-coded into CPAN clients, whereas the Task system uses the standard CPAN dependency declaration mechanisms.

Other Tools and Services

There are a number of other tools and services in the Perl ecosystem.

rt.cpan.org

The rt.cpan.org site is a customized version of Request Tracker (RT), a ticket tracker written in Perl. Every distribution uploaded to CPAN automatically has an associated ticket queue on rt.cpan.org.

However, it is possible to set a distribution’s metadata to point to an alternative bug tracker, and many distributions use GitHub or other issue trackers.

CPAN Testers

The CPAN Testers system includes a service for processing and browsing test results, along with clients that report test results to this service. This testing consists of running a distribution’s test suite on a client machine. The client machine can be running on any platform that supports Perl, so it’s common to see a wide variety of OS and architectures in the test reports.

This testing can only be done after a release, and there’s no guarantee that any particular platform will be tested. In addition, there’s no time limit for reporting results.

Sometimes an author will explicitly make a trial release for the purposes of getting test results for a new version of a distro.

ActiveState’s New Perl Ecosystem

The ActiveState Platform introduces some advantages over standard Perl, including:

A Unified Toolchain. If your teams develop and deploy on multiple operating systems, your organization has to manage multiple toolchains in order to build, update and manage your Perl environments. In contrast, the ActiveState Platform delivers a consistent package management experience across Windows and Linux (Mac coming soon) so you only need one toolchain to support all your teams.

Virtual Environment Support. While Perl does have third party support for virtual environments, different systems are used on different operating systems. The ActiveState Platform automatically creates a virtual environment for each of your projects, isolating dependencies and preventing conflicts.

Support for More Complex Requirements. As previously stated, while you can specify requirements like >= 1.0, < 2.0 in a cpanfile, the Perl ecosystem won’t enforce these requirements. The ActiveState Platform, however, will enforce >, =, <, >=, ==, <=, etc., in any combination you can think of.

Find out more about ActiveState’s Perl or just install it into a virtual environment and give it a try:

For Windows:

powershell -Command "& $([scriptblock]::Create((New-Object Net.WebClient).DownloadString('https://platform.activestate.com/dl/cli/install.ps1'))) -activate-default ActiveState/Perl-5.32"

For Linux:

sh <(curl -q https://platform.activestate.com/dl/cli/install.sh) --activate-default ActiveState/Perl-5.32

Dave Rolsky

Dave Rolsky began his development career with Perl in 1999, and has created or contributed to dozens of Perl CPAN modules, including DateTime, Log::Dispatch, Moose, and more. More recently, he has also developed in Rust and Go. Way back when, he co-wrote Embedding Perl in HTML with Mason and RT Essentials, both published by O'Reilly. In his free time, he enjoys tasty vegan food, reading, video games, and rock climbing.