ActiveBlog

PaaS: Buildpacks
by Phil Whelan

Phil Whelan, April 22, 2014

Buildpacks have become the way to standardize the setup of your stack in a PaaS. In some areas of PaaS, buildpacks may become displaced by Docker images and Dockerfiles, but for now they are the best and most portable option for PaaS and the majority of the PaaS ecosystem is standardizing on them.

In this post I will cover what Buildpacks are, where they came from, where they are going and how you can find them.

History

Heroku gets all the credit for creating Buildpacks.

Before Buildpacks came on the scene, when Heroku first came out with their public PaaS, they offered their "Aspen" stack. This exclusively supported Ruby in a very limited way.

In Aspen, only Ruby 1.8.6 was available and only for running Ruby on Rails. Application developers had a read-only file-system to run their applications on. Gem dependencies did not need to be specified. Bundler had not yet been born and the number of gems available in the world made it feasible for Heroku to install all of them in the environment where your application would run. Therefore, Rails developers did not need to specify or install any dependencies. A simple "require" statement would suffice.

It is fair to say that the Aspen stack was extremely restricted, especially by today's standards. But it had the advantage of being the first and it liberated developers from the need to provision machines and install the full stack just to get a simple application up and running. This was the first time that developers could self-serve without worrying about anything beyond their code. For this reason it was highly successful and played a significant part in the dawn of a new era of "lean" startups, who were now able to get their MVP applications up and running quickly. The only barrier to entry was having a credit card.

Next came Heroku's Bamboo stack. The number of gems in the world was growing significantly and it was becoming unfeasible to install them all on the host machines. It would have also been difficult to keep them up to date with changes in the increasingly popular Ruby ecosystem. Bundler had still not yet been invented, and so developers would specify their dependencies in a .gems file.

The limitation with Bamboo was that custom binaries could not be used by developers. They were still fairly limited in the stack they could use and the dependencies were limited to a list of Ruby gems. If you wanted to use a Ruby version other than REE 1.8.7 or MRI 1.9.2, then... well, you couldn't. But again, while this was not as good as what was to come, it was better than what had come before and Heroku continued to grow in popularity.

Buildpacks arrived in Bamboo's successor, the Cedar stack, and changed everything.

What Is A Buildpack?

A Buildpack defines how to build up the stack and it is extremely flexible.

Generally, in a PaaS, the application is deployed inside some form of Linux container or isolated environment. A buildpack will check what is installed inside the container and install anything additional that is needed. If the buildpack is designed to run Ruby applications under Ruby 2.0 and only Ruby 1.9.3 is installed, then it will download the newer Ruby, compile it and install it. Alternatively, it might simply run "apt-get install" or "yum install", depending on the distribution of Linux it is on. The way dependencies are installed is decided by the author of the buildpack.
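The "is what I need already installed?" decision can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and treating any newer runtime as acceptable is an assumption that a real buildpack author might not share.

```python
def parse_version(text):
    """Turn a dotted version such as '1.9.3' into a comparable tuple (1, 9, 3)."""
    return tuple(int(part) for part in text.split("."))


def needs_install(required, installed):
    """Return True when the runtime is missing or older than the required version.

    `installed` is None when no runtime was found in the container.
    Assumption: a newer runtime than `required` is considered good enough.
    """
    if installed is None:
        return True
    return parse_version(installed) < parse_version(required)
```

With the versions from the paragraph above, needs_install("2.0", "1.9.3") is true, so the buildpack would go on to download or install the newer Ruby.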

The killer feature of Buildpacks is that they are not limited to Ruby and in Heroku's Cedar stack this liberated Heroku users to run any software or services they required. I can install Java, Perl, Python, Go... the list is endless. Heck, if I was smart enough to write my own programming language, I could run that too - if I could get it to compile on the target PaaS via a buildpack.

Anatomy

A Buildpack consists of 3 executable scripts - bin/detect, bin/compile, bin/release. In short, when a developer pushes their application to the PaaS, they can specify a BUILDPACK_URL that defines the environment in which the application will run. The bin/detect checks compatibility with the application code, the bin/compile builds up the stack, and the bin/release sets the peripheral runtime environment, such as environment variables.
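To make the three steps concrete, here is a rough sketch of how a PaaS might drive them. It is not how any particular PaaS implements staging. The function name is hypothetical, and the calling convention (detect and release receive the application directory, compile also receives a cache directory, and an exit status of 0 from detect means "compatible") follows the Heroku-style convention and is assumed here.

```python
import subprocess
from pathlib import Path


def stage_application(buildpack_dir, build_dir, cache_dir):
    """Run detect, compile and release in order.

    Returns (detected_name, release_output), or None when bin/detect
    reports that the buildpack does not apply to this application.
    """
    bin_dir = Path(buildpack_dir) / "bin"

    # Step 1: detect. Exit status 0 means the buildpack can handle the
    # application; stdout carries a name for what was detected.
    detect = subprocess.run([str(bin_dir / "detect"), str(build_dir)],
                            capture_output=True, text=True)
    if detect.returncode != 0:
        return None

    # Step 2: compile. Builds up the stack inside the application directory.
    subprocess.run([str(bin_dir / "compile"), str(build_dir), str(cache_dir)],
                   check=True)

    # Step 3: release. Only describes the runtime environment on stdout.
    release = subprocess.run([str(bin_dir / "release"), str(build_dir)],
                             capture_output=True, text=True, check=True)
    return detect.stdout.strip(), release.stdout
```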

Let's take a closer look at each step...

bin/detect

The purpose of this script is to say whether the buildpack is suitable for the provided application code. If it is a Ruby buildpack, then does the application have a Gemfile or config.ru in the root directory? If it is a Java buildpack, then is there a pom.xml or some other indicator that this is a Java application? A single buildpack's bin/detect can detect multiple types of stacks that it supports and output the name of the type it detects to pass to bin/compile.
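A detect script can be very small. The sketch below is written in Python purely for illustration (many real ones are bash). The marker files come from the Ruby example above, while the stack names it prints and the exit-status convention (0 for compatible, non-zero otherwise) are assumptions.

```python
import os

# Marker files and the stack name reported for each. The names are
# illustrative; a real buildpack chooses its own.
MARKERS = {
    "Gemfile": "Ruby",
    "config.ru": "Ruby/Rack",
}


def detect(build_dir):
    """Return the detected stack name, or None if this buildpack does not apply."""
    for filename, stack in MARKERS.items():
        if os.path.isfile(os.path.join(build_dir, filename)):
            return stack
    return None


def main(argv):
    """argv[1] is the directory holding the pushed application code."""
    stack = detect(argv[1])
    if stack is None:
        return 1       # non-zero exit status: not compatible
    print(stack)       # the name of what was detected
    return 0
```

Saved as an executable bin/detect, the script would finish by calling sys.exit(main(sys.argv)).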

A developer does not have to specify either the buildpack or the language and framework of their application. They can leave it up to the PaaS to determine which of its internal buildpacks is most appropriate. The way this works is that multiple buildpacks can be specified or installed as resident on the PaaS. The bin/detect scripts of each buildpack will be tested in order and the first one to return positive will be used. The remaining buildpacks will be ignored. On some PaaSes it is also possible for administrators to specify the order in which buildpacks are evaluated against the given application code.
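The "first positive answer wins" rule is easy to express in code. In this sketch each resident buildpack is modelled as a name plus a detect function rather than a real bin/detect executable. That modelling, and the helper names, are assumptions made to keep the example short.

```python
import os


def has_file(filename):
    """Build a simple detect function that looks for one marker file."""
    def detect(build_dir):
        return os.path.isfile(os.path.join(build_dir, filename))
    return detect


def select_buildpack(buildpacks, build_dir):
    """Try each (name, detect_fn) pair in order and return the first match.

    Buildpacks after the first match are never consulted. Returns None
    when no resident buildpack claims the application.
    """
    for name, detect_fn in buildpacks:
        if detect_fn(build_dir):
            return name
    return None


# The list order is the evaluation order an administrator might configure.
RESIDENT_BUILDPACKS = [
    ("ruby", has_file("Gemfile")),
    ("java", has_file("pom.xml")),
    ("python", has_file("requirements.txt")),
]
```

Because evaluation stops at the first match, reordering the list changes the outcome for any application that more than one buildpack would accept.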

When a PaaS provides a set of resident buildpacks it can seem like magic to the end-user. It appears that "it just works!" with their code. They simply push their application and if it is a Python application, the Python buildpack's bin/detect will return true and so the Python buildpack will be used. Likewise, if it is a Java application the bin/detect of the Java buildpack will return true and that buildpack will be used.

PaaS administrators can extend the language and frameworks supported by their PaaS as they see fit by adding additional open-source or internally created buildpacks.

There are edge-cases in which it is possible that a buildpack might mis-diagnose its compatibility with an application. If, for some reason, you have a file named Gemfile in the root of your Java application and the Ruby buildpack's bin/detect is run first, then it is the Ruby buildpack that will be used to set up the stack of this Java application. This is completely dependent on the buildpack and there are many Ruby buildpacks that use different criteria for detecting compatibility with application code. PaaS administrators, PaaS vendors and the PaaS community should be aware of the quality of buildpacks they employ, especially in the strictness or looseness of the bin/detect scripts they contain.
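The difference between a loose and a strict bin/detect can be shown with the stray Gemfile case. The strict check below is just one possible heuristic, not taken from any particular Ruby buildpack: it assumes a genuine Gemfile contains at least one source or gem declaration.

```python
import os
import re

# Heuristic (an assumption): a real Gemfile has a line that starts with
# `source` or `gem` followed by a quoted string or a symbol.
GEM_LINE = re.compile(r"^\s*(source|gem)\s+['\":]", re.MULTILINE)


def loose_detect(build_dir):
    """Accept any application that has a file named Gemfile in its root."""
    return os.path.isfile(os.path.join(build_dir, "Gemfile"))


def strict_detect(build_dir):
    """Accept only when the Gemfile also looks like a Gemfile."""
    path = os.path.join(build_dir, "Gemfile")
    if not os.path.isfile(path):
        return False
    with open(path, encoding="utf-8", errors="replace") as handle:
        return bool(GEM_LINE.search(handle.read()))
```

A Java application with an empty or unrelated file named Gemfile passes loose_detect but fails strict_detect, so a later Java buildpack would still get its turn.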

bin/compile

The bin/compile script of a buildpack is where the important stuff happens. It installs everything you need to get up and running with your application.

There is no limitation on what this script installs or how it installs it. Often this will be a simple bash script, but this is not always so. For instance, in the Cloud Foundry Java Buildpack this is a Ruby script that bootstraps to load in a whole ream of Ruby functionality to set up the stack.

Not only does the bin/compile script set up your stack, but it can also compile or augment your code appropriately. For a Go, C++ or Java application this may involve compiling source code. For a Ruby on Rails application, which does not require compilation of Ruby code, it may involve rendering and minifying CSS and JavaScript assets.

bin/release

The bin/release script makes no changes to disk. It only changes the environment variables or command-line parameters. For instance, the Cloud Foundry Java Buildpack will add something equivalent to "-XX:OnOutOfMemoryError bin/killjava.sh" during the release stage.
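As a sketch of what a release step might emit, the function below builds a small YAML document without touching the disk. The layout and the key names config_vars and default_process_types are modelled on Heroku-style release output and are assumed here. The values in the usage note are purely hypothetical, so check what your PaaS actually expects.

```python
import json


def release_yaml(config_vars, process_types):
    """Describe the runtime environment as YAML text. Nothing is written to disk.

    Values are emitted as double-quoted strings via json.dumps, which
    produces scalars that YAML parsers also accept.
    """
    lines = ["---", "config_vars:"]
    for key, value in sorted(config_vars.items()):
        lines.append(f"  {key}: {json.dumps(value)}")
    lines.append("default_process_types:")
    for key, value in sorted(process_types.items()):
        lines.append(f"  {key}: {json.dumps(value)}")
    return "\n".join(lines) + "\n"
```

A bin/release script would print the result, for example print(release_yaml({"JAVA_OPTS": "-Xmx384m"}, {"web": "java -jar app.jar"}), end="").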

Optimizations

One of the downsides of buildpacks is that it is a lot of work to set up a stack from scratch each time you deploy your application. This includes the time and bandwidth it takes to download assets, such as the Java runtime. It is also a lot of work to fully compile the entire code-base for applications if only small parts of it have changed between deployments.

Heroku defined that a cache directory would be available and persist across deployments. This is somewhere to stash the assets you will want to use for each deployment. For example, if your buildpack is downloading the Java runtime, then it should save it in the cache directory, so that on the next deployment it is available locally.
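Here is a minimal sketch of that caching pattern as it might appear inside a compile step. The function name, the vendor sub-directory and the injected download callable are all assumptions made for illustration.

```python
import os
import shutil


def install_from_cache(asset_name, build_dir, cache_dir, download):
    """Place an asset in the application directory, downloading it at most once.

    `download(asset_name, destination_path)` is supplied by the caller and
    is only invoked when the asset is not already in the cache directory.
    """
    cached = os.path.join(cache_dir, asset_name)
    if not os.path.exists(cached):
        os.makedirs(cache_dir, exist_ok=True)
        download(asset_name, cached)      # slow path: first deployment only

    # Copy from the cache into the application (hypothetical vendor/ layout).
    target = os.path.join(build_dir, "vendor", asset_name)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    shutil.copyfile(cached, target)
    return target
```

On the first deployment the download callable runs. On later deployments, which get a fresh application directory but the same cache directory, the asset is simply copied across.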

Generally, it is recommended that you pre-compile dependencies outside of the buildpack and make them available on the web. Unfortunately this approach will either limit portability of the buildpack or it will add additional conditional logic to account for variations in the platform the dependencies are pre-compiled for. A buildpack maintainer will have to decide the level of portability they require for the buildpack.
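The conditional logic mentioned above might look something like the following sketch. The host name, URL layout and the set of supported platforms are all hypothetical. The point is only that the buildpack has to know which platforms it has pre-compiled binaries for and needs a fallback for the rest.

```python
import platform

BASE_URL = "https://example.com/prebuilt"   # hypothetical download host
SUPPORTED = {("linux", "x86_64")}           # platforms with pre-compiled builds


def prebuilt_url(name, version, system=None, machine=None):
    """Return the URL of a pre-compiled dependency for this platform.

    Returns None when no pre-compiled build exists, in which case the
    buildpack would have to fall back to compiling from source (or fail).
    """
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    if (system, machine) not in SUPPORTED:
        return None
    return f"{BASE_URL}/{system}-{machine}/{name}-{version}.tgz"
```

Every platform added to the supported set is another binary the maintainer has to build and host, which is the portability trade-off described above.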

Portability

The portability of buildpacks varies from buildpack to buildpack. There are many buildpacks designed to work with Heroku. Some of these can work with Cloud Foundry and Stackato without any modification, but others need tweaks.

A common work-flow is to fork the git repository of a buildpack, then change what is needed. Periodically, upstream changes are merged into the forked repository.

While not all buildpacks can be guaranteed to be portable, it is usually easy to modify them to work elsewhere.

Flexibility

I would argue that the lack of constraints is the biggest downside of buildpacks. Other than the three basic constructs of the buildpack (detect, compile, release) there is no limitation on how a buildpack is created and what it can do. This means one buildpack can be wildly different to another.

It is not simply the case of whether one buildpack uses asset caching or not, or installs certain components that the other does not. There also needs to be a gauge of quality and maintainability that comes with an open-source buildpack.

Anybody can write and publish an open-source buildpack. They can use any language to build the buildpack, which can run any code and can install any software it wants. It can target any platform, though this is usually Linux based. As yet, there are no standards around buildpack creation.

An organization investing IT resources and making a specific buildpack into a central resource needs to be confident that they can roll up their sleeves and bend it to their will. For this reason, I think it is ideal if a buildpack is written in the same language that it builds the stack for. For instance, a Perl buildpack would ideally be written in Perl. This increases the chances the organization will be comfortable with it. Otherwise, they should use a language that they are well versed in. For instance, many operations teams are now somewhat familiar with Ruby due to the success of Chef. If they deploy Java applications, then the Cloud Foundry Java Buildpack, which is written in Ruby, may be well suited to them.

Curation

With so many buildpacks, and forks of buildpacks, popping up all across GitHub (predominantly) and elsewhere, how does your average developer find the right buildpack for their application?

Heroku provides a list of default buildpacks.

Stackato has a list of built-in buildpacks that are available on GitHub, as well as a list of buildpacks known to work with Stackato.

For Cloud Foundry, there is a page on the cf-docs-contrib wiki that catalogs the buildpacks that work with Cloud Foundry. The most recent update to this list is from CloudCredo, who recently announced that they were contributing their JBoss buildpack to the Cloud Foundry project.

I think there is much more we can do for curating and cataloguing buildpacks. Often the best gauge of quality with open-source software is checking the size of the community around it. GitHub gives you metrics to see this information. Better still if an open-source project can offer distinct versions and releases to distinguish the bleeding-edge from the stable and known-to-work. The heroku-buildpack-ruby buildpack does tag versions, as does the Cloud Foundry java-buildpack. For example, Ben Hale of Pivotal recently announced v2.1 of the Java Buildpack on the Cloud Foundry vcap-dev mailing list.

Without standards around how buildpacks are created the task of curation becomes futile when viewed across the broader PaaS landscape. It is simply an on-going process of trial-and-error to verify whether a buildpack will work in each PaaS environment.

Stackato

ActiveState strives to ensure that Cloud Foundry compatible buildpacks will also work on Stackato.

Stackato was actually the first Cloud Foundry-based PaaS to support Heroku buildpacks, back in mid-2012. Since then, the Cloud Foundry ecosystem has standardized on buildpacks and Heroku buildpacks still form the basis for most Cloud Foundry buildpacks.

Stackato 3.2 ships with a long list of buildpacks, including Node.js, Ruby, Go, Java, Python, Clojure, Scala and Play.

We also provide a "legacy" buildpack which supports all the runtimes and frameworks of Stackato 2.10 for backwards compatibility. It was nice that this could be extracted from the Stackato 2.10 code-base and encapsulated as a buildpack with a clean separation from the 3.2 code-base.

Open Source

Heroku designed buildpacks with a clean separation from Heroku's PaaS. Buildpacks were an entity that could be open-sourced and this enabled their customers to extend and create their own. I believe this design decision was the key reason behind the success of buildpacks.

Similar to the ground-swell we see with Docker now, buildpacks were quickly adopted and created to support all manner of languages and frameworks in a very short time.

This large resource of open-source buildpacks was what spurred Stackato and Cloud Foundry to adopt the standard.

You can learn more about Buildpacks in the recent ActiveState webinar by Technology Evangelist John Wetherill and Cloud Engineer Ho Ming Li.

Title image courtesy of Philippe Teuwen on Flickr under Creative Commons License.

Category: stackato
About the Author

Phil Whelan has been a software developer at ActiveState since early 2012 and has been involved in many layers of the Stackato product, from the JavaScript-based web console right through to the Cloud Controller API. Phil has been the lead developer on kato, the command-line tool for administering Stackato. Phil's current role is Technology Evangelist focused on Stackato. You will see Phil regularly on ActiveState's Blog. Prior to coming to ActiveState, Phil worked in London for the BBC, helping build the iPlayer, and for Cloudera in San Francisco, supporting Hadoop and HBase. He also spent time in Japan, where he worked for Livedoor.com and met his wife. Phil enjoys working with big data and has built several large-scale data processing applications including real-time search engines, log indexing and a global IP reputation network. You can find Phil on Twitter at @philwhln, where you can ask him any questions about Stackato. Alternatively, email him at philw at activestate.com