Last week I chatted with Mitchell Hashimoto about his recently released open-source project Consul for service discovery, which builds upon their previous project Serf.
Many people will know Mitchell as the creator of Vagrant, but since then he has formed Hashicorp and released projects Packer, Serf and now Consul.
Mitchell said that he started Vagrant when he was in college and worked on it during his free time. He did development consultancy and found that jumping between clients meant rebuilding his machine to set up his development environment.
He started to look for easier solutions. Using virtual machines seemed to be the best option and services like Amazon Web Services made it easy for him to fire up VMs quickly. Unfortunately, there was an hourly cost to doing this on Amazon and, as a developer, he preferred to have everything he needed on his local machine.
The Vagrant project was what came out of Mitchell’s desire to make it easier for him to work with virtual machines for development on his local machine. It was not long before others were picking up Vagrant and using it, and soon he was spending all his free time working on it.
With Vagrant’s popularity draining so much of his time, Mitchell decided he needed to form a business around the project if he was going to sustain it.
With his co-founder Armon Dadgar, Mitchell formed Hashicorp.
“Packer is a tool for creating identical machine images for multiple platforms from a single source configuration.”
Packer was a new project that complemented Vagrant. Mitchell tells me that, “My goal was to make Ops easier and more enjoyable”.
He was frustrated by building machines with slow Chef and Puppet runs and saw Packer as a way to solve that. Packer was also a way to narrow the gap between development and production.
Packer was good at building virtual machines to run applications and services, but once something is packerized it is essentially set in stone. Mitchell said that he started to see the limitations of not being able to do more dynamic things, and they started to look at service discovery.
Mitchell said that his co-founder Armon was the first to start looking at gossip protocols and they thought this would be a great basis for what they wanted to build.
Cornell University had written some great papers on gossip protocols and they consumed them all.
They created Serf, which is a product they built around the SWIM gossip protocol from Cornell. They have modified SWIM slightly for increased propagation speed and convergence rate.
SWIM is very scalable. Unlike other algorithms, it scales linearly, rather than quadratically, with the number of nodes. It also consumes very little bandwidth. “We have users with 5 digit number of servers and the bandwidth usage doesn’t even register.”
If you read up on Serf, you will hear the comparison to a zombie apocalypse. This is an interesting analogy, so I asked Mitchell about it.
Serf nodes act in a similar way to how people act during a zombie apocalypse. If you see a zombie nearby you start screaming and you will immediately tell those close to you – your friends, your family. While it is impossible for everyone to tell everyone, the news spreads rapidly. Mitchell also compares this to how news quickly spreads on Twitter.
When a node joins the cluster, it only needs to tell one other node and the gossip protocol ensures that everyone is eventually aware of the new node. All this is done over UDP.
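The epidemic-style spread described above can be sketched in a few lines of Python. This is a toy simulation, not Serf’s actual implementation: the fanout value and the push-only model are simplifying assumptions, but it shows why propagation time grows roughly logarithmically with cluster size.

```python
import random

def gossip_rounds(n, fanout=3, seed=0):
    """Toy epidemic gossip: each round, every informed node tells
    `fanout` random peers. Returns rounds until all n nodes know."""
    rng = random.Random(seed)
    informed = {0}  # a joining node only needs to tell one member
    rounds = 0
    while len(informed) < n:
        for _node in list(informed):
            for _ in range(fanout):
                informed.add(rng.randrange(n))
        rounds += 1
    return rounds

# Even at four orders of magnitude more nodes, only a handful of
# extra rounds are needed -- the heart of SWIM's scalability.
for n in (10, 100, 1000, 10000):
    print(n, gossip_rounds(n))
```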
Originally Serf had a description more focused on “service discovery”. Since Consul came out, they have re-positioned Serf more appropriately as “a decentralized solution for cluster membership, failure detection, and orchestration. Lightweight and highly available.”
While they still think Serf is useful for the right applications, they found several limitations when using it for service discovery. This included not being able to fit enough metadata into UDP packets, having to manually configure DNS entries and the lack of support for health checks.
The limitations identified in Serf were addressed in Hashicorp’s next project, Consul, which also utilizes Serf’s gossip protocol.
Mitchell sums up Consul as “service discovery and configuration made easy.” He described Serf as a Lego block: it is not always clear to people how to use it or where it fits. “Consul is a complete building, so it is more obvious what to use it for.”
Consul uses the gossip protocol for small pieces of ephemeral information. Since the transport is UDP, it is unreliable. They use it for data that, if lost, would not be a big deal. This includes basic health state and cluster membership. It is also used to pass the addresses of nodes, so that more reliable TCP connections can be established for RPC.
A Consul cluster comprises agents that run on all nodes, and all the nodes participate in the gossip protocol.
Consul also employs the Raft algorithm for consensus and can be used for leader election. As with Riak servers, you can kill any node in the consensus cluster and the cluster will continue to function without losing data.
While all agents are involved in the Consul cluster’s gossip protocol, Raft consensus is better limited to 3, 5 or 7 nodes. This is simply configured by setting a flag when the agent starts up, to say whether it is a “server” or not.
Consensus requires a quorum (n/2 + 1). While there is no real minimum on the number of nodes in the Raft cluster, an odd number is optimal and more than 7 will result in performance costs. For instance, if you have 100 servers, you need 51 of them to confirm that they have written to disk before proceeding. Even though Consul does do this in parallel, it is better to keep this number low for consistently good performance.
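The quorum arithmetic above is simple to state in code. This is a small illustration of the majority rule, not anything from Consul itself; it also shows why an even-sized cluster tolerates no more failures than the odd size just below it.

```python
def quorum(n):
    """Minimum number of Raft servers that must acknowledge a write."""
    return n // 2 + 1

def fault_tolerance(n):
    """How many servers can fail while the cluster keeps a quorum."""
    return n - quorum(n)

# 4 servers tolerate only 1 failure, the same as 3 servers --
# hence the usual recommendation of 3, 5 or 7.
for n in (3, 4, 5, 7, 100):
    print(n, quorum(n), fault_tolerance(n))
```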
Consul is written in Go, so I asked Mitchell why they did not use the Go library go-raft that etcd is built upon. He told me that they did look at it, but they hit some limitations that were fundamental to its design and they did not see it being easily adaptable to meet their needs.
Consul provides a “theoretically unlimited number of keys and values” and they currently have users with 10,000s of keys who see no problem.
The size of these keys and values is artificially limited to about 256 KB. This limitation was driven by one of their customers trying to use it as Amazon S3-like storage.
One of the things that first grabbed my attention was the fact that Consul supports multiple data centers out-of-the-box. There are lots of new “cloud” technologies out there, but few are tackling this issue head-on.
I asked Mitchell if this was just a nice byproduct of using the eventually consistent gossip protocol. “No, our customers demanded support for this, so this was a goal when building Consul.”
Rather than have all nodes involved in cross-datacenter gossiping, only certain nodes are involved in the “WAN” gossip network. Each datacenter then has its own local “LAN” gossip network.
Agents perform two types of health checks.
The first type of health check does a heartbeat check to determine the availability of other nodes. Each node will randomly choose another node to check every second. This randomness ensures that statistically all nodes will be checked, but keeps the network overhead of this cluster-wide checking very low.
The second type of check is done locally by the agent on its own machine. For instance, this might be HTTP checks on the local web-server or disk-full checks. This is different to other solutions, such as Nagios, which call from a remote machine. Doing it this way makes the checks very quick and does not require network traffic across the cluster.
I asked Mitchell whether this type of checking would fail to uncover issues such as my HTTP server being accessible locally, but inaccessible remotely. He said that checks should be done on the public interface and any network issues beyond the local machine should be picked up by the random heartbeat checks.
Mitchell stressed that labelling a node as “unhealthy” is expensive – especially if you get it wrong. Therefore, if a health check fails, the node doing the check will contact 3 other nodes and say “have you talked to them recently?” before the cluster decides that a node is down.
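This direct-then-indirect probing can be sketched as follows. It is a toy Python model of the SWIM-style check, not Consul’s actual code; the `can_reach` predicate is a stand-in for real network probes.

```python
import random

def node_is_down(target, prober, peers, can_reach, k=3, rng=random):
    """SWIM-style failure check: try a direct probe, then ask k other
    members to probe indirectly before declaring `target` unhealthy.
    `can_reach(a, b)` says whether node a can currently reach node b."""
    if can_reach(prober, target):  # direct probe succeeded
        return False
    others = [p for p in peers if p not in (prober, target)]
    helpers = rng.sample(others, k)
    # Only if no helper can reach the target either is it marked down,
    # protecting the cluster from one node's local network problem.
    return not any(can_reach(h, target) for h in helpers)

peers = list(range(10))
# Scenario 1: node 9 is up, but node 0 has a broken route to it.
broken_link = lambda a, b: not (a == 0 and b == 9)
print(node_is_down(9, 0, peers, broken_link))  # False: helpers confirm it is up
# Scenario 2: node 9 is really dead -- nobody can reach it.
dead = lambda a, b: b != 9
print(node_is_down(9, 0, peers, dead))         # True
```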
DNS and HTTP
Consul supports two interfaces for service discovery – DNS and HTTP. I asked Mitchell why they provide both and not just one.
“DNS is simple. It’s good for legacy applications because it just works and applications can use Consul’s service discovery without even being aware that Consul exists”.
HTTP adds a richer interface to Consul-aware applications. It provides more metadata on services and the ability to implement things like “watchers”.
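As a small, hypothetical illustration of consuming that richer interface, here is how a client might extract endpoints from a catalog-style JSON payload. The sample data and field names here mirror the shape of Consul’s `/v1/catalog/service/<name>` response, but they are assumptions for the sketch and should be verified against the HTTP API documentation.

```python
import json

# A response shaped like Consul's catalog endpoint might return
# (illustrative sample data, not output from a real cluster).
sample = """
[
  {"Node": "node1", "Address": "10.0.1.10", "ServiceName": "web", "ServicePort": 8080},
  {"Node": "node2", "Address": "10.0.1.11", "ServiceName": "web", "ServicePort": 8080}
]
"""

def endpoints(payload):
    """Extract host:port pairs for a service from a catalog response."""
    return ["%s:%d" % (e["Address"], e["ServicePort"])
            for e in json.loads(payload)]

print(endpoints(sample))  # ['10.0.1.10:8080', '10.0.1.11:8080']
```

A DNS lookup would return much the same addresses, but without the node names, ports and other metadata that make the HTTP interface attractive to Consul-aware applications.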
Consul provides the concept of “sessions”. Mitchell said that this is a new thing that provides “locking” and enables walking the data stored in Consul.
All traffic is encrypted between agents.
Mitchell told me that Hashicorp works with many organizations who had been using Consul and testing its scalability limits even prior to the public release. These users have done some hard stress testing and have driven features. They were also behind the decision not to use go-raft for Consul, after hitting limitations with etcd at scale.
Consul is still at version 0.3, but Mitchell feels that they and their private beta customers have done a good job at stress testing it.
Is it production ready? While “scary to say”, he does think it is safe now and they have seen no major issues from the large scale usage.
Image courtesy of Hamed Parham via Flickr under Creative Commons License