Josh Wolfe in backend, Kubernetes

Stepping on the "Scale" with Kubernetes

When the OkCupid app on your phone updates to a new version, you have to close and relaunch the app to complete the update. This happens once every few weeks, and when it's time to update, you get to choose when to restart the app. However, the backend software for OkCupid updates multiple times per day, and the updates don't wait for you to be ready.

Imagine if everyone in the world had to stop using the OkCupid app for 5 minutes while the backend software updated to a new version. That would be unacceptable. Instead, we need the backend software to "just work all the time". This is called high availability.

While perfect 100% uptime is not realistic to achieve, we can get pretty close by addressing several common causes of downtime. Two causes are planned downtime for maintenance and unplanned downtime due to hardware failure. For both causes, the solution ends up being mostly the same:

For software to be highly available, there need to be multiple redundant instances of the software running, and when you need to communicate with the software, there needs to be some way of picking which instance to talk to. If you have those two pieces, then you've got the tools you need to achieve high availability.

Example Service Architecture

OkCupid's backend is composed of multiple services that serve incoming requests and in turn make requests out to other services. Here's an ASCII-art diagram of what this might look like:

API
 |                                      ______
 +--> message-service ---------------> /      \
 |     |                               |      |
 |     +--> notification-service       |  DB  |
 |                                     |      |
 +--> doubletake-service               |      |
 |     |                               |      |
 |     +--> vote-service ------------> |      |
 |                                     |      |
 +--> profile-edit-service ----------> \______/

When a request comes in via the public-facing API, it's routed to either the message-service, doubletake-service, or profile-edit-service, and each of those services might communicate with another service or the database, and so on.

Of course, our actual backend is much more complex than this, on the order of 100 different services rather than just the 5 pictured above. The services are long-running server processes deployed on 100 or so machines, and we need up to 100 instances of some services.

It's worth pointing out that the backend isn't just one big piece of software called backend.exe; it's broken up into multiple processes that communicate with each other over TCP sockets. When we want to deploy a new version of the backend, we only need to do maintenance on the services that are actually changing, for example just notification-service.

The Naive Approach

Let's state our goals so we can choose a solution:

  • We need to be able to do maintenance on our software with no disruption of service, for example deploying a new version.
  • We need to be able to endure some amount of hardware failure without a significant disruption. Let's use an example of one machine losing power suddenly.

The simple and effective approach is to run multiple instances of each of our services spread across multiple machines. It might look something like this:

       host001
=======================
2x message-service
1x notification-service
4x doubletake-service
3x vote-service
1x profile-edit-service

       host002
=======================
2x message-service
1x notification-service
4x doubletake-service
3x vote-service
1x profile-edit-service

       host003
=======================
2x message-service
1x notification-service
4x doubletake-service
3x vote-service
1x profile-edit-service

In the case of hardware failure, if host002 shuts off suddenly, we've still got plenty of instances of all of our services running on other machines. And for our maintenance use case, we just need to restart one instance of a service at a time, so that some other instances are always up to serve traffic.

Seems simple enough, but there's one big detail that we're not addressing yet. When you want to talk to a service, how do you pick which instance to talk to? You'd better not pick an instance that's down for any reason, or the service will appear unavailable. We need a way to know at all times which instances are up.

OkCupid has a system called Membership that does a decent job solving this problem. It's a registry of the running instances of each service, including information about where each instance is running. When an instance starts up, it registers itself, and when an instance shuts down for maintenance, it deregisters itself. In the event of an unexpected shutdown, the instance's registration times out after a minute or so. When you want to contact an instance of a service, you fetch the complete registry and pick one of the instances that's known to be up.

This system has worked pretty well for several years.
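To make the idea concrete, here's a rough sketch of how a client might use a Membership-style registry. This is a toy illustration, not the real Membership code; the class, method names, and timeout value are made up for the example:

import random
import time

REGISTRATION_TIMEOUT_SECONDS = 60  # made up: stale registrations expire after about a minute

class Registry:
    """Toy in-memory stand-in for a Membership-style registry."""

    def __init__(self):
        # (service, host, port) -> time the instance last registered
        self.instances = {}

    def register(self, service, host, port):
        self.instances[(service, host, port)] = time.time()

    def deregister(self, service, host, port):
        self.instances.pop((service, host, port), None)

    def live_instances(self, service):
        # An instance counts as "up" if its registration hasn't timed out.
        now = time.time()
        return [(host, port)
                for (svc, host, port), registered in self.instances.items()
                if svc == service and now - registered < REGISTRATION_TIMEOUT_SECONDS]

def pick_instance(registry, service):
    """Fetch the registry and pick one instance that's known to be up."""
    live = registry.live_instances(service)
    if not live:
        raise RuntimeError("no live instances of " + service)
    return random.choice(live)

In the real system, healthy instances presumably keep their registrations fresh so they don't time out, but the lookup side has the same shape: fetch the registry, filter to instances that are up, pick one.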

There's one more implementation detail we should include here, which is how the services are actually running on the machines. We use daemontools to start services and restart them if they crash. This too has worked pretty well for several years.

The Next Level of Goals

After serving traffic with high availability for several years, you get more requirements:

  • We need to add more machines, and we need to scale up the number of services to be able to handle more throughput.
  • We need to be able to do maintenance on the machines themselves, such as upgrading the operating system, which means an hour or so of nothing running on the machine.

One of the inconvenient implementation details of Membership is that the number of instances of each service has to be pre-configured, and changing the configuration is tricky, so it's challenging to scale.

As we add machines and scale up services, we can run into an uneven balance of instances running on different machines. Perhaps we need to add 20 new instances of doubletake-service and add two new machines. The obvious thing would be to add 10 instances of doubletake-service onto each of the two new machines. Then we add more new machines and scale up message-service at the same time, and so on. You eventually end up with a very uneven distribution of services.

When you want to do maintenance on a machine, you're going to need to stop all the instances running there, but those instances might be necessary for performance. There might be unused compute resources on other machines, so you could move all the service instances off the machine onto other available machines. Doing that requires reconfiguring the daemontools run files, then carefully bringing down the old instances and bringing up the new ones. Remember that the number of instances is pre-configured in Membership, so you can't have the new ones and the old ones running at the same time.

This juggling act is doable, but it's error-prone. Daemontools was not designed for rebalancing instances across different machines. As OkCupid continues to grow, we need to do these rebalances more and more often. It's time to look for better handling of this use case.

How Kubernetes Achieves Our Goals

Rather than focusing on solving specific problems, like how to automatically move daemontools run files between machines, let's step back and enumerate our actual goals. They do not include any mention of daemontools, because that's an implementation detail, not an objective.

We need to do the following without disrupting service:

  • deploy new versions of software services,
  • rescale the number of instances of software services,
  • add and remove machines,
  • tolerate single-machine failure,
  • and provide a way for software to connect with a working instance of a given service.

We really don't care what machines our services are running on; we just need somewhere for them to run. Wherever that ends up being, other software needs to know how to contact them.

Kubernetes was designed with these goals in mind.

Kubernetes requires your software to be deployed in containers, usually Docker containers, which makes system dependency problems go away. I could write a whole article about updating glibc versions and why containers are so much nicer to deal with, but let's get back to talking about Kubernetes.

A service like doubletake-service is deployed to Kubernetes with a config file that looks like this:

apiVersion: v1
kind: Service
metadata:
  name: doubletake-service
spec:
  selector:
    app: doubletake-service   # send traffic to pods carrying this label
  ports:
  - port: 80
  type: LoadBalancer
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: doubletake-service
spec:
  replicas: 10
  selector:
    matchLabels:
      app: doubletake-service
  template:
    metadata:
      labels:
        app: doubletake-service   # the label the Service selector matches on
    spec:
      containers:
      - image: doubletake-service:v123
        name: doubletake-service
        ports:
        - containerPort: 80

I've greatly simplified this example, so don't try this config at home, kids.

The image: doubletake-service:v123 line is how you specify the executable code for your service; in this example it identifies version 123 of a Docker image named doubletake-service. We've also declared that this service listens on port 80, and that we want there to be 10 instances, or replicas, of the service running.

Kubernetes takes this specification and makes it a reality. The hosts in the cluster are utilized in a reasonably balanced way to run the instances. If more hosts are added to the Kubernetes cluster, they become available to run instances without the author of the doubletake-service configuration even needing to be aware of it. If a host is removed from the cluster for maintenance, any instances running on that host are automatically transferred to other hosts. If a machine dies, Kubernetes, which is constantly monitoring hosts, detects the failure and automatically transfers any instances to other hosts.

If we want to scale a service up or down, we just change that number replicas: 10 to replicas: 50 or anything else, and Kubernetes will start up or shut down instances to match.

If we want to deploy a new version of the service, just change image: doubletake-service:v123 to say :v124 instead, and Kubernetes will start up instances of the new version and shut down instances of the old version. This operation is more complex than scaling instances up or down, and there are many more configurable options for controlling it.
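To give a taste of those options, a Deployment can spell out how a rollout should proceed using the standard rolling-update settings. The numbers below are illustrative rather than our production values; the snippet sits under the Deployment's spec alongside replicas:

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # never take down more than one instance at a time
      maxSurge: 2         # allow up to two extra instances during the rollout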

The final requirement is the ability for other software to connect to a working instance of our service. This is where the first block of configuration above comes in, the Service of type LoadBalancer.

However many replicas of our service we have, Kubernetes knows where they are, because Kubernetes put them there. Kubernetes maintains a DNS entry for the name doubletake-service, and connections to that name get routed to one of the working service instances. Clients don't need to fetch and manage a registry of instance locations; they just connect to doubletake-service.
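From a client's point of view, contacting the service is then just an ordinary network call to a well-known name. Here's a minimal sketch in Python, assuming the client runs inside the cluster and that doubletake-service exposes a hypothetical /health endpoint over HTTP on port 80:

import urllib.request

# The name doubletake-service resolves via the cluster's DNS to the Service,
# which forwards the connection to a healthy instance. No registry lookup needed.
with urllib.request.urlopen("http://doubletake-service/health") as response:
    print(response.status, response.read())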

What's Next

Although we're far from abandoning our old systems, Kubernetes, though not perfect, looks promising: it takes much of the toil and risk out of scaling and upgrading our backend software.