Implementing graceful shutdown for Docker containers in Go (part 2)

In part 1 I demonstrated how to implement graceful shutdown of HTTP containers in Kubernetes; however, there were some issues where requests could still cause errors. After a bit of discussion on Stack Overflow I have updated my code to include an extra HTTP listener to serve pre-stop and readiness checks from Kubernetes.

The way this works is as follows (a minimal Go sketch follows the list):

  • A container needs to be shut down
  • A pre-stop call is made
    • We take this opportunity to update our readiness to false, so that we are removed from the proxy
    • As this operation is synchronous, you can optionally sleep here for a few seconds to give Kubernetes time to check and act on your readiness (useful if scaling dramatically)
  • A SIGTERM is received
    • As before, we wrap up all current requests and shut down
    • We have 10 seconds to do this before we get a SIGKILL
  • Our container is shut down
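
In Go, that flow maps onto something like the sketch below. This is a minimal illustration rather than the code from the sample project: the port numbers, the /readiness and /prestop paths, and the use of the standard library's http.Server.Shutdown (Go 1.8+) are my own assumptions here, and the 1 second sleep in the handler simply mirrors the delay mentioned in the stats further down.

package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)

func main() {
	// Readiness flag; the pre-stop hook flips it to false so the readiness
	// probe starts failing and Kubernetes removes us from the service proxy.
	var ready int32 = 1

	// Extra listener dedicated to Kubernetes readiness checks and the pre-stop call.
	probes := http.NewServeMux()
	probes.HandleFunc("/readiness", func(w http.ResponseWriter, r *http.Request) {
		if atomic.LoadInt32(&ready) == 1 {
			w.WriteHeader(http.StatusOK)
			return
		}
		w.WriteHeader(http.StatusServiceUnavailable)
	})
	probes.HandleFunc("/prestop", func(w http.ResponseWriter, r *http.Request) {
		atomic.StoreInt32(&ready, 0)
		// The pre-stop call is synchronous, so sleeping here gives Kubernetes
		// time to observe the failing readiness probe before SIGTERM arrives.
		time.Sleep(5 * time.Second)
		w.WriteHeader(http.StatusOK)
	})
	go func() {
		log.Fatal(http.ListenAndServe(":8081", probes))
	}()

	// Main application listener; the sleep mimics the 1000 ms of work per request.
	app := http.NewServeMux()
	app.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(1 * time.Second)
		w.Write([]byte("hello world"))
	})
	srv := &http.Server{Addr: ":8080", Handler: app}
	go func() {
		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatal(err)
		}
	}()

	// Block until Kubernetes sends SIGTERM, then finish in-flight requests
	// within the grace period before SIGKILL arrives.
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, syscall.SIGTERM)
	<-stop

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		log.Printf("shutdown error: %v", err)
	}
}

The readiness probe and pre-stop hook in the pod spec would then point at this second listener; the exact paths and ports are whatever the sample project's controller definition declares.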

Testing

If you would like to test this on your own cluster, I have a sample project uploaded to GitHub which follows my model for running minimal Go apps in a Kubernetes cluster.

You can follow the setup on that page, or tweak the files to point at a different Docker registry, then just run make to build a Docker image.

Following on from that, you need to test in Kubernetes, for which you have two options:

  • Scaling
    • This is useful for simulating growing and shrinking your service, but can occasionally have issues when doing large scaling operations in a single command (e.g. from 20 containers to 1)
    • Scaling up and down incrementally would be advised for any service (e.g. from 20 to 19, then 19 to 18, etc.)
  • Rolling Updates
    • This is the setup I have opted for, swapping out one container for a new one over a period of time
    • The files included always use the latest Docker image, so the update actually goes from the old latest to the new latest
      • This isn’t how production deploys should work (you should use versioned images), but it’s good for development

Given a running Kubernetes cluster, you can bring up the service like so:

./cluster/kubectl.sh create -f ~/Go/src/github.com/chriswhitcombe/httpgraceful/01-httpgraceful-controller.json
./cluster/kubectl.sh create -f ~/Go/src/github.com/chriswhitcombe/httpgraceful/01-httpgraceful-service.json

Obviously substitute paths accordingly for yourself.

This will give you a hello-world-style output on port 31000 of any of your node IPs. You can then modify the backend cluster either via scaling:

./cluster/kubectl.sh scale --replicas=10 rc httpgraceful-controller-1

Or via a rolling update:

./cluster/kubectl.sh rollingupdate httpgraceful-controller-1 -f ~/Go/src/github.com/chriswhitcombe/httpgraceful/02-httpgraceful-controller.json

Throughout this, requests should always be routed to an active container and you should see no errors on the client side. I have occasionally noticed a spike in latency which needs a little investigation; however, we are talking about a handful of requests out of 100k, and only during a rolling upgrade. I plan to retest when running on more production-like hardware rather than in Vagrant.

My stats during a rolling update showed:

  • Replicas: 10
  • Requests: 197,357
  • Average request: 1,039 ms (I have a 1,000 ms sleep in the code)
  • Min: 1,000 ms
  • Max: 57,477 ms (this is to be investigated)
  • Standard deviation: 1,364 ms
  • Errors: 0%

I plan to retest this on a more production-like system with far more replicas (nearer to 1,000), where I expect to see less jitter in the timings. After the rollout was complete, the timings settled down to a standard deviation of around 1,040 ms.

One thought on “Implementing graceful shutdown for Docker containers in Go (part 2)”

  1. Chris,

    I don’t see the need to do anything with the readiness probe. I will illustrate this in the following steps.

    Let’s take the scenario of a rolling Kubernetes upgrade.

    1. To perform rolling upgrades, Kubernetes will bring up new pods and kill the old ones. To kill an old pod, Kubernetes sends SIGTERM.
    2. Once SIGTERM is sent, Kubernetes removes the pod from the router. At this point it’s irrelevant what health checks the pod responds to, because Kubernetes has already started the process to shut down and kill this pod.
    3. The pod either exits gracefully (within the configurable grace period) or eventually gets killed with SIGKILL and forced to close.

    This makes the shutdown process quite simple: once an application receives a SIGTERM, it only needs to wait until current requests finish, then clean up any DB connections and finally close idle (keep-alive) connections.
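
    A rough sketch of that simpler flow, using the standard library's http.Server.Shutdown (closeDBConnections is just a stand-in for whatever cleanup an app needs):

    package main

    import (
        "context"
        "log"
        "net/http"
        "os"
        "os/signal"
        "syscall"
        "time"
    )

    func main() {
        srv := &http.Server{Addr: ":8080"}
        go func() {
            if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
                log.Fatal(err)
            }
        }()

        // Block until Kubernetes sends SIGTERM.
        stop := make(chan os.Signal, 1)
        signal.Notify(stop, syscall.SIGTERM)
        <-stop

        // Stop handing out new keep-alive connections so idle clients drop off.
        srv.SetKeepAlivesEnabled(false)

        // Wait for in-flight requests to finish, bounded by the grace period.
        ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
        defer cancel()
        if err := srv.Shutdown(ctx); err != nil {
            log.Printf("shutdown error: %v", err)
        }

        closeDBConnections()
    }

    // closeDBConnections is a hypothetical hook where a real app would close
    // database pools and other long-lived resources.
    func closeDBConnections() {}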
