Implementing graceful shutdown for Docker containers in Go (part 2)

In part 1 I demonstrated how to implement graceful shutdown of HTTP containers in Kubernetes; however, there were still cases where in-flight requests could fail with errors. After a bit of discussion on Stack Overflow I have updated my code to include an extra HTTP listener to serve pre-stop and readiness checks from Kubernetes.

The way this works is as follows (a minimal Go sketch of the setup appears after the list):

  • A container needs to be shut down
  • A pre-stop call is made
    • We take this opportunity to update our readiness to false, so that we are removed from the proxy
    • As this operation is synchronous, you can optionally sleep here for a few seconds to give Kubernetes time to check your readiness and update accordingly (useful if scaling dramatically)
  • A SIGTERM is received
    • As before, we wrap up all in-flight requests and shut down
    • We have 10 seconds to do this before we get a SIGKILL
  • Our container is shut down
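
To make the flow concrete, here is a minimal Go sketch of the two-listener setup. This is not the exact code from the sample project: the ports and the /readiness and /prestop paths are assumptions, and it uses the Shutdown method from the standard net/http package (added in Go 1.8) rather than a third-party graceful-shutdown library.

package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)

func main() {
	// ready is flipped to 0 when the pre-stop hook fires, so the readiness
	// probe starts failing and Kubernetes removes us from the proxy.
	var ready int32 = 1

	// Main application listener (port and handler are assumptions).
	app := &http.Server{
		Addr: ":8080",
		Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			time.Sleep(time.Second) // simulate the 1000 ms of work mentioned in the stats below
			w.Write([]byte("hello world\n"))
		}),
	}

	// Separate listener for the Kubernetes readiness check and pre-stop hook.
	health := http.NewServeMux()
	health.HandleFunc("/readiness", func(w http.ResponseWriter, r *http.Request) {
		if atomic.LoadInt32(&ready) == 1 {
			w.WriteHeader(http.StatusOK)
			return
		}
		w.WriteHeader(http.StatusServiceUnavailable)
	})
	health.HandleFunc("/prestop", func(w http.ResponseWriter, r *http.Request) {
		atomic.StoreInt32(&ready, 0)
		// The pre-stop call is synchronous, so optionally block here for a few
		// seconds to give Kubernetes time to observe the failing readiness check.
		time.Sleep(5 * time.Second)
		w.WriteHeader(http.StatusOK)
	})

	go func() {
		if err := http.ListenAndServe(":8081", health); err != nil {
			log.Fatal(err)
		}
	}()

	go func() {
		if err := app.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatal(err)
		}
	}()

	// Wait for SIGTERM, then wrap up in-flight requests before the SIGKILL deadline.
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, syscall.SIGTERM)
	<-stop

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	if err := app.Shutdown(ctx); err != nil {
		log.Printf("shutdown error: %v", err)
	}
}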

Testing

If you would like to test this on your cluster, I have a sample project up on GitHub which follows my model for running minimal Go apps in a Kubernetes cluster.

You can follow the setup on that page, or tweak the files to point at a different Docker registry, then just run make to get a Docker image.

Following on from that you need to test in Kubernetes, for which you have two options:

  • Scaling
    • This is useful to simulate growing and shrinking your service, but can occasionally have issues when doing large scaling operations in a single command (e.g. from 20 containers to 1)
    • Scaling up and down incrementally is advised for any real service (e.g. from 20 to 19, then 19 to 18, and so on)
  • Rolling Updates
    • This is the setup I have opted for, swapping out one container for a new one over a period of time
    • The included files always use the latest Docker image, so the update actually moves from the old latest to the new latest
      • This isn’t meant to be how production deploys work (you should use versioned images), but it’s good for development

Given a running Kubernetes cluster, you can bring up the service like so:

./cluster/kubectl.sh create -f ~/Go/src/github.com/chriswhitcombe/httpgraceful/01-httpgraceful-controller.json
./cluster/kubectl.sh create -f ~/Go/src/github.com/chriswhitcombe/httpgraceful/01-httpgraceful-service.json

Obviously, substitute the paths for your own setup.

This will give you a hello-world-style output on port 31000 of any of your node IPs. You can then modify the backend cluster, either via scaling:

./cluster/kubectl.sh scale --replicas=10 rc httpgraceful-controller-1

Or via a rolling update:

./cluster/kubectl.sh rollingupdate httpgraceful-controller-1 -f ~/Go/src/github.com/chriswhitcombe/httpgraceful/02-httpgraceful-controller.json

Throughout this you should always get requests routed to an active container and see no errors on the client side. I have noticed the occasional spike in latency, which needs a little investigation; however, we are talking about a handful of requests out of 100k, and only during a rolling update. I plan to retest on more production-like hardware rather than in Vagrant.
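
If you want to run a rough version of this test yourself, a throwaway client along the lines of the sketch below will count errors and track latency while you scale or roll the controller. The node IP and request count are placeholders for illustration; substitute one of your own node IPs.

package main

import (
	"fmt"
	"net/http"
	"time"
)

// Rough client-side check: hammer the service while scaling or rolling the
// controller and count errors and latency.
func main() {
	const url = "http://192.168.0.10:31000/" // placeholder node IP, port 31000 as per the service
	const n = 1000                           // placeholder request count

	var errs int
	var total, slowest time.Duration

	for i := 0; i < n; i++ {
		start := time.Now()
		resp, err := http.Get(url)
		elapsed := time.Since(start)

		if err != nil || resp.StatusCode != http.StatusOK {
			errs++
		}
		if resp != nil {
			resp.Body.Close()
		}

		total += elapsed
		if elapsed > slowest {
			slowest = elapsed
		}
	}

	fmt.Printf("requests=%d errors=%d avg=%v max=%v\n", n, errs, total/time.Duration(n), slowest)
}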

My stats during a rolling update showed:

  • Replicas: 10
  • Requests: 197,357
  • Average Request: 1039 ms (I have a 1000 ms sleep in the code)
  • Min: 1000 ms
  • Max: 57,477 ms (this is to be investigated)
  • Standard Deviation: 1364 ms
  • Errors: 0%

I plan to retest this on a more production-like system with far more replicas (nearer to 1,000), where I expect to see less jitter in the timings. After the rollout was complete, the timings settled down to a standard deviation of around 1040 ms.
