Best Practices Kubernetes

8 Algolia-Tested Best Practices for Kubernetes

At Algolia, we started using Kubernetes almost two years ago. We had new services to deploy, and even if we’re big users of bare metal machines, we needed more flexibility. Therefore, we decided to test and use Kubernetes on new systems. Two years later, most of our products are deployed on Kubernetes. As more and more teams started to use it internally, we created an internal training. And today, we’re proud to make this training open source, so anyone can learn from it and contribute.

We extracted eight practices from this training that we consider to be key for using Kubernetes correctly. Let’s review them.

1. Do not use root user in your containers

The container paradigm, and the way it’s implemented on Linux, wasn’t built with security in mind. It only exists to restrict resources, such as CPU and RAM, like the documentation of Docker explains. This implies that your container shouldn’t use the “root” user to run commands. Running a program in a container is almost the same as running a program on the host itself. If you are interested in knowing more, check this article to understand why.

Thus, add those lines on all your images to make your application run with a dedicated user. Replace “appuser” with a name more relevant for you.

ARG USER=appuser # set ${USER} to be appuser
addgroup -S ${USER} && adduser -S ${USER} -G ${USER} # adds a group and a user of it
USER ${USER} # set the user of the container
WORKDIR /home/${USER} # set the workdir to be the home directory of the user

This can also be ensured at the cluster level with pod security policies.

2. Handle the “SIGTERM” signal

Kubernetes sends the “SIGTERM” signal whenever it wants to gracefully stop a container. You should listen to it and react accordingly in your application (by closing connections, save a state, etc.) In general, following the twelve-factor app recommendations for your application is considered good practice. Also, don’t forget to configure terminationGracePeriodSeconds on your pods. The default is 30 seconds, but your application might need more (or less) time to properly terminate.

3. Use a declarative management for your manifests

Use declarative manifests so you can rollback your code and infrastructure efficiently. It means that your source versioning should be the source of truth of your manifests.

It implies that you only use kubectl apply to update or create your Kubernetes resources, but also that you don’t use the latest tag for your image containers. Each version of your containers should be unique, and using Git hashes is a good practice. When deploying a new version of your application, you should update the manifest by specifying a new version for the containers, then commit the manifest in your source control, and finally run kubectl apply.

4. Lint your manifests

YAML is a tricky format. We use yamllint, because it supports multi-documents in a single file.

You can also use Kubernetes-specifics linters:

  • kube-score lints your manifests and enforce good practices.
  • kubeval also lints the manifests, but only checks validity.

In Kubernetes 1.13, the --dry-run option appeared on kubectl which lets Kubernetes check your manifests without applying them. You can use this feature to check if your YAML files are valid for Kubernetes.

5. Configure the liveness and readiness probes

Liveness and readiness are ways for an application to communicate its health to Kubernetes. Configuring both helps Kubernetes handle your pods correctly, and react accordingly to state change.

The liveness probe is here to assess whether if a container is still alive; meaning, if the container is not in a broken state, a deadlock, or anything similar. From there, it can take decisions such as restarting it.

The readiness probe is here to detect if a container is ready to accept traffic, block a rollout, influence the Pod Disruption Budget (PDB), etc. It’s particularly useful when your container is set to receive external traffic by Kubernetes (most of the time, when it’s an API).

Usually, having the same probe for readiness and liveness is acceptable. In some cases though, you might want them to be different. A good example is a container running a single-threaded application that accepts HTTP calls (like PHP). Let’s say you have an incoming request that takes a long time to process. Your application can’t receive any other request, as it’s blocked by the incoming requests; therefore it’s not “ready”. On the other hand, it’s processing a request, therefore it’s “alive”.

Another thing to keep in mind, your probes shouldn’t call dependent services of your application. This prevents cascading failure.

6. Configure resource requests and limits

Kubernetes lets you configure “requests” and “limits” of the resources for pods (CPU, RAM and disk). Configuring the “requests” helps Kubernetes schedule your pods more easily, and better pack workloads on your nodes.

Most of the time you could define "request" = "limit". But be careful, as your pod will be terminated if it goes above the limit.

Unless your applications are designed to use multiple cores, it is usually a best practice to keep the CPU request at "1" or below.

7. Specify pod anti-affinity

When you deploy an application with a lot of replicas, you most probably want them to be evenly spread across all nodes of the Kubernetes cluster. If you have all your pods running on the same node, and this node dies, this will kill all your pods. Specifying a pod anti-affinity for your deployments ensures that Kubernetes schedules your pods across all nodes.

A good practice is to specify a podAntiAffinity on the hostname of the node:

apiVersion: apps/v1
kind: Deployment
metadata:
 name: my-application
spec:
 replicas: 2
 selector:
   matchLabels:
     app: my-application
 template:
   metadata:
     labels:
       app: my-application
   spec:
     containers:
     - name: my-pod
       image: my-image:my-version
     affinity:
       podAntiAffinity:
         preferredDuringSchedulingIgnoredDuringExecution:
           - labelSelector:
               matchExpressions:
                 - key: app
                   operator: In
                   values:
                     - app: my-deployment
             topologyKey: kubernetes.io/hostname

Here we have a deployment “my-application” with two replicas, and we specify a podAntiAffinity specification with a soft requirement (preferredDuringSchedulingIgnoredDuringExecution, see here for more details), so we don’t schedule the pods on the same hostname (topologyKey: kubernetes.io/hostname).

8. Specify a Pod Disruption Budget (PDB)

In Kubernetes, pods have a limited lifespan and can be terminated at any time. This phenomenon is called a “disruption”.

Disruptions can either be voluntary or involuntary. Involuntary disruptions means, as its name suggests, that it wasn’t something anyone could expect (a hardware failure for example). Voluntary disruptions are initiated by someone or something, like the upgrade of a node, a new deployment, etc.

Defining a “Pod Disruption Budget” helps Kubernetes manage your pods when a voluntary disruption happens. Kubernetes will try to ensure that enough that match a given selector are remains available at the same time. Specifying a PDB improves the availability of your services.

Conclusion

We recommend you adjust those practices based on the specifics of your applications and workload. Yet, we think these are fine defaults, and we apply them on all our apps in Kubernetes.

You can find more details on these good practices on the dedicated section of the training.