Waiting: The Subtle Art That You Should Master

Recently, while working on a workshop entitled Test your Pull Request on Kubernetes with GKE and GitHub Actions, I faced the same problem twice: service A needs service B, but service A starts faster than service B, and the system fails. In this article, I want to describe the context of these problems and how I solved both with the same tool.
Waiting in Kubernetes
It may seem strange to wait in Kubernetes. The self-healing nature of the Kubernetes platform is one of its greatest strengths. Consider two pods: a Python application and a PostgreSQL database.
The application starts very quickly and impatiently tries to establish a connection to the database. Meanwhile, the database is still initializing with the provided data; the connection fails, and the pod ends up in the Failed state.
After a while, Kubernetes checks the state of the application pod. Because it failed, Kubernetes terminates it and starts a new pod. At this point, two things can happen: either the database pod is not ready yet, and we are back to square one, or it is ready, and the application finally connects.
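The application itself could also be made more patient. Here is a minimal retry-with-delay sketch in Python; the connect callable and the parameter values are hypothetical, not taken from the workshop code:

```python
import time

def wait_for_service(connect, retries=10, delay=2.0):
    """Call `connect` until it succeeds or retries are exhausted."""
    last_error = None
    for _ in range(retries):
        try:
            return connect()  # e.g., open a database connection
        except ConnectionError as e:
            last_error = e
            time.sleep(delay)  # back off before the next attempt
    raise RuntimeError(f"service still down after {retries} attempts") from last_error
```

With such a loop, the pod stays Running instead of crashing, at the cost of hiding the failure from Kubernetes' own restart machinery.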
To speed up the process, Kubernetes offers startup probes:
startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 30
  periodSeconds: 10
With the probe above, Kubernetes first waits ten seconds before checking the state of the pod. If the check fails, it waits another ten seconds and tries again. Rinse and repeat up to 30 times, i.e., five minutes in total, before it fails permanently.
You may have noticed the HTTP /health endpoint above. Kubernetes offers two mutually exclusive probe configuration options: httpGet or exec. The former is suited to web applications, while the latter covers all other applications. This implies that we must know what kind of container the pod runs and how to check its status, provided we can. I am no PostgreSQL expert, so I searched for a status-check command. The Bitnami Helm chart, once rendered, looks like the following:
startupProbe:
  exec:
    command:
      - /bin/sh
      - -c
      - -e
      - exec pg_isready -U $PG_USER -h $PG_HOST -p $PG_PORT
Note that the above is a simplification; the real chart also deals with the database name and an SSL certificate.
The startup probe speeds things up compared to the default behavior, provided you configure it correctly. You can define a long initial delay, then shorter check intervals. However, the more diverse the containers, the harder the configuration becomes, because you must be an expert in each of the underlying containers.
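As a sketch, such a tuned probe could look like this; the numbers are illustrative, not recommendations:

```yaml
startupProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30   # give the container a generous head start
  periodSeconds: 5          # then check more frequently
  failureThreshold: 24      # give up after a further two minutes
```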
It would be worth looking for alternatives.
Wait4x
The alternatives are tools whose sole focus is waiting. A long time ago, I found the wait-for script for that purpose. The idea is simple:
./wait-for is a script designed to synchronize services such as Docker containers. It is sh and alpine compatible.
Here's how to wait for an HTTP API:
sh -c "./wait-for -- echo \"The API is up! Let's use it\""
It did the job, but at the time, you had to copy the script and check for updates manually. I just checked, and the project now provides a container image.
Wait4X plays the same role but is available as a container image and supports more targets to wait for: HTTP, DNS, databases, and message queues. It is my current choice.
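For reference, here is what waiting on different targets looks like on the command line. The postgresql and http forms and the --expect-status-code flag match the usage shown later in this article; the tcp subcommand and the --timeout flag are my recollection of wait4x's options, so double-check them against its documentation (hosts are placeholders):

```shell
# Wait at most 60s for a TCP port to open
wait4x tcp db.example.com:5432 --timeout 60s

# Wait until an HTTP endpoint answers with a 200 status code
wait4x http https://api.example.com/health --expect-status-code 200

# Wait until PostgreSQL accepts connections
wait4x postgresql 'postgres://user:pass@db.example.com:5432/app?sslmode=disable'
```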
Whatever tool you choose, you can use it in an init container:
A Pod can have multiple containers running apps within it, but it can also have one or more init containers, which are run before the app containers are started.
Init containers are exactly like regular containers, except:
- Init containers always run to completion.
- Each init container must complete successfully before the next one starts.
Imagine the following Pod that depends on a PostgreSQL Deployment:
apiVersion: v1
kind: Pod
metadata:
  labels:
    type: app
    app: recommandations
spec:
  containers:
    - name: recommandations
      image: recommandations:latest
      envFrom:
        - configMapRef:
            name: postgres-config
The application is written in Python and starts fairly quickly. It tries to connect to the PostgreSQL database. Unfortunately, the database has not finished initializing, so the connection fails and Kubernetes restarts the Pod.
We can fix this with an initContainer that waits:
apiVersion: v1
kind: Pod
metadata:
  labels:
    type: app
    app: recommandations
spec:
  initContainers:
    - name: wait-for-postgres
      image: atkrad/wait4x:3.1
      command:
        - wait4x
        - postgresql
        - postgres://$(DATABASE_URL)?sslmode=disable
      envFrom:
        - configMapRef:
            name: postgres-config
  containers:
    - name: recommandations
      image: recommandations:latest
      envFrom:
        - configMapRef:
            name: postgres-config
In the above configuration, the initContainer does not finish until the database accepts connections. When it does, the init container completes, and the recommandations container can start. Kubernetes does not need to terminate the Pod as in the previous configuration! That means fewer logs and potentially fewer alerts.
When waiting becomes mandatory
The above is a nice improvement, but you can live without it. In other cases, waiting becomes mandatory. I recently experienced this while preparing the workshop mentioned above. The scenario is the following:
- The pipeline applies a manifest on the Kubernetes side
- In the next step, it runs the tests
- If the tests start before the application is ready, they fail
We must wait for the backend to be ready before testing. Let's use wait4x to wait for the Pod to accept requests before launching the tests:
- name: Wait until the application has started
  uses: addnab/docker-run-action@v3                                      #1
  with:
    image: atkrad/wait4x:latest
    run: wait4x http ${{ env.BASE_URL }}/health --expect-status-code 200 #2
1. The GitHub Action allows running a container. I could have downloaded the Go binary instead.
2. Wait until the /health endpoint returns a 200 response code.
Conclusion
Kubernetes startup probes are a great way to avoid unnecessary restarts when you start services that depend on each other. The alternative is an external waiting tool configured in an initContainer. wait4x is such a tool, and it can be used in other contexts too, such as CI/CD pipelines. It's now part of my toolbelt.
Go further:
Originally published at A Java Geek on April 20, 2025