Intended behaviour of livenessProbe
Hi, we are using Zitadel in production on Kubernetes. We deployed it with Flux using the official helm chart.
Recently, we observed that the livenessProbe always returns "ok" even if Zitadel is not able to connect to its database (and is therefore broken).
The readiness probe fails, as we would have expected. Is this behaviour intended?
Environment: self-hosted
Zitadel version: v3.3.6 (3 replicas)
Stack:
* Official helm chart, v8.13.4
* Kubernetes v1.33.4
* CNPG 1.27.0 (3 replicas)
Steps to reproduce:
* Make sure that Zitadel can't connect to its database (either by changing the config or deleting the database).
* Deploy a netshoot pod (https://github.com/nicolaka/netshoot#netshoot-with-kubernetes) and execute the following command using the IPs of the Zitadel pods:
curl -ks <pod-ip>:8080/debug/healthz
* Observe that this always returns "ok"
I can somewhat understand this behaviour, as restarting the pod because of a failed livenessProbe would not solve the problem in this case. But I also wouldn't expect Zitadel to return "ok" on its healthz endpoint if something is seriously wrong.
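The reproduction steps above can be sketched as a small shell helper that queries both endpoints from the netshoot pod and compares them. This is only a sketch under some assumptions: the function names `probe_status`, `classify` and `check_pod` are hypothetical, the readiness endpoint is taken to be `/debug/ready` on the same port 8080 (as mentioned in the replies), and a status code of 200 is treated as the only "healthy" answer.

```shell
#!/bin/sh
# Hypothetical helper: print the HTTP status code returned by one endpoint.
# $1 = pod IP, $2 = path. curl prints 000 if no HTTP response was received.
probe_status() {
  curl -ks -o /dev/null -m 5 -w '%{http_code}' "http://$1:8080$2"
}

# Interpret a pair of status codes (assumption: only 200 counts as healthy).
# $1 = status of the liveness endpoint, $2 = status of the readiness endpoint.
classify() {
  live="$1"
  ready="$2"
  if [ "$live" != "200" ]; then
    echo "process not responding"
  elif [ "$ready" != "200" ]; then
    echo "alive but not ready"
  else
    echo "alive and ready"
  fi
}

# Query both endpoints of one pod and print the combined result.
# Assumption: readiness is served on /debug/ready.
check_pod() {
  classify "$(probe_status "$1" /debug/healthz)" "$(probe_status "$1" /debug/ready)"
}
```

Running `check_pod <pod-ip>` against a pod that has lost its database connection would then be expected to print "alive but not ready", which matches the observation that only the readiness probe fails.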
5 Replies
Hi @lnxu303, I'm an AI assistant for ZITADEL. While a team member reviews this question, I'll see if I can find relevant sources for your question.
Answer
I wasn't able to find a direct answer, but here are some relevant sources:
1. Zitadel's Deployment Architecture: Mentions that new versions signal readiness on the /debug/ready endpoint when ready to accept traffic, but doesn't specifically address the /debug/healthz endpoint behavior during database connectivity issues.
2. How to make health check inside zitadel docker?: Shows examples of health checking with the /healthz endpoint and mentions the /debug/ready endpoint, but doesn't explain the difference in behavior between liveness and readiness probes.
Feel free to tag @Ask Inkeep with additional questions.
Got it. Tagging @Rajat for additional help.
Hey @lnxu303, thanks for your question and welcome to the server. Yes, this behaviour is intended.
In ZITADEL, the /debug/healthz endpoint is designed to report whether the process is alive, not whether it is fully functional or able to connect to the database. This means the livenessProbe will continue to return "ok" as long as the ZITADEL process itself is running, regardless of database connectivity.
You can read about it here: https://zitadel.com/docs/apis/observability/health
Hi, thank you for the clarification. This confirms the assumption we had after some consideration. Thanks for the quick reply, have a good day 🙂
awesome @lnxu303 you can mark my answer with ✅ and it will auto close this thread 🙂
Glad I could be helpful. Feel free to create a new thread with any new questions you may have.