RAleksejuk • ZITADEL • 2y ago • 22 replies

Zero-downtime restarts of Zitadel pods are not possible as of now

Zitadel Helm chart version: 8.1.0
Zitadel image version: v2.55.0
Kubernetes: v1.29.5
DB: PostgreSQL 15.6
Ingress: nginx v1.10.1

PROBLEM: When a Zitadel pod is restarted or stopped, Kubernetes sends SIGTERM to the main container, and Zitadel "gracefully" exits almost immediately. At the same moment, the endpoints controller is removing the pod from the Service endpoints, so for a short window the ingress controller, which has not yet seen the endpoint change, can still send traffic to a pod that no longer exists (the container exited without any delay). As a side effect, requests to the Zitadel endpoints return a few "Bad Gateway" (502) errors during that window, so complete zero-downtime is not possible.
HOW TO REPRODUCE: Run k6 performance tests (more than 15 VUs) and, at the same time, restart the whole Zitadel deployment (the pods are replaced one at a time). You will see a few "Bad Gateway" errors caused by the race described above.
POSSIBLE SOLUTION: After receiving SIGTERM, Zitadel could delay its exit for a short while, so that the endpoints are updated before it stops serving traffic. The delay could, for example, be configurable.

What do you think about this issue?