Monitoring Zitadel HTTP latencies via internal Prometheus metrics
In order to monitor and alert on the HTTP latencies of the Zitadel app/pods, we have configured a 95th-percentile alert on the http_server_duration_milliseconds_bucket internal Prometheus metric (which Zitadel itself exposes).
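The alert is based on a query along the following lines (the job selector, the per-pod grouping, and the 5-minute window here are illustrative placeholders, not our exact rule):

```promql
# p95 of Zitadel's own HTTP server latency over a 5-minute window, per pod.
# The job label, pod grouping, and window are placeholders for illustration.
histogram_quantile(
  0.95,
  sum by (le, pod) (
    rate(http_server_duration_milliseconds_bucket{job="zitadel"}[5m])
  )
)
```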
But the problem is that, from time to time, the alert fires (with the reported latency close to 10 seconds) without any apparent reason. If we look at the same request latency from the K8s Ingress perspective (the ingress that forwards the HTTP traffic to the Zitadel pods), we do not see such high latencies.
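For context, the comparison between the two vantage points looks roughly like this; the ingress-side metric name assumes the NGINX Ingress Controller and an ingress="zitadel" label, so adjust it to whatever controller and labels actually apply:

```promql
# Gap between Zitadel-reported p95 and ingress-reported p95, in milliseconds.
# The ingress metric assumes the NGINX Ingress Controller (which reports in
# seconds, hence the * 1000); both the metric name and labels are assumptions.
  histogram_quantile(0.95, sum by (le) (
    rate(http_server_duration_milliseconds_bucket{job="zitadel"}[5m])
  ))
-
  histogram_quantile(0.95, sum by (le) (
    rate(nginx_ingress_controller_request_duration_seconds_bucket{ingress="zitadel"}[5m])
  )) * 1000
```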
What could cause high latencies in just the http_server_duration_milliseconds_bucket metric while the K8s Ingress latencies stay much lower? Are there any concrete, known scenarios that could explain this?
What would be the best way to investigate and diagnose the cause? As of now we have the metrics and the URL endpoint, but no other details (e.g. source IP, the username used, etc.).
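So far the only extra detail we can pull out of the metric itself is a breakdown by its labels, roughly along these lines (the label names are assumptions based on the OpenTelemetry HTTP conventions and may differ between Zitadel versions, so they should be verified against the raw metric output first):

```promql
# p95 broken down by method and status code, to check whether the spikes are
# tied to a specific endpoint class. Label names follow the OpenTelemetry HTTP
# conventions and are assumptions; verify them against the exposed metrics.
histogram_quantile(
  0.95,
  sum by (le, http_method, http_status_code) (
    rate(http_server_duration_milliseconds_bucket{job="zitadel"}[5m])
  )
)
```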