Cloud SQL High CPU Usage

We are using Cloud SQL with PostgreSQL 15, and every day at 12:00 pm CPU usage hits 100%, as shown in the first image. The queries consuming the most database resources are those in the second image. One of them, "select owner, created_at, "sequence", position from eventstore.push($1::eventstore.command[])", was called 15 thousand times and overloads our database. This happens in our production environment every day, so we need help ASAP. Our instance has 4 vCPUs and 16 GB of RAM, and we have a little over 20 organizations and 31K users.
[Image 1: no description]
[Image 2: no description]
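For readers who want to reproduce the "most expensive queries" view described in the post without a console screenshot, a minimal sketch follows. It assumes the pg_stat_statements extension is enabled on the instance and that `conn` is a DB-API 2.0 connection whose driver uses the `%s` parameter style (psycopg does); the helper name `top_queries` is hypothetical and not part of ZITADEL or Cloud SQL.

```python
# Sketch: list the statements with the highest cumulative execution time.
# Assumption: pg_stat_statements is enabled (column names as in PostgreSQL 13+).
# Assumption: `conn` is a DB-API 2.0 connection using the "%s" paramstyle.

TOP_QUERIES_SQL = """
SELECT query,
       calls,
       total_exec_time,
       mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT %s
"""


def top_queries(conn, limit=10):
    """Return up to `limit` statements ordered by total execution time."""
    cur = conn.cursor()
    try:
        cur.execute(TOP_QUERIES_SQL, (limit,))
        rows = cur.fetchall()
    finally:
        cur.close()
    # Times reported by pg_stat_statements are in milliseconds.
    return [
        {"query": q, "calls": c, "total_ms": t, "mean_ms": m}
        for (q, c, t, m) in rows
    ]
```

Comparing `calls` with `mean_ms` helps distinguish a statement that is slow per call from one that is cheap per call but executed very often, which is the pattern the post describes for the eventstore.push query.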
5 Replies
FFO (4mo ago)
Hi there, that is a weird case. What version of zitadel are you currently on?
Arnau (2mo ago)
We are experiencing similar behavior with 1 org and 35k users, on Zitadel 3.3.0 and RDS Serverless Aurora. AWS RDS Performance Insights reports the same queries as above. @Gabriel Brandão any insights/tips you could provide from your side?
FFO (4w ago)
Can you share your zitadel config?
Arnau (4w ago)
The HelmRelease looks like this; the secret only contains the RDS endpoint and credentials.
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: zitadel
spec:
  interval: 1m
  chart:
    spec:
      chart: zitadel
      version: 8.13.4
      sourceRef:
        kind: HelmRepository
        name: zitadel-charts
  valuesFrom:
    - kind: Secret
      name: zitadel-values
      valuesKey: values.yml
  values:
    image:
      tag: v3.3.0
    replicaCount: 2
    initJob:
      enabled: false
    metrics:
      enabled: true
      serviceMonitor:
        enabled: false
    zitadel:
      configmapConfig:
        Log:
          Level: info
          Formatter: json
        LogStore:
          Access:
            Stdout:
              Enabled: true
        ExternalSecure: true
        ExternalDomain: zitadel.company.com
        ExternalPort: 443
        TLS:
          Enabled: false
        Database:
          Postgres:
            Port: 5432
            Database: postgres
            MaxOpenConns: 20
            MaxIdleConns: 10
            MaxConnLifetime: 30m
            MaxConnIdleTime: 5m
            User:
              SSL:
                Mode: disable
            Admin:
              SSL:
                Mode: disable
    ingress:
      enabled: false
Since I posted the message we switched to a db.r6g.large RDS instance, and it improved. With Serverless we had 2 ACUs. We still reach ~70% CPU usage in RDS during periods when we run lots of user searches through REST POST /v2/users.
FFO (4w ago)
Hm, it could be that we have something that is not well cached/optimized here. The r6g.large is a 2 vCPU machine, right? Mainly asking because having too many SQL connections open can plague machines with few CPUs. Can you check some metrics on RDS for me? Query latency would be especially interesting.
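To put the connection concern above into numbers, here is a minimal arithmetic sketch using the values from the posted config (replicaCount: 2, MaxOpenConns: 20) and the 2 vCPUs mentioned for the r6g.large. It assumes MaxOpenConns is a per-pod limit, so the pools of all replicas add up; the function names are hypothetical, and the sketch ignores any additional pools or clients that may also connect to the database.

```python
# Sketch: upper bound on open SQL connections versus available vCPUs.
# Assumption: each ZITADEL pod keeps its own pool capped at MaxOpenConns.


def max_db_connections(replicas, max_open_conns):
    """Upper bound on connections opened by all replicas together."""
    return replicas * max_open_conns


def connections_per_vcpu(replicas, max_open_conns, vcpus):
    """How many potentially busy connections compete for each vCPU."""
    return max_db_connections(replicas, max_open_conns) / vcpus


# Values taken from the thread: 2 replicas, MaxOpenConns 20, 2 vCPUs.
total = max_db_connections(2, 20)
per_vcpu = connections_per_vcpu(2, 20, 2)
print(total, per_vcpu)  # prints: 40 20.0
```

Under these assumptions, up to 40 connections can be active against 2 vCPUs, which is the kind of ratio the reply is asking about. Whether it is actually a problem depends on how many of those connections run queries at the same time, which is why the query latency metric is worth checking.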
