RAleksejuk
RAleksejuk•8mo ago

Benchmark of zitadel v2.66.0 - more details on your testing setup are needed

We have looked into your Zitadel v2.66.0 benchmarking results at https://zitadel.com/docs/apis/benchmarks/v2.66.0/machine_jwt_profile_grant In our setup we are getting significantly worse performance, and it looks like the PostgreSQL DB cluster may be a bottleneck. We would like to make our PostgreSQL configuration similar to yours, but are missing some details: 1. Your "Database specification" states "vCPU: 8 memory: 32Gib". Is that per DB cluster node, or the total resources summed across all nodes? 2. How many write/read replicas are in your PostgreSQL cluster? Are you distributing Zitadel's SQL queries between write and read replicas somehow? (Since Zitadel doesn't support that natively, perhaps some middleware query-routing/load-balancing solution is used?) 3. Are you planning to implement application-level query routing in Zitadel to distribute the query load across all cluster replicas, instead of only utilizing the master/write node?
14 Replies
fabienne
fabienne•8mo ago
@adlerhurst can you give those insights?
RAleksejuk
RAleksejukOP•8mo ago
Any updates on the matter? @adlerhurst @fabienne bump 🙂
adlerhurst
adlerhurst•8mo ago
Hi there 1. It's a single Postgres node, not a cluster 2. A single master node handling both writes and reads 3. Currently we are more focused on caching (the current implementation uses a single-cluster Redis), but if the need is big enough we will pick this topic up again Can you tell us a bit more about your setup? And what do you mean by significantly worse performance, do you have any numbers?
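For anyone trying to reproduce this setup: Zitadel caps its own connection pool in the runtime configuration. A minimal sketch, assuming the v2 YAML layout (the host and values below are illustrative placeholders, not the benchmark's actual settings):

```yaml
Database:
  postgres:
    Host: pg.example.internal   # placeholder hostname
    Port: 5432
    Database: zitadel
    MaxOpenConns: 20            # illustrative cap; tune to your CPU / storage-IO headroom
    MaxIdleConns: 10
    MaxConnLifetime: 30m
    MaxConnIdleTime: 5m
```

Keeping `MaxOpenConns` low is consistent with the single-node approach described above: fewer, busier connections on one well-provisioned primary rather than fanning out across replicas.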
RAleksejuk
RAleksejukOP•8mo ago
@adlerhurst Since Zitadel doesn't have the application-level capability to route write SQL queries to the master replica and read queries to the remaining read replicas, what would be the official solution/recommendation to utilize both master and slave (read-only) PostgreSQL nodes and distribute the load? Have you tested Zitadel with the pgpool-II middleware (which can do query routing), for example? Maybe some other third-party solution? Zitadel suggests moving from CockroachDB (a multi-master DB) to PostgreSQL (single-master), so there should probably be some thoughts/recommendations on how to achieve query distribution in a highly available DB setup, similar to CockroachDB 🤔
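For reference, pgpool-II's statement-level load balancing is driven by a handful of `pgpool.conf` settings. A minimal sketch of read/write splitting (hostnames and weights are placeholders; this has not been validated with Zitadel specifically):

```ini
# pgpool.conf (excerpt) -- writes go to the primary, SELECTs are spread by weight
load_balance_mode = on
backend_hostname0 = 'pg-primary.example.internal'   # placeholder: primary node
backend_port0 = 5432
backend_weight0 = 0            # weight 0: keep the primary free for writes
backend_hostname1 = 'pg-replica1.example.internal'  # placeholder: read replica
backend_port1 = 5432
backend_weight1 = 1            # reads land here
```

One caveat worth checking before trying this: with asynchronous streaming replication, balanced reads can be stale, which may matter for an event-sourced system like Zitadel that expects read-your-own-writes behavior.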
Unknown User
Unknown User•8mo ago
Message Not Public
RAleksejuk
RAleksejukOP•8mo ago
@Avolicious We are just performance testing for now, trying to figure out what a correct "enterprise-grade" highly available Zitadel setup (including an HA PostgreSQL cluster) should look like. The initial plan is 30K user logins per hour, with plans to grow further.
Unknown User
Unknown User•8mo ago
Message Not Public
RAleksejuk
RAleksejukOP•8mo ago
@Avolicious Thank you for pointing that out, we will look into ProxySQL. It would still be good to hear the officially recommended way from the Zitadel developers 😊 By the way, have you tried ProxySQL together with Zitadel and an existing PostgreSQL HA cluster in practice?
Unknown User
Unknown User•8mo ago
Message Not Public
RAleksejuk
RAleksejukOP•8mo ago
Still would like to bring the focus back and get some comments from the Zitadel team on the official/recommended/tested enterprise-grade HA PostgreSQL architecture to use in conjunction with Zitadel. Can we state that in all your setups (e.g. for enterprise customers in Zitadel Cloud) you scale PostgreSQL vertically rather than horizontally, therefore using just the master PostgreSQL replica in all cases, and no read/write query-routing middleware (like pgpool or ProxySQL) at all? Is this correct? 🤔 @adlerhurst
adlerhurst
adlerhurst•8mo ago
@RAleksejuk I will give feedback as soon as I have some time
FFO
FFO•8mo ago
At the moment we do not utilize pgbouncer or similar solutions, since we focused on connection efficiency in our performance benchmarks. For example, in our machine JWT profile grant benchmark we run about 100 HTTPS requests per second on top of 2 DB connections. So at 1k rps you are "only" at about 20 connections, and at 10k rps it's 200 connections. The key here is a low p99 in the DB, which is mainly defined by CPU and storage IO. Our general plan is to use our caching capabilities in more places before looking into read replicas and other means. Besides this, we have some plans to make our storage layer faster by an order of magnitude, but we will share more on this later (we are keeping PG, so no worries)
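The connection math above can be sketched in a few lines, assuming the benchmark's ratio (~100 requests/s over 2 DB connections) scales linearly:

```python
# Back-of-envelope connection estimate, based on the figures in the thread:
# ~100 requests/s were served over 2 DB connections, i.e. ~50 rps per connection.
RPS_PER_CONNECTION = 100 / 2  # assumption: the ratio scales linearly

def connections_needed(rps: float) -> float:
    """Estimate the DB connections required for a target request rate."""
    return rps / RPS_PER_CONNECTION

if __name__ == "__main__":
    for rps in (100, 1_000, 10_000):
        print(f"{rps:>6} rps -> ~{connections_needed(rps):.0f} connections")
    # The 30K logins/hour target from earlier in the thread is ~8.3 rps on average
    print(f"30K/h  -> ~{connections_needed(30_000 / 3600):.2f} connections")
```

This also illustrates why 30K logins/hour is modest by these numbers: averaged out it needs well under one connection, so peak concurrency and DB p99 latency matter far more than raw hourly volume.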
RAleksejuk
RAleksejukOP•8mo ago
Got the idea, thanks. But besides benchmarking, if we speak generally about the enterprise-grade Zitadel setups you run for other customers, is it mainly vertical scaling you use (as opposed to the horizontal scaling you had with CockroachDB), with only the master/write replica utilized? Is my understanding correct @FFO ? bump
adlerhurst
adlerhurst•8mo ago
Yes, we mainly use vertical scaling of a single master.
