craigzour
craigzour • 4mo ago

Notifier errors since upgraded to 3.0.4

Hello! I am looking for help to understand and debug an issue I have with my Zitadel service. I recently upgraded my self-hosted Zitadel instance from 2.63.4 to 3.0.4, and since then I am getting recurring errors related to a Notifier resource. Every 30 minutes I see the following two error logs:
level=ERROR msg="Notifier: Error from notification wait" err="unexpected EOF"
level=ERROR msg="Notifier: Error running listener (will attempt reconnect after backoff)" attempt=180 err="unexpected EOF" sleep_duration=8m30.106954557s
level=ERROR msg="Notifier: Error from notification wait" err="unexpected EOF"
level=ERROR msg="Notifier: Error running listener (will attempt reconnect after backoff)" attempt=180 err="unexpected EOF" sleep_duration=8m30.106954557s
(Small note: it feels like the retry backoff delay is not working properly, since the error happens every 30 minutes.) I tried multiple small configuration adjustments but none of them resolved it. Unfortunately, I also could not find anything on the internet that would point me in the right direction when it comes to fixing it. Thank you in advance 🙂
11 Replies
craigzour
craigzourOP • 4mo ago
(Bump)
Rajat Singh
Rajat Singh • 4mo ago
Hey @craigzour, thanks for the bump, looking into it.
Rajat
Rajat • 4mo ago
Hey @craigzour, the unexpected EOF error indicates that the Notifier component is experiencing an abrupt termination of its connection to the notification service or message broker. This could be due to multiple reasons, such as network interruptions or misconfiguration. I will take it to my team and let them have a look. Also, what exactly did you try when you say "I tried multiple small configuration adjustments but none of them resolved it"?
craigzour
craigzourOP • 4mo ago
Hello @Rajat. Thank you for spending some time looking into this. I tried to make some small Zitadel configuration adjustments, such as:
- Tweaking the database connection options (MaxOpenConns and MaxConnLifetime)
- Setting the Notifications option LegacyEnabled to either true or false (because I thought it was related to the word "Notifier" in the error message)
Also, just to make sure we are on the same page: we have not touched anything else outside of that Zitadel upgrade.
Rajat
Rajat • 4mo ago
Thanks for the update @craigzour. I will send this to my team internally and will get back to you.
adlerhurst
adlerhurst • 4mo ago
Hi there, 30 minutes is the default max connection lifetime to the database. Can you increase or decrease that value so we can figure out whether that is the reason?
Ah sorry, I skipped that message: changing the conn max lifetime didn't change the interval of the log?
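For readers following the thread, the value being discussed is the pooled connection lifetime under Database.postgres in the Zitadel runtime configuration. A minimal excerpt of such an override (the values simply mirror the ones tried in this thread, not a recommendation):

Database:
  postgres:
    MaxConnLifetime: "1h"  # was "30m" in this setup; changed to see whether the error interval follows it
    MaxConnIdleTime: "5m"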
craigzour
craigzourOP • 4mo ago
Hello! This is correct. Even though we changed MaxConnLifetime: "30m" to MaxConnLifetime: "1h", the Notifier error cadence stayed the same (every 30 mins or so). Here is our current config file in case it helps with debugging (we reverted that MaxConnLifetime change since we still had the issue):
# Default config is merged with the overrides in this file.
# https://zitadel.com/docs/self-hosting/manage/configure#runtime-configuration-file

Log:
  Level: "info"

Metrics:
  Type: none
Tracing:
  Type: none
Profiler:
  Type: none
Telemetry:
  Enabled: false

Port: 8080
ExternalPort: 443
ExternalSecure: false
TLS:
  Enabled: false

Database:
  postgres:
    Port: 5432
    User:
      SSL:
        Mode: "require"
    Admin:
      SSL:
        Mode: "require"
    MaxOpenConns: 20
    MaxIdleConns: 10
    MaxConnLifetime: "30m"
    MaxConnIdleTime: "5m"

DefaultInstance:
  LoginPolicy:
    AllowRegister: false
    ForceMFA: true
    HidePasswordReset: true
  OIDCSettings:
    AccessTokenLifetime: "0.5h"
    IdTokenLifetime: "0.5h"
    RefreshTokenIdleExpiration: "720h"
    RefreshTokenExpiration: "2160h"

OIDC:
  DefaultAccessTokenLifetime: "0.5h"
  DefaultIdTokenLifetime: "0.5h"
  DefaultRefreshTokenIdleExpiration: "720h"
  DefaultRefreshTokenExpiration: "2160h"

Notifications:
  # Notifications can be processed by either a sequential mode (legacy) or a new parallel mode.
  # The parallel mode is currently only recommended for Postgres databases.
  # If legacy mode is enabled, the worker config below is ignored.
  LegacyEnabled: false
  # The amount of workers processing the notification request events.
  # If set to 0, no notification request events will be handled. This can be useful when running in
  # multi binary / pod setup and allowing only certain executables to process the events.
  Workers: 1
  # The maximum duration a job can do it's work before it is considered as failed.
  TransactionDuration: 10s
  # Automatically cancel the notification after the amount of failed attempts
  MaxAttempts: 3
  # Automatically cancel the notification if it cannot be handled within a specific time
  MaxTtl: 5m
adlerhurst
adlerhurst • 3mo ago
Thanks for the info, we will try to reproduce it.
craigzour
craigzourOP • 3mo ago
Thank you for the update 🙂
adlerhurst
adlerhurst • 3mo ago
Hi there, I just created this issue for tracking: https://github.com/zitadel/zitadel/issues/10092
GitHub: Error running listener · Issue #10092 · zitadel/zitadel
craigzour
craigzourOP • 2w ago
Hello! Perfect! Thank you 🙂

Hello! I wanted to share some new information about this, as I spent some time investigating it recently. I tried to find what code was throwing that recurring error and found out it was not in Zitadel directly but in a package named Riverqueue, which was integrated into Zitadel about 6 months ago. It was not part of the Zitadel version we were using before the migration to 3.0.4 (2.63.4). After investigating and testing various things, I discovered that our AWS RDS Proxy has an idle timeout set to 30 minutes, which is exactly the frequency at which we get those errors. I increased that value to 8 hours (the maximum allowed) and noticed that we were then only getting one error every 8 hours. In the AWS RDS Proxy logs I also found that we were getting the following log just before the Notifier error:
2025-08-22T12:47:27.276Z [WARN] [proxyEndpoint=default] [clientConnection=369107045] The client session was pinned to the database connection [dbConnection=3076737155] for the remainder of the session. The proxy can't reuse this connection until the session ends. Reason: SQL changed session settings that the proxy doesn't track. Consider moving session configuration to the proxy's initialization query. Digest: "set search_path to $1; set application_name to $2".
So, all that to say that I think the issue sits between Riverqueue and our AWS RDS Proxy. I believe Riverqueue needs an open connection to the database that never ends, but our proxy terminates it because it does not see any activity. I don't know much about Riverqueue, but I am guessing that removing the AWS RDS Proxy would probably solve my problem. Since you folks probably know more about it, please let me know if there is a configuration that would allow me to keep that proxy 🙂 Maybe there is a way for Riverqueue to keep that connection alive or to work around that AWS RDS Proxy pinning behaviour. FYI: I did remove the AWS RDS Proxy and have not seen the issue come back since.
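To illustrate the suspected mechanism, here is a minimal, hypothetical sketch of the LISTEN/NOTIFY listener pattern that a River-style notifier relies on (this is not the actual Zitadel or River code; it assumes the pgx driver, and the DSN and channel name are invented). The listener parks a dedicated connection on WaitForNotification with essentially no traffic in between, so a proxy idle timeout of 30 minutes cuts it roughly every 30 minutes and surfaces as "unexpected EOF":

package main

import (
	"context"
	"log"
	"time"

	"github.com/jackc/pgx/v5"
)

func main() {
	ctx := context.Background()

	for {
		// Hypothetical DSN pointing at the RDS Proxy endpoint.
		conn, err := pgx.Connect(ctx, "postgres://zitadel:secret@my-rds-proxy.example.com:5432/zitadel")
		if err != nil {
			log.Printf("connect failed: %v", err)
			time.Sleep(5 * time.Second)
			continue
		}

		// Subscribe to a notification channel (the channel name is made up).
		// After this the connection mostly sits idle between NOTIFY events.
		if _, err := conn.Exec(ctx, "LISTEN job_events"); err != nil {
			log.Printf("LISTEN failed: %v", err)
			_ = conn.Close(ctx)
			time.Sleep(5 * time.Second)
			continue
		}

		for {
			// Blocks until a notification arrives or the connection dies.
			// If the proxy drops the idle connection, this returns an error
			// such as "unexpected EOF" and the outer loop has to reconnect.
			n, err := conn.WaitForNotification(ctx)
			if err != nil {
				log.Printf("notification wait failed: %v", err)
				break
			}
			log.Printf("notification on %q: %s", n.Channel, n.Payload)
		}
		_ = conn.Close(ctx)
	}
}

This also matches the pinning warning in the proxy log above: because the session sets search_path and application_name, RDS Proxy pins it to a single backend connection and cannot multiplex it, so the listener effectively holds one mostly idle, pinned connection that the idle client timeout eventually closes.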
