Notifier errors since upgrading to 3.0.4
Hello!
I am looking for help to understand and debug an issue I have with my Zitadel service.
I recently upgraded my self-hosted Zitadel instance from 2.63.4 to 3.0.4 and since then I am getting recurring errors related to some Notifier resource.
Every 30 minutes I see the following two error logs:
(small note: it feels like the retry backoff delay is not working properly since the error happens every 30 minutes)
I tried multiple small configuration adjustments but none of them resolved it. Unfortunately, I also could not find anything online that would point me in the right direction for fixing it.
Thank you in advance!
(Bump)
hey @craigzour thanks for the bump, looking into it
hey @craigzour The "unexpected EOF" error indicates that the Notifier component is experiencing an abrupt termination of its connection to the notification service or message broker. This could be due to multiple reasons, such as network interruptions or misconfiguration.
I will take it to my team and let them have a look
also, what exactly did you try when you say "I tried multiple small configuration adjustments but none of them resolved it"?
Hello @Rajat.
Thank you for spending some time looking into this. I tried to make some small Zitadel configuration adjustments such as:
- Tweaking the database connection options (MaxOpenConns and MaxConnLifetime)
- Setting the Notifications options with LegacyEnabled as either true or false, because I thought it was related to the "Notifier" word included in the error message (rough sketch below)
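For context, here is roughly what those adjustments looked like in our config file (example values only; the exact nesting follows the Zitadel config reference as far as I understand it, so treat this as a sketch rather than a copy of our real file):

```yaml
Database:
  postgres:
    # connection pool options we experimented with (example values)
    MaxOpenConns: 20
    MaxConnLifetime: "30m"

Notifications:
  # toggled both ways while testing, since the error message mentions "Notifier"
  LegacyEnabled: true
```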
Also just to make sure we are on the same page, we have not touched anything else outside of that Zitadel upgrade.
thanks for the update @craigzour I will send this to my team internally and will get back to you.
Hi there
30 minutes is the default max connection lifetime to the database. Can you increase or decrease that value so we can figure out if that's the reason?
Ah sorry, I skipped that message. Changing the conn max lifetime didn't change the interval of the log?
Hello!
This is correct. Even though we changed MaxConnLifetime: "30m" to MaxConnLifetime: "1h", the Notifier error cadence stayed the same (every 30 minutes or so).
Here is our current config file in case it helps with debugging (we reverted that MaxConnLifetime change since we still had the issue).
Thanks for the info, we will try to reproduce it.
Thank you for the update!
hi there
I just created this issue for tracking: https://github.com/zitadel/zitadel/issues/10092
GitHub: Error running listener · Issue #10092 · zitadel/zitadel
From this discord thread: https://discord.com/channels/927474939156643850/1374467931672543262 They see the following error logs every 30 minutes: level=ERROR msg="Notifier: Error from notifica...
Hello! Perfect! Thank you!
Hello!
Wanted to share some new information about this, as I spent some time investigating it recently.
I tried to find what code was throwing that recurring error and found out it was not in Zitadel directly but in a package named Riverqueue, which was integrated into Zitadel 6 months ago. It was not part of the Zitadel version we were using (2.63.4) before the migration to 3.0.4.
After investigating and testing various things, I discovered that our AWS RDS Proxy has an idle timeout set to 30 minutes, which is exactly the frequency at which we get those errors. I increased that value to 8 hours (the maximum allowed) and noticed that we then only got one error every 8 hours.
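For reference, the proxy setting in question (I believe the underlying API/CloudFormation property is called IdleClientTimeout and is expressed in seconds; we changed it through the console):

```yaml
# AWS RDS Proxy client idle timeout, in seconds
IdleClientTimeout: 28800   # 8 hours, the maximum allowed; ours was 1800 (30 minutes)
```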
In the AWS RDS Proxy logs I also found that we were getting that log entry just before the Notifier error.
So all that to say, I think the issue is between Riverqueue and our AWS RDS Proxy. I believe Riverqueue needs a long-lived open connection to the database, but our Proxy terminates it because it does not see any activity on it.
I don't know much about Riverqueue but I am guessing that removing the AWS RDS Proxy would probably solve my problem.
Since you folks probably know more about it, please let me know if there is a configuration change that would allow me to keep that Proxy.
Maybe there is a way for Riverqueue to keep that connection alive or work around that AWS RDS Proxy pinning behavior.
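To illustrate what I mean by an open connection with no activity: as far as I understand, Riverqueue keeps a dedicated Postgres LISTEN/NOTIFY connection that mostly just sits and waits. Here is a rough pgx sketch of that pattern (not River's actual code; the DSN and channel name are made up):

```go
package main

import (
	"context"
	"log"

	"github.com/jackc/pgx/v5"
)

func main() {
	ctx := context.Background()

	// Hypothetical DSN; in our setup this would point at the RDS Proxy endpoint.
	conn, err := pgx.Connect(ctx, "postgres://zitadel:secret@my-rds-proxy:5432/zitadel")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close(ctx)

	// Subscribe once; "queue_events" is an illustrative channel name.
	if _, err := conn.Exec(ctx, "LISTEN queue_events"); err != nil {
		log.Fatal(err)
	}

	for {
		// Blocks here without sending any traffic until Postgres pushes a NOTIFY.
		// To a proxy with an idle-client timeout this connection looks idle, so it
		// gets cut, and the blocked read comes back as an unexpected EOF.
		n, err := conn.WaitForNotification(ctx)
		if err != nil {
			log.Printf("listener error: %v", err)
			return
		}
		log.Printf("notification on %s: %s", n.Channel, n.Payload)
	}
}
```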
FYI: I did remove the AWS RDS Proxy and have not seen that issue come back.