@badnetmask nah, we'd accidentally had pgbouncer configured for 2000-2500 connections, but the process had a LimitNOFILESoft of 1024, so it was running out of file descriptors for sockets and accept() calls were failing when Mastodon tried to connect to the database.
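For anyone wanting to sanity-check the same mismatch, here is a rough sketch of the arithmetic, not our actual tooling. The function name `fd_headroom`, the server-side connection count of 100, and the 32 reserved descriptors are made-up illustration values; only the 2000 connections and the 1024 soft limit come from the post above.

```python
import resource


def fd_headroom(max_client_conn, max_server_conn, soft_limit, reserved=32):
    """Return spare file descriptors; a negative result means the limit is too low.

    Each client connection and each server connection holds one socket,
    and every socket consumes one file descriptor. `reserved` is a
    hypothetical allowance for log files, listeners, and similar.
    """
    needed = max_client_conn + max_server_conn + reserved
    return soft_limit - needed


def current_soft_limit():
    """Soft RLIMIT_NOFILE of *this* Python process (Unix only).

    This is not the limit of the pooler process itself; on Linux that
    would be read from /proc/<pid>/limits instead.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


if __name__ == "__main__":
    # 2000 client connections against a soft limit of 1024, as in the post.
    print(fd_headroom(2000, 100, soft_limit=1024))
```

A negative number here is the situation we were in: the pooler was told to accept more connections than the process was allowed to hold open.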
The issue only surfaced 2 days ago, but we hadn't changed the server configuration in 20 days, just upgraded system packages on non-database nodes in our infrastructure, so we thought we'd broken a dynamically linked library that Ruby uses, like libvips or libicu.
We thought we'd fixed it yesterday, but had the wrong root cause. The error message was opaque and the promtail log lines dropped the stack trace lines that followed, so we only had partial information & thought the error must've been from the upgrade, when in reality it was just a freak coincidence.
Finally figured it out earlier & deployed a fix. But it took a few tries to get it right.
We also now have alerts in place if pgbouncer errors again (we're in the process of rebuilding the primary database server & switching to pgcat, which has much greater observability).
The other option was that it was a networking error, but we could access the database node via the tailnet that connects our infrastructure.