Haelwenn /элвэн/ :triskell: (lanodan@queer.hacktivis.me), Friday 19-Jul-2024 02:40:58 JST: @wolf480pl Don't accept new connections :P
Haelwenn /элвэн/ :triskell: (lanodan@queer.hacktivis.me), Friday 19-Jul-2024 02:43:23 JST: @wolf480pl That said I wonder how you even end up in that kind of situation, except maybe with broken implementations where the logger dynamically allocates memory instead of using static buffers.
Wolf480pl (wolf480pl@mstdn.io), Friday 19-Jul-2024 03:52:22 JST: @lanodan you know what? I'm curious about that too.
It was haproxy writing logs to stdout in O_NONBLOCK mode. If a write fails, haproxy drops logs. As soon as I set nbthread > 1, it started dropping logs.
Now where did that stdout go? IIRC through a pipe to containerd (or kubelet or runc or sth) that was copying those logs to files in /var/log, which were then being tailed by fluentd, which tried to parse timings and couldn't keep up.
1/
Wolf480pl (wolf480pl@mstdn.io), Friday 19-Jul-2024 03:55:53 JST: @lanodan but since fluentd was tailing the logs, the only effect of it not keeping up should be that a log file gets rotated before fluentd is done with it.
Unless...
Unless it kept the old logfile open which resulted in there being no space for the new logfile to grow.
But I'd think more stuff would break in that case... and I didn't see a "node out of disk" alert.
So it must've been containerd not keeping up with copying from a pipe to a file?
Haelwenn /элвэн/ :triskell: (lanodan@queer.hacktivis.me), Friday 19-Jul-2024 03:55:53 JST: @wolf480pl Log rotation? Done with the hack that is logrotate or done natively? (Like here I've set up rsyslog to have per-day files and zfs does the compression)
Wolf480pl (wolf480pl@mstdn.io), Friday 19-Jul-2024 04:16:55 JST: @lanodan besides, if it is containerd that can't keep up copying, that'd mean I need to handle the logs within the pod, instead of on the host side...
Wolf480pl (wolf480pl@mstdn.io), Friday 19-Jul-2024 04:51:13 JST: @lanodan also thanks for the questions, it made me realize I jumped to conclusions too quickly, and fluentd not keeping up may not be the real problem that's causing this.