Conversation
Notices
-
feld (feld@friedcheese.us)'s status on Monday, 02-Sep-2024 06:49:23 JST
@cult All of this sounds like a Linux problem, not a ZFS problem. My experience is completely different, having used ZFS exclusively on Solaris and FreeBSD.
-
CULTPONY :verified: :verified: (cult@pony.social)'s status on Monday, 02-Sep-2024 06:49:30 JST
Also, this is the annual reminder that the last time anyone surveyed how good filesystems are at reporting write errors upward, ZFS only qualified for reporting some errors, and only the common ones. Btrfs and ext4 both mostly swallowed write errors.
Part of that is infra: in the modern async "issue the write and OK it to the process before the device OKs it" world, the FS can't reliably report such things.
That's where we got the PostgreSQL "Fsync considered unreliable" from, a bug that persists on Linux and can cause data loss with any DB setting other than "fsync every write or use O_DIRECT".
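To make the "fsync every write" option concrete, here is a minimal Python sketch. The helper name `durable_write` is made up for illustration and is not taken from PostgreSQL or any other project. The one point it tries to show is that a failed `fsync` is treated as final: the error propagates to the caller and is never retried, since on Linux a later `fsync` on the same file may report success even though the earlier dirty pages were dropped.

```python
import os


def durable_write(path: str, data: bytes) -> None:
    """Write data to path and return only once fsync has reported success.

    Hypothetical helper for illustration. Any OSError from write() or
    fsync() propagates; the caller should treat the data as lost rather
    than call fsync() again and trust a later success.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while len(view) > 0:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Ask the kernel to push the file's dirty pages to the device.
        os.fsync(fd)
    finally:
        os.close(fd)
```

This only covers the file's contents. A newly created file also needs an `fsync` on its parent directory before the directory entry itself can be considered durable, and none of this helps if the device acknowledges the flush without actually persisting anything.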
-
CULTPONY :verified: :verified: (cult@pony.social)'s status on Monday, 02-Sep-2024 06:49:34 JST
I wiped one of my older systems for reasons. I do love watching nwipe while it's deleting its own root FS and seeing services begin to fail as they become unable to continue.
The root FS was ZFS. It managed 11 minutes before an error was reported, after I issued blkdiscard to the entire disk and then started nwipe on it. After that it crashed with a kernel panic within a minute.
On the one hand, understandable. But for a filesystem touting its safety and stability, I don't think it should kernel panic that easily.
But that's honestly part of my experience. I've sysadmin'ed ZFS for 5 years, and it's only stable for common failure modes. If a controller breaks or disks do fun stuff like "return all zeroes and discard writes", then ZFS will crash your computer just as badly as the other filesystems will.
Soapboxing a tiny bit: we should write modern filesystems on the assumption that a malicious actor is gonna be messing with our ability to do IO. That includes assuming "the device is discarding writes and returning zeroes without error". ZFS is great if you limit yourself to common disk failures (i.e., where errors are reported or the device disconnects). If the controller is faulty or the disk misbehaves in non-error ways, there's a good chance ZFS will trash the pool.
ext4 and btrfs mostly differ in that they take longer to notice that something is wrong, or the corruption is more extensive before anyone notices. ZFS just crashes faster.
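As a toy illustration of the "device is discarding writes and returning zeroes without error" case, here is a small Python sketch. Every class name in it is hypothetical, and it simulates the device in memory rather than touching real hardware. It shows the general technique of keeping a checksum outside the data block and verifying it on read, which turns a silent zero-return into a reported error. It is not a model of how ZFS or any other filesystem behaves once it sees that error.

```python
import zlib

BLOCK_SIZE = 4096


class HonestDevice:
    """Simulated device that stores what it is given."""

    def __init__(self):
        self.blocks = {}

    def write(self, lba: int, data: bytes) -> None:
        self.blocks[lba] = bytes(data)

    def read(self, lba: int) -> bytes:
        return self.blocks.get(lba, bytes(BLOCK_SIZE))


class SilentlyLossyDevice:
    """Simulated device that reports success, stores nothing, reads zeroes."""

    def write(self, lba: int, data: bytes) -> None:
        pass  # the write is dropped, but no error is raised

    def read(self, lba: int) -> bytes:
        return bytes(BLOCK_SIZE)


class ChecksummedStore:
    """Keeps a CRC32 per block apart from the data and verifies it on read.

    Assumption: the checksums live somewhere trustworthy (here, in memory).
    """

    def __init__(self, device):
        self.device = device
        self.checksums = {}

    def write(self, lba: int, payload: bytes) -> None:
        if len(payload) > BLOCK_SIZE:
            raise ValueError("payload larger than one block")
        block = payload.ljust(BLOCK_SIZE, b"\0")
        self.checksums[lba] = zlib.crc32(block)
        self.device.write(lba, block)

    def read(self, lba: int) -> bytes:
        block = self.device.read(lba)
        if zlib.crc32(block) != self.checksums[lba]:
            raise OSError(f"checksum mismatch on block {lba}")
        return block
```

With `HonestDevice`, a write followed by a read returns the padded payload. With `SilentlyLossyDevice`, the same read raises `OSError` instead of handing back zeroes, unless the payload was all zeroes to begin with. Detection is the easy half; the harder part raised above is what the filesystem does next, and whether its own metadata and checksums sit on the same lying device.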