@sjw have you tried zstd compression? i know it is a bit slower than lz4 but iirc since linux kernel 5.16 the speed was improved by 15% so it should be more on par with lz4
@EdBoatConnoisseur It's all media uploads so for the most part incompressible data.
I suspect a lot of this savings comes from partial blocks, which LZ4 is more than good enough for.
For database backups I plan to use zstd. For the database itself I plan to use LZ4.
LZ4 should give me a 2-3x compression ratio with negligible CPU overhead and negligible latency. Zstd could give me 3x+ but at the expense of CPU and more than double the latency.
I'll use zstd for the rest of the system tho.
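If I want to sanity-check that tradeoff on an actual upload before committing, a rough sketch like this would do it (assumes the third-party lz4 and zstandard Python packages; real on-disk ratios will differ since ZFS compresses per record):

```python
#!/usr/bin/env python3
# Rough lz4-vs-zstd ratio/speed check on a sample file. Not how ZFS does it
# (ZFS compresses per record), just a ballpark for the tradeoff above.
# Requires the third-party "lz4" and "zstandard" packages.
import sys
import time

import lz4.frame
import zstandard


def bench(name, compress, data):
    start = time.perf_counter()
    out = compress(data)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{name}: ratio {len(data) / len(out):.2f}x in {elapsed_ms:.1f} ms")


data = open(sys.argv[1], "rb").read()
bench("lz4", lz4.frame.compress, data)
bench("zstd-3", zstandard.ZstdCompressor(level=3).compress, data)
```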
I plan to use a block size of 16k for postgres (as I'm too lazy to deal with compiling my own version to use larger block sizes, plus that would also cause a lot of downtime while I rebuild the databases).
The only question is what block size to use for the rest of the system. I figure the default 128k should be good. I'd like to use 1M for /var/log/ to improve compression ratios, but I'm pretty sure I'd take a decent IO performance hit by doing that since as far as I know logs are written one line at a time.
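For reference, the plan above boils down to a handful of per-dataset properties, something like this sketch (dataset names like tank/pgdata are made up and I haven't actually run it yet):

```python
#!/usr/bin/env python3
# Sketch of applying the per-dataset plan via the zfs CLI.
# The dataset names are placeholders; adjust to the real pool layout.
import subprocess

PLAN = {
    "tank/pgdata":  {"compression": "lz4",  "recordsize": "16K"},  # the database itself
    "tank/backups": {"compression": "zstd"},                       # database dumps
    "tank/ROOT":    {"compression": "zstd"},                       # rest of the system, default 128K records
    "tank/var/log": {"compression": "zstd", "recordsize": "1M"},   # logs, if the IO hit turns out acceptable
}

for dataset, props in PLAN.items():
    for prop, value in props.items():
        subprocess.run(["zfs", "set", f"{prop}={value}", dataset], check=True)
```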
@EdBoatConnoisseur I'm using ZFS. It's about the only filesystem you can completely trust to actually keep data integrity (and if all else fails it'll tell you your file isn't the way it's supposed to be).
Also, it has support for RAID. Btrfs RAID 5/6 still isn't stable. ZFS has RAID 5, 6, and even 7!
ZFS has over a decade of proven stability under its belt and it has always prioritised data integrity.
Btrfs literally had a bug in a "stable" release just a few years ago that'd corrupt your filesystem and make data retrieval virtually impossible.
Don't get me wrong, I like a lot of the features of Btrfs. I use it on my personal desktop. I just don't trust it for production use. The stability just isn't there.
As for deduplication, hopefully reflinks will eventually make their way into ZFS. For now you're kind of stuck with online deduplication. At least offline file deduplication is still possible via hash match + hardlinks.
For fedi, I'd argue that's more than good enough.
Actually, I might go back to $uuid/$filename because I prefer serving raw filenames to serving $sha256.$extension?name=$filename and hoping the client respects the Content-Disposition header (Husky does not).
I can easily use something like fdupes or jdupes to deduplicate via sha256 and hardlinks with a weekly cronjob.
Actually, I think I might set that up over the weekend.
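The cronjob would be shaped roughly like this (a naive, untested sketch of the hash match + hardlink approach; the uploads path is just a placeholder, and hardlinks only make sense where files never get edited in place):

```python
#!/usr/bin/env python3
# Naive offline dedup: hash every file, replace byte-identical duplicates with
# hardlinks to the first copy seen. Roughly what fdupes/jdupes do when asked
# to hardlink matches. The default path below is a placeholder.
import hashlib
import os
import sys


def sha256_of(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


upload_dir = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/pleroma/uploads"
seen = {}  # sha256 -> path of the first copy
for root, _dirs, files in os.walk(upload_dir):
    for name in files:
        path = os.path.join(root, name)
        original = seen.setdefault(sha256_of(path), path)
        if original != path and not os.path.samefile(original, path):
            os.unlink(path)           # drop the duplicate...
            os.link(original, path)   # ...and point its name at the original's inode
```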
Anyway, yeah, ZFS for me. It wasn't until I actually learned how to ZFS and started using it that I understood what all the hype was about.
@shibao and @lanodan were extremely helpful in helping me set it up and understand ZFS. You can ask me as well, as I now have a decent grasp on it and can help keep you from falling into some of the same pitfalls I fell into (and still haven't gotten around to correcting).
@sjw @shibao @lanodan the one that does seem promising is bcachefs, but still a soon™ deal.
onto the raids i'm not sold on raid 7, raid 4 is where i feel comfortable, then again i am of the idea that the more redundancy the better and would even do raid 1+0
would be nice if btrfs or zfs got support for asymmetric device performance the way mdadm has writemostly and writebehind. for read heavy stuff with bursts of small writes it seems to be working really well for me so far
@roboneko @EdBoatConnoisseur yeah, I know, it's arguably the most stable with hands down the highest amount of data integrity.
That really threw me off as well. I mean people and companies use ZFS when data integrity is a top priority because it's literally the only filesystem they can trust with their important data.
@sjw @EdBoatConnoisseur zfs is certainly more established in that regard but there are other options. dm-integrity, but I guess that's newish. facebook is one of the companies running lots of btrfs in production
right now the real difference appears to be the stuff with large raid arrays plus all the fine tuning you can do. you can set all sorts of stuff per-dataset meanwhile the btrfs devs say they might get around to supporting stuff like per-subvolume compression options at some point in the future :02_laugh:
@roboneko @EdBoatConnoisseur well no. dm-integrity relies on the disk telling the OS there was a problem. mdadm will absolutely return bad data to you without telling you. ZFS will do everything it can to fix the data and if all else fails it'll at least tell you that the data is bad.
@sjw @EdBoatConnoisseur I ... no? I don't know where you got that idea but it's wrong. dm-integrity is an offshoot of luks to cryptographically guarantee authenticity. but you can use it standalone in a less secure mode to provide only basic checksums instead of full cryptographic authentication
bit flips will absolutely be detected by it as that's the entire point of the thing. what the higher layers do with that information is up to them
mdadm for its part will attempt to reconstruct any data for which it receives an IO error. so if you run it on top of dm-integrity it will either manage to silently reconstruct the corrupted sector at read time or else it will in turn feed an error to the filesystem driver that tried to read it
@sjw @shibao @EdBoatConnoisseur Had to check wikipedia:
RAID0 → All in serial
RAID1 → Mirroring, 2+ disks
RAID 2-4 → Single-copy Parity
RAID6 → Double-copy Parity
RAID7 → Triple-copy Parity
How is using RAID $number clearer?
Especially as it gets really messy when you need to describe nesting precisely, as RAID1 is 2+ disks. Like RAID1+0 vs. RAID0+1 is horribly unclear compared to just saying (here for 6 disks): 2 mirrors, 3 (disks) each or 3 mirrors, 2 (disks) each.
And idle/hotswap disks also aren't in the picture of RAID.
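Quick back-of-the-envelope for those two 6-disk layouts, assuming equal-sized disks (4 TB here just to have a number):

```python
# Usable space vs guaranteed failure tolerance for N mirrors of K disks each.
# "Guaranteed" is the worst case (all failures landing in the same mirror).
def mirror_layout(n_mirrors, disks_per_mirror, disk_tb=4):
    usable_tb = n_mirrors * disk_tb              # one disk's worth of space per mirror
    guaranteed_failures = disks_per_mirror - 1   # a mirror survives losing all but one disk
    return usable_tb, guaranteed_failures

for mirrors, per_mirror in [(3, 2), (2, 3)]:
    usable, failures = mirror_layout(mirrors, per_mirror)
    print(f"{mirrors} mirrors, {per_mirror} disks each: "
          f"{usable} TB usable, any {failures} failure(s) survivable for sure")
```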
@sjw @shibao @EdBoatConnoisseur Why would I learn the numbers, especially when they have no logic? ZFS doesn't use them and I don't plan on using anything else; I've been using ZFS for longer than some teenagers on fedi have been alive.
@lanodan @roboneko @shibao @EdBoatConnoisseur so raid1c3 means you have a pool of disks but every file is stored on at least 3 disks, so you can lose 2 disks without losing data. It's like RAID 6/raidz2 but with no math involved and lots of wasted disk space. Btrfs has had support for RAID 6 for a few years now, but don't use it outside of testing because you still have a good likelihood of data corruption.
@lanodan @sjw @shibao @EdBoatConnoisseur well I dunno what else they should have called it, maybe just drop the "raid1" from the name entirely? it's mirrored data except the number of devices, device sizes, and number of copies are all relatively arbitrary. it's an interesting tradeoff in terms of rebuild workload vs redundancy vs space efficiency
I have no idea if anyone uses those modes for Serious Business tho
@sjw @EdBoatConnoisseur @lanodan @shibao actually isn't c3 equivalent to z2 in terms of device failure? but (obviously) unlike zN it isn't parity, c3 is 3 full copies
and I think all of the above are examples of the issue with using numbers for names. c3 is not z2 is not raid1 and it only gets worse
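rough numbers for what the "lots of wasted disk space" costs, assuming equal-sized 4 TB disks (both layouts survive any 2 device failures):

```python
# Space cost of raid1c3 (3 full copies of every block) vs raidz2 (double parity)
# for the same number of equal-sized disks.
def raid1c3_usable_tb(n_disks, disk_tb=4):
    return n_disks * disk_tb / 3       # every block lives on 3 devices

def raidz2_usable_tb(n_disks, disk_tb=4):
    return (n_disks - 2) * disk_tb     # two disks' worth goes to parity

for n in (4, 6, 8):
    print(f"{n} disks: raid1c3 ~{raid1c3_usable_tb(n):.1f} TB, "
          f"raidz2 ~{raidz2_usable_tb(n):.1f} TB usable")
```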
Like being unable to partially mount a mirror, while that's a thing I test from time to time on my machines since the system pool is mirrored. (Some initramfs builders are a bit dumb when it comes to those cases.)