It's not as if there aren't implementations of the majority of common compression formats in other, safer languages (notably in pure Python). Oh sure, slower than C, but realistically the storage is going to be the bottleneck anyway, so it doesn't matter.
@iska@lispi314 tbh even with an NVMe SSD, Python can handle compressing and decompressing zips quite quickly unless it's a particularly funky alg like lzma (but how many use lzma anyway? most zips I've seen are just a basic deflate)
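The claim is easy to poke at with the stdlib alone. A rough sketch — with the caveat that CPython's `zipfile` hands the actual deflate work to the C `zlib` module, so this measures the zip container handling plus C compression, not pure-Python deflate:

```python
import io, time, zipfile

# Caveat: zipfile delegates deflate to the C zlib extension, so this is
# not "pure Python" compression, just Python driving the zip container.
payload = b"0123456789abcdef" * 65536  # ~1 MiB of compressible data

buf = io.BytesIO()
t0 = time.perf_counter()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("data.bin", payload)
print(f"deflated {len(payload)} bytes in {(time.perf_counter() - t0) * 1000:.1f} ms")

# round-trip check
with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    assert zf.read("data.bin") == payload
```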
@lispi314@udongein.xyz
> notably in pure Python

Interpreted languages are intolerable for data in tens of gigabytes. You could've at least recommended Common Lisp
@Reiddragon@iska@lispi314 tarball of tarballs, yeah. i think rpms were cpio archives in one version, and xar archives in a failed attempt to update the format.
xar was kind of interesting: it's like an XML manifest followed by a huge tarball no man's land where you can do whatever, and the XML file just points you to the offset+length in the buffer. apple used it for a time.
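The layout described above is easy to mimic. A toy sketch of the idea — an XML table of contents pointing at offset+length spans in an opaque heap — using a made-up manifest schema, not xar's real one:

```python
import xml.etree.ElementTree as ET

# Toy version of the xar idea; the schema here is invented for illustration.
manifest = """<toc>
  <file name="README"><offset>0</offset><length>5</length></file>
  <file name="notes.txt"><offset>5</offset><length>7</length></file>
</toc>"""
heap = b"hellogoodbye"  # the "no man's land" the manifest points into

toc = ET.fromstring(manifest)

def extract(name):
    for f in toc.findall("file"):
        if f.get("name") == name:
            off, length = int(f.findtext("offset")), int(f.findtext("length"))
            return heap[off:off + length]
    raise KeyError(name)

print(extract("notes.txt"))  # b'goodbye'
```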
@icedquinn@iska@lispi314 oh and that too, package managers all use either raw tars or something based on tar (.deb is actually an ar archive wrapping two tars, one for control metadata and one for the data)
@Reiddragon@iska@lispi314 something of an aside, but there's not much reason to use LZMA in new projects over ZSTD. and it looks like zstd might even be overengineered compared to ROLZ encoding, judging by https://github.com/richox/orz
@Reiddragon@iska@lispi314 most of them are just re-tunings of LZ. lzma does fine with lots of small files but you get the best packaging with long range dictionaries (ex. "solid" compression where the whole archive is encoded.)
zstd added a feature to make and use your own dictionaries, so you *could* get a kind of blend of solid mode while retaining random access (though the dictionary has to be intact)
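The stdlib has no zstd dictionary support, but zlib's preset dictionaries (`zdict`) illustrate the same trick on a smaller scale: each member stays individually decompressible as long as the shared dictionary is intact. A sketch, with the dictionary contents being an assumed bit of shared boilerplate:

```python
import zlib

# zstd-style trained dictionaries aren't in the stdlib; zlib's zdict shows
# the same idea: shared context, members remain randomly accessible.
dictionary = b'{"name": "", "version": "", "dependencies": []}'  # assumed boilerplate
sample = b'{"name": "leftpad", "version": "1.0.3", "dependencies": []}'

c = zlib.compressobj(zdict=dictionary)
blob = c.compress(sample) + c.flush()

d = zlib.decompressobj(zdict=dictionary)  # must supply the same dictionary
restored = d.decompress(blob) + d.flush()
assert restored == sample

plain = zlib.compress(sample)  # same data without the dictionary
print(f"with dict: {len(blob)} bytes, without: {len(plain)} bytes")
```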
@icedquinn@iska@lispi314 I believe LZMA still performs better for size in some situations, but it's not particularly common and it def starts crying with the kinds of compression you need for packages (many small files)
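The speed gap being described can be eyeballed with the stdlib; both codecs here are C extensions under the hood, and absolute numbers will vary a lot by machine:

```python
import time, zlib, lzma

# Rough single-machine illustration of the deflate-vs-lzma trade-off.
payload = b"many small files tend to share a lot of boilerplate " * 4096

def bench(name, fn):
    t0 = time.perf_counter()
    out = fn(payload)
    print(f"{name}: {len(payload)} -> {len(out)} bytes "
          f"in {(time.perf_counter() - t0) * 1000:.1f} ms")
    return out

z = bench("deflate (zlib)", zlib.compress)
x = bench("lzma (xz)", lzma.compress)  # typically much slower to compress
```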
@lispi314@udongein.xyz@iska@catposter.club
> the storage ends up as the bottleneck

What a load of crap. I just SSHd into my server to benchmark. Zipping the Linux source takes 39 seconds; making a tar takes 11, bottlenecked by the verbose output I enabled to mirror zip (1.6 seconds without it).
Only decompression is close, at 13.5 seconds vs 10. Interpreters will be one or two orders of magnitude slower, so go figure.
Even if you make the argument of availability, you can use a jit-compiled javascript program, since every computer has at least 3 versions of chrome installed.
@iska Literally why? Even if you implemented the decoder in Bash it would still be the storage that ends up as the bottleneck unless you're rich and buying enterprise SSDs for mass storage (ridiculous).
A few Common Lisp implementations are essentially interpreted, too.
Python is notable in this case because it's "common-enough it's basically litter lying discarded on the ground".
Pure CL implementations of common compressed archive formats actually seem to be lagging behind.
@mooncorebunny@iska@Reiddragon@lispi314 i looked at the code for zstd and it looked big. but there is also an ietf spec for the algorithm if i'm not mistaken. it might be very possible to reimplement (facebook wanted it to be a spec)
the reference implementation does a lot of things that you may never need, like some of the parallel / syncable encodings.
in the end it's all LZ with a couple bits of tuning. like orz is RoLZ into Huffman. RoLZ into rANS (an arithmetic-style coder) is probably good too, since ANS gets you roughly arithmetic-coder ratios at Huffman-like speed.
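The reduced-offset idea can be sketched in a few lines: instead of raw byte offsets, matches refer to a small per-context table of recent positions, so the offset field only needs a few bits. This is a toy illustration only, not orz's actual format (which adds an entropy-coding stage on top):

```python
# Toy ROLZ (reduced-offset LZ) codec. Matches are (slot index, length)
# against a small table of recent positions keyed by the previous byte.
MIN_LEN, MAX_SLOTS = 3, 16

def rolz_compress(data):
    tables = {}                          # context byte -> recent positions
    out = []
    if not data:
        return out
    out.append(("lit", data[0]))         # first byte has no context
    i = 1
    while i < len(data):
        slots = tables.setdefault(data[i - 1], [])
        best = None
        for idx, pos in enumerate(slots):     # search only tabled positions
            l = 0
            while i + l < len(data) and data[pos + l] == data[i + l]:
                l += 1
            if l >= MIN_LEN and (best is None or l > best[1]):
                best = (idx, l)
        slots.insert(0, i)               # register current position
        del slots[MAX_SLOTS:]
        if best:
            out.append(("match", best[0], best[1]))
            i += best[1]
        else:
            out.append(("lit", data[i]))
            i += 1
    return out

def rolz_decompress(tokens):
    tables = {}
    data = bytearray()
    for tok in tokens:
        if not data:                     # first literal, no context yet
            data.append(tok[1])
            continue
        slots = tables.setdefault(data[-1], [])
        pos = slots[tok[1]] if tok[0] == "match" else None
        slots.insert(0, len(data))       # mirror the encoder's table update
        del slots[MAX_SLOTS:]
        if tok[0] == "match":
            for k in range(tok[2]):      # byte-wise copy handles overlaps
                data.append(data[pos + k])
        else:
            data.append(tok[1])
    return bytes(data)

msg = b"abracadabra abracadabra abracadabra"
tokens = rolz_compress(msg)
assert rolz_decompress(tokens) == msg
```

Note the slot index says "the Nth most recent position seen after this context byte", which is why both sides must update their tables identically.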
@icedquinn@iska@Reiddragon@lispi314 I hold the view that ZStd is too complicated to count as a real "standard"; the zstd library is an order of magnitude larger than libdeflate, so the algorithm is very difficult to independently re-implement.
Therefore I'm very happy to learn about ROLZ! I may want to try re-implementing that.
I wonder if retrofitting ROLZ in place of LZ in DEFLATE would produce even better results... and, to speed things up, replace the one or two Huffman stages with FSE, which is after all the primary innovation enabling ZStd.