It's not as if there aren't implementations of the majority of common compression formats in other, safer languages (notably in pure Python). Oh sure, slower than C, but realistically the storage is going to be the bottleneck anyway, so it doesn't matter.
@iska@lispi314 tbh even with an NVMe SSD, Python can handle compressing and decompressing zips quite quickly unless it's a particularly funky alg like lzma (but how many use lzma anyway? most zips I've seen are just a basic deflate)
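The claim is easy to poke at with the stdlib alone. A rough sketch — with the caveat that CPython's `zipfile` hands the actual deflate work to the C `zlib` module, so this measures the zip container handling plus C compression, not pure-Python deflate:

```python
import io, time, zipfile

# Caveat: zipfile delegates deflate to the C zlib extension, so this is
# not "pure Python" compression, just Python driving the zip container.
payload = b"0123456789abcdef" * 65536  # ~1 MiB of compressible data

buf = io.BytesIO()
t0 = time.perf_counter()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("data.bin", payload)
print(f"deflated {len(payload)} bytes in {(time.perf_counter() - t0) * 1000:.1f} ms")

# round-trip check
with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    assert zf.read("data.bin") == payload
```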
@lispi314@udongein.xyz
> notably in pure Python

Interpreted languages are intolerable for data in tens of gigabytes. You could've at least recommended Common Lisp
@Reiddragon@iska@lispi314 tarball of tarballs, yeah. i think rpms were cpio archives in one version, and xar archives in a failed attempt to update the format.
xar was kind of interesting: it's like an XML manifest followed by a huge tarball no man's land where you can do whatever, and the XML file just points you to the offset+length in the buffer. apple used it for a time.
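The layout described above is easy to mimic. A toy sketch of the idea — an XML table of contents pointing at offset+length spans in an opaque heap — using a made-up manifest schema, not xar's real one:

```python
import xml.etree.ElementTree as ET

# Toy version of the xar idea; the schema here is invented for illustration.
manifest = """<toc>
  <file name="README"><offset>0</offset><length>5</length></file>
  <file name="notes.txt"><offset>5</offset><length>7</length></file>
</toc>"""
heap = b"hellogoodbye"  # the "no man's land" the manifest points into

toc = ET.fromstring(manifest)

def extract(name):
    for f in toc.findall("file"):
        if f.get("name") == name:
            off, length = int(f.findtext("offset")), int(f.findtext("length"))
            return heap[off:off + length]
    raise KeyError(name)

print(extract("notes.txt"))  # b'goodbye'
```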
@icedquinn@iska@lispi314 oh and that too, package managers all use either raw tars or something based on tar (.deb is actually an ar archive wrapping two tars, one for control metadata and one for the data)
@Reiddragon@iska@lispi314 something of an aside, but there's not much reason to use LZMA in new projects over ZSTD. and it looks like zstd might even be overengineered compared to ROLZ encoding, judging by https://github.com/richox/orz
@Reiddragon@iska@lispi314 most of them are just re-tunings of LZ. lzma does fine with lots of small files but you get the best packaging with long range dictionaries (ex. "solid" compression where the whole archive is encoded.)
zstd added a feature to make and use your own dictionaries, so you *could* get a kind of blend of solid mode while retaining random access (though the dictionary has to be intact)
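The stdlib has no zstd dictionary support, but zlib's preset dictionaries (`zdict`) illustrate the same trick on a smaller scale: each member stays individually decompressible as long as the shared dictionary is intact. A sketch, with the dictionary contents being an assumed bit of shared boilerplate:

```python
import zlib

# zstd-style trained dictionaries aren't in the stdlib; zlib's zdict shows
# the same idea: shared context, members remain randomly accessible.
dictionary = b'{"name": "", "version": "", "dependencies": []}'  # assumed boilerplate
sample = b'{"name": "leftpad", "version": "1.0.3", "dependencies": []}'

c = zlib.compressobj(zdict=dictionary)
blob = c.compress(sample) + c.flush()

d = zlib.decompressobj(zdict=dictionary)  # must supply the same dictionary
restored = d.decompress(blob) + d.flush()
assert restored == sample

plain = zlib.compress(sample)  # same data without the dictionary
print(f"with dict: {len(blob)} bytes, without: {len(plain)} bytes")
```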
@icedquinn@iska@lispi314 I believe LZMA still performs better for size in some situations, but it's not particularly common and it def starts crying with the kinds of compression you need for packages (many small files)
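The speed gap being described can be eyeballed with the stdlib; both codecs here are C extensions under the hood, and absolute numbers will vary a lot by machine:

```python
import time, zlib, lzma

# Rough single-machine illustration of the deflate-vs-lzma trade-off.
payload = b"many small files tend to share a lot of boilerplate " * 4096

def bench(name, fn):
    t0 = time.perf_counter()
    out = fn(payload)
    print(f"{name}: {len(payload)} -> {len(out)} bytes "
          f"in {(time.perf_counter() - t0) * 1000:.1f} ms")
    return out

z = bench("deflate (zlib)", zlib.compress)
x = bench("lzma (xz)", lzma.compress)  # typically much slower to compress
```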
@lispi314@udongein.xyz@iska@catposter.club
> the storage ends up as the bottleneck

What a load of crap. I just SSHd into my server to benchmark. Zipping the Linux source takes 39 seconds; making a tar takes 11, bottlenecked by the verbose output I enabled to mirror zip (1.6 seconds without it).
Only decompression is close, at 13.5 seconds vs 10. Interpreters will be one or two orders of magnitude slower, so go figure.
Even if you make the argument of availability, you can use a jit-compiled javascript program, since every computer has at least 3 versions of chrome installed.
@iska Literally why? Even if you implemented the decoder in Bash it would still be the storage that ends up as the bottleneck unless you're rich and buying enterprise SSDs for mass storage (ridiculous).
A few Common Lisp implementations are essentially interpreted, too.
Python is notable in this case because it's "common-enough it's basically litter lying discarded on the ground".
Pure CL implementations of common compressed archive formats actually seem to be lagging behind.
@mooncorebunny@iska@Reiddragon@lispi314 i looked at the code for zstd and it looked big. but there is also an ietf spec for the algorithm if i'm not mistaken. it might be very possible to reimplement (facebook wanted it to be a spec)
the reference implementation does a lot of things that you may never need, like some of the parallel / syncable encodings.
in the end it's all LZ with a couple bits of tuning. like orz is RoLZ into Huffman. RoLZ into rANS (an arithmetic-style coder) is probably good too, since ANS gets you roughly arithmetic-coder ratios at Huffman-like speed.
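The reduced-offset idea can be sketched in a few lines: instead of raw byte offsets, matches refer to a small per-context table of recent positions, so the offset field only needs a few bits. This is a toy illustration only, not orz's actual format (which adds an entropy-coding stage on top):

```python
# Toy ROLZ (reduced-offset LZ) codec. Matches are (slot index, length)
# against a small table of recent positions keyed by the previous byte.
MIN_LEN, MAX_SLOTS = 3, 16

def rolz_compress(data):
    tables = {}                          # context byte -> recent positions
    out = []
    if not data:
        return out
    out.append(("lit", data[0]))         # first byte has no context
    i = 1
    while i < len(data):
        slots = tables.setdefault(data[i - 1], [])
        best = None
        for idx, pos in enumerate(slots):     # search only tabled positions
            l = 0
            while i + l < len(data) and data[pos + l] == data[i + l]:
                l += 1
            if l >= MIN_LEN and (best is None or l > best[1]):
                best = (idx, l)
        slots.insert(0, i)               # register current position
        del slots[MAX_SLOTS:]
        if best:
            out.append(("match", best[0], best[1]))
            i += best[1]
        else:
            out.append(("lit", data[i]))
            i += 1
    return out

def rolz_decompress(tokens):
    tables = {}
    data = bytearray()
    for tok in tokens:
        if not data:                     # first literal, no context yet
            data.append(tok[1])
            continue
        slots = tables.setdefault(data[-1], [])
        pos = slots[tok[1]] if tok[0] == "match" else None
        slots.insert(0, len(data))       # mirror the encoder's table update
        del slots[MAX_SLOTS:]
        if tok[0] == "match":
            for k in range(tok[2]):      # byte-wise copy handles overlaps
                data.append(data[pos + k])
        else:
            data.append(tok[1])
    return bytes(data)

msg = b"abracadabra abracadabra abracadabra"
tokens = rolz_compress(msg)
assert rolz_decompress(tokens) == msg
```

Note the slot index says "the Nth most recent position seen after this context byte", which is why both sides must update their tables identically.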
@icedquinn@iska@Reiddragon@lispi314 I hold the view that ZStd is too complicated to count as a real "standard"; the zstd library is an order of magnitude larger than libdeflate, so the algorithm is very difficult to independently re-implement.
Therefore I'm very happy to learn about ROLZ! I may want to try re-implementing that.
I wonder if retrofitting ROLZ in place of LZ in DEFLATE would produce even better results... and, to speed things up, replace the one or two Huffman stages with FSE, which is after all the primary innovation enabling ZStd.