Fun paper that shows that gzip-based kNN sentence classification can outperform BERT.
https://aclanthology.org/2023.findings-acl.426/
Understanding that every predictor is a compressor and vice versa has been the most important insight I've learnt since 2018.
Also: concatenating two pieces of data and compressing the result with a SotA compressor is a great way to measure how similar they are. It's an old insight, as Steph Wehner did gzip-based malware clustering in the early 2000s...
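
For the curious, here is a minimal sketch of the concat/compress idea, written as a normalized compression distance (NCD) plus a tiny kNN vote on top. It is my own illustration in the spirit of the paper, not its actual code: the function names (clen, ncd, knn_predict), the single-space separator and k=3 are all assumptions picked for the example.

```python
import gzip
from collections import Counter


def clen(text: str) -> int:
    """Length in bytes of the gzip-compressed UTF-8 encoding of text."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two strings.

    If x and y share structure, compressing them together costs little
    more than compressing the larger one alone, so the value is small.
    The single-space separator is an assumption of this sketch.
    """
    cx, cy = clen(x), clen(y)
    cxy = clen(x + " " + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def knn_predict(query: str, train: list[tuple[str, str]], k: int = 3) -> str:
    """Majority label among the k training texts closest to query by NCD."""
    nearest = sorted(train, key=lambda pair: ncd(query, pair[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy data, made up for illustration only.
    train = [
        ("the striker scored twice in the second half of the match", "sports"),
        ("the goalkeeper saved a penalty late in the match", "sports"),
        ("the central bank raised interest rates again this quarter", "finance"),
        ("stock markets fell after the bank raised interest rates", "finance"),
    ]
    print(knn_predict("the team scored a late goal in the match", train, k=1))
```

Keep in mind that gzip adds a fixed header and trailer, so on very short strings the distances get noisy; the approach behaves better on longer inputs.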