@tess did a big comparison of document categorization approaches across dozens of models and techniques. Without any particular tailoring or optimization, simple k-means proximity with embeddings scored within 10% of the best prompt-based LLM approaches, while being orders of magnitude faster and more energy efficient. It's hard to imagine why one wouldn't start by optimizing that to improve results rather than endlessly fiddling with prompts... but admittedly it took relearning some tangly math.
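For anyone who hasn't seen the idea, here's a minimal sketch, assuming "k-means proximity" means nearest-centroid assignment: average the embeddings of a few labeled examples per category, then file each new document under the closest centroid by cosine similarity. This isn't necessarily how the comparison did it. The embedding step is left out, and the toy 2-d vectors and the category names ("invoice", "contract") are made up for illustration. With unlabeled data you'd run k-means first to find the centroids instead of averaging labeled examples.

```python
import math
from collections import defaultdict


def normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def fit_centroids(embeddings, labels):
    """Average the unit-normalized embeddings of each category into one centroid."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(embeddings, labels):
        vec = normalize(vec)
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
        counts[label] += 1
    # Re-normalize each mean so every centroid sits on the unit sphere.
    return {
        label: normalize([s / counts[label] for s in total])
        for label, total in sums.items()
    }


def classify(embedding, centroids):
    """Return the label whose centroid has the highest cosine similarity."""
    vec = normalize(embedding)
    return max(
        centroids,
        key=lambda label: sum(a * b for a, b in zip(vec, centroids[label])),
    )


# Hypothetical toy data: in practice these would come from an embedding model.
train = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.8]]
labels = ["invoice", "invoice", "contract", "contract"]
centroids = fit_centroids(train, labels)

print(classify([0.8, 0.2], centroids))  # → invoice
print(classify([0.1, 0.9], centroids))  # → contract
```

Classifying a document is one dot product per category, which is where the speed and energy gap against an LLM call per document comes from.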