(Disclaimer: slightly out of my depth)
One thing researchers who were seriously studying LLMs pre-hypestravaganza were really excited about is the progression from completely bespoke model training (prep your problem-specific training data and train from scratch), to “transfer learning” (take a generic pretrained model and do a little extra training to specialize it, e.g. tuning a generic species classifier to recognize birds), to “zero-shot classification,” where you get a completely generic model to focus on a specific task described in the query itself, with no additional training at all. Serious people treat this kind of usage with the same empirical caution they’d apply to any other classifier system.
This really can work for the kind of use case Cat was talking about (slight increase in error rate for a huge increase in throughput), and in that usage pattern, LLMs are very much in the family of old-school classifiers.
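To make the usage pattern concrete, here is a toy sketch of zero-shot-style classification: the candidate labels arrive with the query, and nothing is trained on them. The scoring here is a deliberately dumb bag-of-words cosine similarity standing in for a real pretrained model; every name and the label descriptions are hypothetical, purely for illustration.

```python
# Toy illustration of the zero-shot *usage pattern*: candidate labels are
# supplied at query time, and the "model" (here a trivial bag-of-words
# cosine similarity, a stand-in for a real pretrained LLM) was never
# trained on those labels.
from collections import Counter
from math import sqrt


def bow(text: str) -> Counter:
    """Bag-of-words count vector for a string."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def zero_shot_classify(query: str, label_descriptions: dict) -> str:
    """Return the label whose description best matches the query.
    A real system would swap the cosine score for an LLM's judgment."""
    q = bow(query)
    return max(label_descriptions,
               key=lambda lbl: cosine(q, bow(label_descriptions[lbl])))


labels = {
    "bird": "an animal with feathers wings and a beak that can fly",
    "fish": "an animal with fins and gills that swims in water",
}
print(zero_shot_classify("a small animal with feathers and a beak", labels))
```

The point is the shape of the interface, not the scoring: new labels can be swapped in per query with zero retraining, which is exactly the throughput trade Cat described, and exactly why the error rate still has to be measured like any classifier's.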
Jeremy, I’d argue the difference you’re talking about isn’t just in the model itself, but in how it’s used, how it’s presented, how the UI works, and how it’s marketed. When you change the mindset from “zero-shot classifiers with flexible queries” to “prompt engineering,” you have substantively changed the nature of the beast.