@thomasfuchs from the studies I've seen roll out over the last little while, it seems more like they're better at enabling Dunning-Kruger effects than anything else.
@thomasfuchs One thing I think about is separating the "knowledge" we expect from an LLM from the "language processing". I think the former will do as you say, but the latter can improve. So, things like the ability to translate from one language to another, or from speech to text, will get better.
@thomasfuchs We all agree "garbage in, garbage out" here. I also agree there probably is a turning point in 2023 where the inputs can't be assumed to be human generated. I've been assuming the primary symptom will be content-based.
@thomasfuchs How do we know this? What starts the garbage-out feedback loop if we have an evaluation system that works? Do we have an evaluation system that works? Do we have an eval system? What is ML eval?
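To make that question concrete, here's a toy sketch of the loop I mean: model outputs get folded back into the next round's training data unless some evaluation filter rejects them. Everything in it (the fake model, the fake filter, the 20% degradation rate) is made up purely for illustration, not taken from any real pipeline.

```python
import random

def generate(corpus):
    # Toy "model": echoes a training sample, occasionally degrading it.
    sample = random.choice(corpus)
    return sample if random.random() > 0.2 else sample + " [degraded]"

def eval_filter(text):
    # Hypothetical quality gate. The open question above is whether
    # anything like this exists and actually works at scale.
    return "[degraded]" not in text

corpus = ["human-written text"] * 100
for generation in range(5):
    outputs = [generate(corpus) for _ in range(200)]
    kept = [o for o in outputs if eval_filter(o)]  # with a working eval: loop never starts
    # kept = outputs                               # without one: garbage compounds each round
    corpus += kept
    bad = sum("[degraded]" in t for t in corpus) / len(corpus)
    print(f"generation {generation}: {bad:.1%} of corpus degraded")
```

With the filter on, the degraded share stays at zero; swap in the commented-out line and it climbs every generation. Which of those two worlds we're actually in is exactly the eval question.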
@thomasfuchs The architects of these systems no doubt know this about them. I'm curious whether there's an ulterior motive behind the big push to install LLMs everywhere.
Is this effectively a second phase of training? We taught it how we write with the initial canon and now we're teaching it to parse natural language engrams through user interaction with that canon.
It won't get better at generating accurate outputs, but it might get better at understanding the prompts.
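Here's a toy way to picture that split. It's nothing like how an LLM actually works internally, just a made-up illustration separating the two moving parts: a fixed "knowledge" store from the original canon, and a prompt-understanding layer that improves as interactions accumulate.

```python
from collections import Counter

# Phase one: "knowledge" baked in from the original canon. It only gets
# queried below; nothing here makes it more accurate or more complete.
knowledge = {"capital of australia": "Canberra"}

# Phase two: user interactions pair messy phrasings with the query people meant.
interactions = [
    ("what's australia's capital", "capital of australia"),
    ("aussie capital city", "capital of australia"),
    ("capital city of australia pls", "capital of australia"),
]

phrasing_counts = Counter()
for phrasing, canonical_query in interactions:
    for word in phrasing.lower().split():
        phrasing_counts[(word, canonical_query)] += 1

def answer(prompt):
    # Map the prompt to the best-known query by word overlap, then look it up.
    scores = Counter()
    for word in prompt.lower().split():
        for (w, q), n in phrasing_counts.items():
            if w == word:
                scores[q] += n
    if not scores:
        return "no idea"
    best_query = scores.most_common(1)[0][0]
    return knowledge[best_query]

print(answer("aussie capital"))      # interactions taught it this phrasing
print(answer("capital of france"))   # confidently answers "Canberra" anyway
```

More interaction data makes the first call work for more phrasings; the second stays confidently wrong because the knowledge side never grew. That's roughly the asymmetry I mean.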
@thomasfuchs @lisamelton Citation needed. The differences between GPT generations have been qualitative so far. More blocks and larger context windows mean more abstract features and more state. Don’t know how you conclude that won’t improve quality. The “tech bro” conceit is “scaling is _all_ you need,” following “The Bitter Lesson” argument: meta-methods that can find and capture complexity > methods inserted manually. Don’t know any studies that strongly refute this.
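On the scaling point, a back-of-envelope sketch of where the parameters go as you stack more blocks and widen the model. The 12·d² per-layer figure is the usual rough estimate for a GPT-style decoder, not an exact count; the layer counts and widths are the published configs.

```python
def rough_param_count(n_layers, d_model, vocab=50257, context=1024):
    # ~4*d^2 for attention plus ~8*d^2 for the MLP in each block, plus embeddings.
    per_layer = 12 * d_model ** 2
    embeddings = vocab * d_model + context * d_model
    return n_layers * per_layer + embeddings

# Totals land near the reported model sizes.
print(f"GPT-2 small: ~{rough_param_count(12, 768) / 1e6:.0f}M")
print(f"GPT-2 XL:    ~{rough_param_count(48, 1600) / 1e9:.1f}B")
print(f"GPT-3:       ~{rough_param_count(96, 12288, context=2048) / 1e9:.0f}B")
```

Whether that kind of scale keeps translating into quality is the open question, but the jumps between generations really have been orders of magnitude, not tweaks.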