@thomasfuchs @lisamelton Citation needed. The differences between GPT generations have been qualitative so far. More blocks and larger context windows mean more abstract features and more state. I don't see how you conclude that won't improve quality. The "tech bro" conceit is "scaling is _all_ you need," following "The Bitter Lesson" argument: meta-methods that can find and capture complexity > methods inserted manually. I don't know of any studies that strongly refute this.
https://www.cs.utexas.edu/~eunsol/courses/data/bitter_lesson.pdf
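A back-of-the-envelope sketch of the "more blocks, bigger windows" point, in plain Python. The configs are hypothetical stand-ins, not official GPT figures: adding blocks multiplies parameter count (capacity for more abstract features), while widening the context grows the per-token attention state (the KV cache) linearly.

```python
def params_per_block(d_model: int) -> int:
    """Rough parameters in one transformer block: attention
    (4 * d^2 for the Q, K, V, and output projections) plus a
    4x-wide MLP (8 * d^2), ignoring biases and layer norms."""
    return 4 * d_model**2 + 8 * d_model**2

def kv_cache_bytes(n_blocks: int, d_model: int, context_len: int,
                   bytes_per_value: int = 2) -> int:
    """State held while decoding: one key and one value vector
    per token, per block (assuming fp16 values)."""
    return 2 * n_blocks * d_model * context_len * bytes_per_value

# Hypothetical small vs. large configs, chosen only for illustration:
for name, n_blocks, d_model, ctx in [
    ("small", 12, 768, 2_048),
    ("large", 96, 12_288, 32_768),
]:
    total = n_blocks * params_per_block(d_model)
    state = kv_cache_bytes(n_blocks, d_model, ctx)
    print(f"{name}: ~{total / 1e9:.1f}B block params, "
          f"~{state / 2**30:.1f} GiB of attention state")
```

Both quantities grow multiplicatively across generations under these assumptions, which is the (unproven either way) basis for expecting quality to keep improving.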