@nikatjef @WhyNotZoidberg @root42 @Tattooed_Mummy I suspect if you go back and check your data points you'll find they're misleading.
Acing exams is actually a really good case for testing our assumptions about these systems.
After all, the whole idea is that they have access to the entire internet. How could they do anything but ace the tests.
It turns out that LLMs can sometimes do kind-of-ok on tests given a very carefully constructed environment.
So what gives?