Newer LLM training data unavoidably includes LLM output scraped off the internet, which is like pointing a microphone at its own speaker. Even a fairly small share of synthetic text poisons the process and the math collapses ("model collapse"), making newer models worse than old ones despite throwing more compute at the problem.
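A toy sketch of the feedback loop (my own illustration, not from any cited source): treat a categorical distribution as a stand-in "language model" and repeatedly refit it to samples drawn from the previous generation. Any word that fails to show up in a finite sample gets probability zero and can never come back, so the vocabulary only ever shrinks.

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical toy model: 100 "words", generation 0 is the real (uniform) data.
vocab = list(range(100))
probs = {w: 1 / len(vocab) for w in vocab}

def sample_and_refit(probs, n=200):
    # Draw n tokens from the current model, then refit on its own output.
    words, weights = zip(*probs.items())
    counts = Counter(random.choices(words, weights=weights, k=n))
    # Words with count 0 vanish from the support permanently.
    return {w: c / n for w, c in counts.items()}

support = [len(probs)]
for generation in range(50):
    probs = sample_and_refit(probs)
    support.append(len(probs))

print(f"vocabulary size: gen 0 = {support[0]}, gen 50 = {support[-1]}")
```

Real model collapse is more subtle than this, but the mechanism is the same: each generation can only reproduce what survived sampling from the one before, so rare content in the tails is lost first and the loss is one-way.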
Plus each query costs multiple dollars to answer: humans are so much cheaper that some AI companies have cheated and quietly employed them instead (à la the Mechanical Turk).
https://www.pcmag.com/news/this-companys-ai-was-really-just-remote-human-workers-pushing-buttons