Forgive me if I'm mistaken, but... Chain-of-Thought models are not actually "thinking," as far as I understand? The CoT is merely the LLM talking to itself within an incentivized environment to reach specific outcomes. It still runs on the same tokenizer and next-most-probable-token logic as everything else. I wish people who barely understand AI would stop writing about it...
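To illustrate the point: the "thoughts" are produced by the same autoregressive loop as the final answer. Here's a minimal toy sketch, where the vocabulary, the canned script, and the `fake_logits` stand-in for a model's forward pass are all made up for illustration:

```python
# Toy sketch: a "chain of thought" is just more tokens emitted by the
# same next-token loop that produces the answer. Nothing here is a real
# model; fake_logits is a hypothetical stand-in for a forward pass.

VOCAB = ["<think>", "2", "+", "=", "4", "</think>", "answer:", "<eos>"]
SCRIPT = ["<think>", "2", "+", "2", "=", "4", "</think>", "answer:", "4", "<eos>"]

def fake_logits(context):
    """Pretend forward pass: scores the scripted next token highest."""
    target = SCRIPT[len(context)]
    return [10.0 if tok == target else 0.0 for tok in VOCAB]

def generate(max_len=20):
    context = []
    while len(context) < max_len:
        logits = fake_logits(context)
        next_tok = VOCAB[logits.index(max(logits))]  # greedy argmax
        context.append(next_tok)
        if next_tok == "<eos>":
            break
    return context

print(" ".join(generate()))
# The "<think> ... </think>" span and the answer come out of one loop.
```

The loop never switches modes between "reasoning" tokens and "answer" tokens; training incentives just make the in-between tokens look like deliberation.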
When AI Models Are Pressured to 'Behave' They Scheme in Private, Just like Us: OpenAI - Decrypt