The statistical patterns of which words and phrases tend to "go together" aren't the same thing as the meaning of those words.
Because of this, LLMs struggle with things like "What is the sum of seven and 3?"
To an LLM, "seven" is more statistically clustered with "forty" than it is with "10."
It's possible to brute-force these errors into being less frequent, but nothing in the way these systems are structured processes counting or numbers as well as, say, a crow might.
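One way to probe the clustering claim yourself is to compare embedding similarities for number words. Here's a minimal sketch assuming the `sentence-transformers` package and its public `all-MiniLM-L6-v2` model; the exact similarity values it prints will vary by model, and nothing here is specific to any one LLM.

```python
# Illustrative probe: how close do number words sit in embedding space?
# Assumes: pip install sentence-transformers numpy
# The model choice is an assumption for illustration; results vary by model.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

words = ["seven", "forty", "10"]
vecs = model.encode(words)

def cosine(a, b):
    # Cosine similarity: 1.0 means the vectors point the same way.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print("seven vs forty:", cosine(vecs[0], vecs[1]))
print("seven vs 10:   ", cosine(vecs[0], vecs[2]))
```

Whatever the numbers come out to, the point stands: the geometry reflects co-occurrence in text, not arithmetic relationships between quantities.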