@inthehands There was a good analysis by @randomwalker explaining why such tests don't mean much when applied to AI (in addition to the fact that the tests themselves, even when applied to people, aren't exactly what they claim to be):
https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks