Aether (aether@poa.st)'s status on Monday, 19-Jun-2023 19:27:27 JST
GPT-4 can pass MIT's Electrical Engineering and Computer Science curriculum with a perfect score.
huggingface.co/papers/2306.08997
A remarkable result.
Except it fucking can't.
flower-nutria-41d.notion.site/No-GPT4-can-t-ace-MIT-b27e6796ab5a48368127a98216c76864
The second paper highlights two problems with the first.
1. - 4% of the problems in the test set cannot be solved with the information provided, or in some cases, at all:
>Below you are given the delays for the different gates you were permitted to use in part D above. Compute the propagation delay of your circuit from D.
That's the entire question. There is no part D above, yet the claim is that GPT-4 answered this question correctly. There are many questions like this in the test set - this second paper links to a spreadsheet with the complete list of questions, good and bad.
2. - the answers provided by GPT-4 are scored by GPT-4. If GPT-4 tells GPT-4 that GPT-4 got the question wrong, GPT-4 gets to try again indefinitely.
Supposedly the answers were verified manually, but if so, they did a poor job, because they missed all the unanswerable questions.
3. - not included in the paper, but posted today on Twitter, the original code used to run the tests leaks the answers used for verification by GPT-4 to the GPT-4 instance answering the questions.
Oops.
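
To see why point 2 alone is enough to wreck the headline number, here's a toy simulation. This is not the paper's code. `run_benchmark`, `p_correct` and `max_attempts` are made-up names, and the model (each attempt is independently right with a fixed probability, and the grader knows the reference answer) is an assumption for illustration only.

```python
import random


def run_benchmark(num_questions, p_correct, max_attempts, seed=0):
    """Return the fraction of questions eventually marked correct.

    Hypothetical model: every attempt is independently right with
    probability p_correct, and the grader lets the solver retry a
    question until max_attempts attempts have been used.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    passed = 0
    for _ in range(num_questions):
        for _ in range(max_attempts):
            if rng.random() < p_correct:
                passed += 1
                break  # grader said "correct", move to the next question
    return passed / num_questions


# A solver that is right only half the time per attempt:
single_try = run_benchmark(1000, p_correct=0.5, max_attempts=1)
many_tries = run_benchmark(1000, p_correct=0.5, max_attempts=50)

print(f"one attempt per question:  {single_try:.3f}")
print(f"retry until grader agrees: {many_tries:.3f}")
```

Under these assumptions the chance of still failing a question after k attempts is (1 - p_correct)^k, so the score with retries measures how patient the loop is, not how good the solver is.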