The second paper highlights two problems with the first, and a third has since surfaced on Twitter:
1. 4% of the problems in the test set cannot be solved with the information provided, or in some cases at all. For example:
> Below you are given the delays for the different gates you were permitted to use in part D above. Compute the propagation delay of your circuit from D.
That's the entire question. There is no part D above, yet the claim is that GPT-4 answered it correctly. There are many questions like this in the test set; the second paper links to a spreadsheet with the complete list of questions, good and bad.
2. The answers provided by GPT-4 are scored by GPT-4. If GPT-4 tells GPT-4 that GPT-4 got the question wrong, GPT-4 gets to try again indefinitely.
Supposedly the answers were verified manually, but if so, the verifiers did a poor job, because they missed all of the broken questions.
3. Not included in the paper, but posted today on Twitter: the original code used to run the tests leaks the answers used for verification to the GPT-4 instance answering the questions.
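To make point 2 concrete, the grade-your-own-answer setup amounts to a retry loop with no cap, where success is declared by the same model being evaluated. This is a hypothetical sketch of that flaw, not the paper's actual code; the function names and stubbed-out model calls are invented for illustration:

```python
# Hypothetical reconstruction of the flawed evaluation loop.
# In the real setup, both answer() and grade() would be GPT-4 calls;
# here they are stubs so the shape of the problem is visible.

def answer(question: str, attempt: int) -> str:
    # Stand-in for the GPT-4 instance producing an answer.
    return f"answer-{attempt}"

def grade(question: str, proposed: str) -> bool:
    # Stand-in for the GPT-4 instance grading its own answer.
    # Pretend the (unreliable) grader only accepts the third try.
    return proposed == "answer-3"

def solve_until_graded_correct(question: str) -> tuple[str, int]:
    attempt = 0
    while True:  # no retry limit: loop until the self-grader says "correct"
        attempt += 1
        proposed = answer(question, attempt)
        if grade(question, proposed):
            return proposed, attempt

result, tries = solve_until_graded_correct("Compute the propagation delay...")
```

With an unbounded retry budget and a grader that shares the answering model's blind spots, "100% accuracy" is close to guaranteed by construction, which is the core of the criticism.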