Screencap from linked article, reading "OpenAI said the 250,000-word casebook used for the study was more than twice the length of text that its GPT-4o model can process at once. Anthropic said the study had limited usefulness because it did not compare the A.I. with human performance. Google said its model accuracy had improved since the study was conducted."

https://cdn.masto.host/daircommunitysocial/media_attachments/files/114/643/154/566/951/555/original/5393da3d7474f6ac.png


Prof. Emily M. Bender (she/her) (emilymbender@dair-community.social)'s status on Sunday, 08-Jun-2025 02:04:42 JST

So the NYT then quotes, with an apparent straight face, the various excuses the model providers have for their systems getting it wrong:
