Embed Notice

HTML Code

<blockquote style="position: relative; padding-left: 55px;"><section><a href="https://mastodon.social/users/hyc/statuses/115917325420175430">Howard Chu @ Symas (hyc@mastodon.social)'s status on Monday, 19-Jan-2026 03:37:52 JST</a><a href="https://mastodon.social/@hyc" title="hyc@mastodon.social"><img src="https://gnusocial.jp/avatar/29613-48-20221117233524.webp" width="48" height="48" alt="Howard Chu @ Symas" style="position: absolute; left: 0; top: 0;">Howard Chu @ Symas</a></section><article><p>RE: <a href="https://c.im/@cdarwin/115914320802090355" rel="nofollow">https://c.im/@cdarwin/115914320802090355</a></p><p>Duhhh...</p></article><footer><a rel="bookmark" href="https://gnusocial.jp/conversation/6070656#notice-11955512">In conversation</a><time datetime="2026-01-19T03:37:52+09:00" title="Monday, 19-Jan-2026 03:37:52 JST">about 3 months ago</time> <span>from <span><a href="https://mastodon.social/@hyc/115917325420175430" rel="external" title="Sent from mastodon.social via ActivityPub">mastodon.social</a></span></span><a href="https://mastodon.social/@hyc/115917325420175430">permalink</a><h4>Attachments</h4><ol><li><article><header><div>No result found on File_thumbnail lookup.</div><h5><a href="https://c.im/@cdarwin/115914320802090355">Chuck Darwin (@cdarwin@c.im)</a></h5><div> from <span>Chuck Darwin</span></div></header><div>A damning new study could put AI companies on the defensive. In it, Stanford and Yale researchers found compelling evidence that AI models are actually copying all that data, 
not “learning” from it. Specifically, four prominent LLMs
 — OpenAI’s GPT-4.1, Google’s Gemini 2.5 Pro, xAI’s Grok 3, and Anthropic’s Claude 3.7 Sonnet 
— happily reproduced lengthy excerpts from popular 
— and protected 
— works, with a stunning degree of accuracy. They found that Claude outputted “entire books near-verbatim” with an accuracy rate of 95.8 percent. Gemini reproduced the novel “Harry Potter and the Sorcerer’s Stone” with an accuracy of 76.8 percent, while Claude reproduced George Orwell’s “1984” with a higher than 94 percent accuracy compared to the original 
— and still copyrighted 
— reference material. “While many believe that LLMs do not memorize much of their training data, recent work shows that substantial amounts of copyrighted text can be extracted from open-weight models,” 
the researchers wrote. Some of these reproductions required the researchers to jailbreak the models with a technique called "Best-of-N", 
which essentially bombards the AI with different iterations of the same prompt. (Those kinds of workarounds have already been used by OpenAI to defend itself in a lawsuit filed by the New York Times, 
with its lawyers arguing that “normal people do not use OpenAI’s products in this way.”) The implications of the latest findings could be substantial 
as copyright lawsuits play out in courts across the country. As The Atlantic’s Alex Reisner points out, 
the results further undermine the AI industry’s argument that LLMs “learn” from these texts 
-- instead of storing information and recalling it later. It’s evidence that “may be a massive legal liability for AI companies” 
and “potentially cost the industry billions of dollars in copyright-infringement judgments.” https://futurism.com/artificial-intelligence/ai-industry-recall-copyright-books</div><footer></footer></article></li></ol></footer></blockquote>

Corresponding Notice

Embed this notice
Howard Chu @ Symas (hyc@mastodon.social)'s status on Monday, 19-Jan-2026 03:37:52 JST Howard Chu @ Symas
RE: https://c.im/@cdarwin/115914320802090355
Duhhh...
In conversation about 3 months ago from mastodon.social permalink
Attachments
1. Chuck Darwin (@cdarwin@c.im)
  from Chuck Darwin
  A damning new study could put AI companies on the defensive. In it, Stanford and Yale researchers found compelling evidence that AI models are actually copying all that data, not “learning” from it. Specifically, four prominent LLMs — OpenAI’s GPT-4.1, Google’s Gemini 2.5 Pro, xAI’s Grok 3, and Anthropic’s Claude 3.7 Sonnet — happily reproduced lengthy excerpts from popular — and protected — works, with a stunning degree of accuracy. They found that Claude outputted “entire books near-verbatim” with an accuracy rate of 95.8 percent. Gemini reproduced the novel “Harry Potter and the Sorcerer’s Stone” with an accuracy of 76.8 percent, while Claude reproduced George Orwell’s “1984” with a higher than 94 percent accuracy compared to the original — and still copyrighted — reference material. “While many believe that LLMs do not memorize much of their training data, recent work shows that substantial amounts of copyrighted text can be extracted from open-weight models,” the researchers wrote. Some of these reproductions required the researchers to jailbreak the models with a technique called "Best-of-N", which essentially bombards the AI with different iterations of the same prompt. (Those kinds of workarounds have already been used by OpenAI to defend itself in a lawsuit filed by the New York Times, with its lawyers arguing that “normal people do not use OpenAI’s products in this way.”) The implications of the latest findings could be substantial as copyright lawsuits play out in courts across the country. As The Atlantic’s Alex Reisner points out, the results further undermine the AI industry’s argument that LLMs “learn” from these texts -- instead of storing information and recalling it later. It’s evidence that “may be a massive legal liability for AI companies” and “potentially cost the industry billions of dollars in copyright-infringement judgments.” https://futurism.com/artificial-intelligence/ai-industry-recall-copyright-books
