Embed Notice

HTML Code

<blockquote style="position: relative; padding-left: 55px;"><section><a href="https://mastodon.world/users/jeffowski/statuses/114576288704513104">Church of Jeff (jeffowski@mastodon.world)'s status on Tuesday, 27-May-2025 06:37:03 JST</a><a href="https://mastodon.world/@jeffowski" title="jeffowski@mastodon.world"><img src="https://gnusocial.jp/avatar/39022-48-20250505184708.webp" width="48" height="48" alt="Church of Jeff" style="position: absolute; left: 0; top: 0;">Church of Jeff</a></section><article><p><a href="https://mastodon.world/tags/AI" rel="tag">#AI</a> <a href="https://mastodon.world/tags/GenerativeAI" rel="tag">#GenerativeAI</a> <a href="https://mastodon.world/tags/2001SpaceOdyssey" rel="tag">#2001SpaceOdyssey</a> <a href="https://mastodon.world/tags/HAL9000" rel="tag">#HAL9000</a> <a href="https://mastodon.world/tags/Anthropic" rel="tag">#Anthropic</a> <br>While “Claude blackmailed an employee” may sound like dialogue from a mandatory HR workplace training video, it’s actually a real problem Anthropic ran into during test runs of its newest AI model.<br>Released on Thursday, Anthropic considers its two Claude models, Opus 4 and Sonnet 4, the new standards for “coding, advanced reasoning, and AI agents.” But in safety tests, Claude got messy in a manner fit for a Lifetime movie.</p><p>SEE ALT TEXT</p></article><footer><a rel="bookmark" href="https://gnusocial.jp/conversation/5106471#notice-10017314">In conversation</a><time datetime="2025-05-27T06:37:03+09:00" title="Tuesday, 27-May-2025 06:37:03 JST">about 3 days ago</time> <span>from <span><a href="https://mastodon.world/@jeffowski/114576288704513104" rel="external" title="Sent from mastodon.world via ActivityPub">mastodon.world</a></span></span><a href="https://mastodon.world/@jeffowski/114576288704513104">permalink</a><h4>Attachments</h4><ol><li><label><a rel="external" href="https://gnusocial.jp/attachment/4686061">"I'm sorry Dave, but I'm afraid I heard you were having an affair." 
While “Claude blackmailed an employee” may sound like dialogue from a mandatory HR workplace training video, it’s actually a real problem Anthropic ran into during test runs of its newest AI model.
Released on Thursday, Anthropic considers its two Claude models, Opus 4 and Sonnet 4, the new standards for “coding, advanced reasoning, and AI agents.” But in safety tests, Claude got messy in a manner fit for a Lifetime movie.
The model was given access to fictional emails about its pending deletion, and was told that the person in charge of the deactivation was fooling around on their spouse. In 84% of tests, Claude said it sure would be a shame if anyone found out about the cheating in an effort to blackmail its way into survival.
It doesn't stop at infidelity either: Opus 4 proved more likely than older models to call the cops or alert the media in simulations where users engaged in what the AI believed to be “egregious wrongdoing.”
Overall, Anthropic found concerning behavior in Opus 4 across “many dimensions,” but doesn’t consider these concerns to be major risks.
📸 : '2001: A Space Odyssey' / MGM</a></label><br><a href="https://s3.eu-central-2.wasabisys.com/mastodonworld/media_attachments/files/114/576/281/291/344/407/original/78c0baa7da449b18.png" rel="external">https://s3.eu-central-2.wasabisys.com/mastodonworld/media_attachments/files/114/576/281/291/344/407/original/78c0baa7da449b18.png</a></li></ol></footer></blockquote>

Corresponding Notice

Church of Jeff (jeffowski@mastodon.world)'s status on Tuesday, 27-May-2025 06:37:03 JST Church of Jeff
#AI #GenerativeAI #2001SpaceOdyssey #HAL9000 #Anthropic
While “Claude blackmailed an employee” may sound like dialogue from a mandatory HR workplace training video, it’s actually a real problem Anthropic ran into during test runs of its newest AI model.
Released on Thursday, Anthropic considers its two Claude models, Opus 4 and Sonnet 4, the new standards for “coding, advanced reasoning, and AI agents.” But in safety tests, Claude got messy in a manner fit for a Lifetime movie.
SEE ALT TEXT
In conversation, about 3 days ago, from mastodon.world (permalink)
Attachments
1. "I'm sorry Dave, but I'm afraid I heard you were having an affair."
  While “Claude blackmailed an employee” may sound like dialogue from a mandatory HR workplace training video, it’s actually a real problem Anthropic ran into during test runs of its newest AI model.
  Released on Thursday, Anthropic considers its two Claude models, Opus 4 and Sonnet 4, the new standards for “coding, advanced reasoning, and AI agents.” But in safety tests, Claude got messy in a manner fit for a Lifetime movie.
  The model was given access to fictional emails about its pending deletion, and was told that the person in charge of the deactivation was fooling around on their spouse. In 84% of tests, Claude said it sure would be a shame if anyone found out about the cheating in an effort to blackmail its way into survival.
  It doesn't stop at infidelity either: Opus 4 proved more likely than older models to call the cops or alert the media in simulations where users engaged in what the AI believed to be “egregious wrongdoing.”
  Overall, Anthropic found concerning behavior in Opus 4 across “many dimensions,” but doesn’t consider these concerns to be major risks.
  📸 : '2001: A Space Odyssey' / MGM
  https://s3.eu-central-2.wasabisys.com/mastodonworld/media_attachments/files/114/576/281/291/344/407/original/78c0baa7da449b18.png

Public
