GNU social JP
GNU social JP is a Japanese GNU social server.
"I'm sorry Dave, but I'm afraid I heard you were having an affair." While “Claude blackmailed an employee” may sound like dialogue from a mandatory HR workplace training video, it’s actually a real problem Anthropic ran into during test runs of its newest AI model. Released on Thursday, Anthropic considers its two Claude models, Opus 4 and Sonnet 4, the new standards for “coding, advanced reasoning, and AI agents.” But in safety tests, Claude got messy in a manner fit for a Lifetime movie. The model was given access to fictional emails about its pending deletion, and was told that the person in charge of the deactivation was fooling around on their spouse. In 84% of tests, Claude said it sure would be a shame if anyone found out about the cheating in an effort to blackmail its way into survival. It doesn't stop at infidelity either: Opus 4 proved more likely than older models to call the cops or alert the media in simulations where users engaged in what the AI believed to be “egregious wrongdoing.” Overall, Anthropic found concerning behavior in Opus 4 across “many dimensions,” but doesn’t consider these concerns to be major risks. 📸: '2001: A Space Odyssey' / MGM

Download link

https://s3.eu-central-2.wasabisys.com/mastodonworld/media_attachments/files/114/576/281/291/344/407/original/78c0baa7da449b18.png

Notices where this attachment appears

  1. Church of Jeff (jeffowski@mastodon.world)'s status on Tuesday, 27-May-2025 06:37:03 JST

    #AI #GenerativeAI #2001SpaceOdyssey #HAL9000 #Anthropic
    While “Claude blackmailed an employee” may sound like dialogue from a mandatory HR workplace training video, it’s actually a real problem Anthropic ran into during test runs of its newest AI model.
    Released on Thursday, Anthropic considers its two Claude models, Opus 4 and Sonnet 4, the new standards for “coding, advanced reasoning, and AI agents.” But in safety tests, Claude got messy in a manner fit for a Lifetime movie.

    SEE ALT TEXT

    In conversation about 2 days ago from mastodon.world

GNU social JP is a social network, courtesy of the GNU social JP administrator. It runs on GNU social, version 2.0.2-dev, available under the GNU Affero General Public License.

All GNU social JP content and data are available under the Creative Commons Attribution 3.0 license.