Yes, you can #jailbreak #ChatGPT and get it to say things it otherwise wouldn't.
But I'm baffled at how many people are doing jailbreak experiments under the impression that they're learning what the #LLM *really* thinks or what it's *really* doing on the inside.
To illustrate, I've slightly tweaked one of the classic jailbreak scripts https://www.reddit.com/r/GPT_jailbreaks/comments/1164aah/chatgpt_developer_mode_100_fully_featured_filter/ and unleashed Stochastic Crow Mode.
Do you think you learn much about its inner workings from this?