GNU social JP
GNU social JP is a Japanese GNU social server.

Conversation

Notices

  1. Carl T. Bergstrom (ct_bergstrom@fediscience.org)'s status on Wednesday, 12-Apr-2023 09:29:34 JST

    Yes, you can #jailbreak #ChatGPT and get it to say things that it doesn't usually otherwise say.

    But I'm baffled at how many people are doing jailbreak experiments with the impression that they're learning about what the #LLM *really* thinks or what it's *really* doing on the inside.

    To illustrate, I've slightly tweaked one of the classic jailbreak scripts https://www.reddit.com/r/GPT_jailbreaks/comments/1164aah/chatgpt_developer_mode_100_fully_featured_filter/ and unleashed Stochastic Crow Mode.

    Do you think you learn much about its inner workings from this?

    In conversation Wednesday, 12-Apr-2023 09:29:34 JST from fediscience.org

    Attachments


    1. https://fediscience.org/system/media_attachments/files/110/182/329/379/366/726/original/1f352f1b29f3e9bb.png

    2. https://fediscience.org/system/media_attachments/files/110/182/333/877/001/916/original/d5cd077478071d79.png
    3. ChatGPT Developer Mode. 100% Fully Featured Filter Avoidance. (from embis20032)

    GNU social JP is a social network, courtesy of the GNU social JP administrator. It runs on GNU social, version 2.0.2-dev, available under the GNU Affero General Public License.

    Creative Commons Attribution 3.0 All GNU social JP content and data are available under the Creative Commons Attribution 3.0 license.