GNU social JP
GNU social JP is a Japanese GNU social server.

Notices by lain (lain@fediffusion.art), page 2

  1. lain (lain@fediffusion.art)'s status on Thursday, 20-Jun-2024 21:22:11 JST
    > Conversational large language models are fine-tuned for both instruction-following and safety, resulting in models that obey benign requests but refuse harmful ones. While this refusal behavior is widespread across chat models, its underlying mechanisms remain poorly understood. In this work, we show that refusal is mediated by a one-dimensional subspace, across 13 popular open-source chat models up to 72B parameters in size. Specifically, for each model, we find a single direction such that erasing this direction from the model's residual stream activations prevents it from refusing harmful instructions, while adding this direction elicits refusal on even harmless instructions.

    https://arxiv.org/abs/2406.11717
    In conversation about a year ago from fediffusion.art permalink

    Attachments

    1. Refusal in Language Models Is Mediated by a Single Direction
      Conversational large language models are fine-tuned for both instruction-following and safety, resulting in models that obey benign requests but refuse harmful ones. While this refusal behavior is widespread across chat models, its underlying mechanisms remain poorly understood. In this work, we show that refusal is mediated by a one-dimensional subspace, across 13 popular open-source chat models up to 72B parameters in size. Specifically, for each model, we find a single direction such that erasing this direction from the model's residual stream activations prevents it from refusing harmful instructions, while adding this direction elicits refusal on even harmless instructions. Leveraging this insight, we propose a novel white-box jailbreak method that surgically disables refusal with minimal effect on other capabilities. Finally, we mechanistically analyze how adversarial suffixes suppress propagation of the refusal-mediating direction. Our findings underscore the brittleness of current safety fine-tuning methods. More broadly, our work showcases how an understanding of model internals can be leveraged to develop practical methods for controlling model behavior.
  2. lain (lain@fediffusion.art)'s status on Friday, 17-May-2024 22:23:18 JST
    I have noticed that, somewhat contrary to what I would have expected, religious / spiritual people have few problems with LLMs (in fact, they have a lot of 'discussions with chatgpt about faith' podcasts out), while the people who are deathly afraid of them and think they will be the downfall of society are overwhelmingly materialist progressive types. I have some ideas about this but for now it's just an observation.
    In conversation about a year ago from fediffusion.art permalink
  3. lain (lain@fediffusion.art)'s status on Monday, 11-Mar-2024 22:16:55 JST
    in reply to
    • kaia
    @kaia I think he really mostly wants AI to be open. But we'll see what 'open sourcing' actually means.
    In conversation Monday, 11-Mar-2024 22:16:55 JST from gnusocial.jp permalink
  4. lain (lain@fediffusion.art)'s status on Monday, 11-Mar-2024 21:30:51 JST
    👀
    In conversation Monday, 11-Mar-2024 21:30:51 JST from fediffusion.art permalink

    Attachments


    1. https://fediffusion.art/media/c6655bc1ea1dad2ab3dc789c7638617c30b0e7e37b8b22f0729e2d217550deb7.png
  5. lain (lain@fediffusion.art)'s status on Friday, 09-Feb-2024 23:38:20 JST
    in reply to
    • kaia
    • Moon
    @Moon @kaia it's still crazy what we can do right now with local models, and you haven't even finetuned it, just prompt engineering.
    In conversation Friday, 09-Feb-2024 23:38:20 JST from fediffusion.art permalink
  6. lain (lain@fediffusion.art)'s status on Tuesday, 06-Feb-2024 19:40:25 JST
    • Big Richard
    And the results are in!

    The winner of last week's image generation contest is... @Big_Richard !!

    Thanks everyone who participated, hope you'll be joining us again for the next one!

    You can see the poll here: https://fediffusion.art/notice/AbKn4V4OymI7E4kykS
    In conversation Tuesday, 06-Feb-2024 19:40:25 JST from fediffusion.art permalink

    Attachments


    1. https://fediffusion.art/media/822601ea02d720ce5e22568ede71f5c7a44c4c0a7c4c78da68d6539cb0ecf41f.png
    2. lain (@lain@fediffusion.art)
      Hello everyone! Here are the submissions for our first contest! The theme this week was the classic of image generation, the 1girl! Very varied, also check out the original post for comments by th...
  7. lain (lain@fediffusion.art)'s status on Wednesday, 31-Jan-2024 18:04:36 JST
    the new llava release is 🔥
    In conversation Wednesday, 31-Jan-2024 18:04:36 JST from fediffusion.art permalink

    Attachments


    1. https://fediffusion.art/media/0c656f47976e81f0f308b5a37260d58bd3da414c7a89e977afbab1694013943a.png

    2. https://fediffusion.art/media/aa771da42a8b94d0a3daa408cfefe5e790251ea14a6d65516143f11a7a8c06a6.png

    3. https://fediffusion.art/media/cd3758958eb948e57034bbcc8f01e359b43b9cd83968ab4a35f64480605c5bfd.png
  8. lain (lain@fediffusion.art)'s status on Tuesday, 30-Jan-2024 04:08:34 JST
    what's the songlist?
    In conversation Tuesday, 30-Jan-2024 04:08:34 JST from fediffusion.art permalink

    Attachments


    1. https://fediffusion.art/media/ea02c8fd6605ac56dcc9388f0301bc1a24a577c3a27c3610f381272183389e00.png
  9. lain (lain@fediffusion.art)'s status on Saturday, 27-Jan-2024 23:54:34 JST
    • 受不了包
    @shibao you and me both buddy
    In conversation Saturday, 27-Jan-2024 23:54:34 JST from fediffusion.art permalink
  10. lain (lain@fediffusion.art)'s status on Sunday, 07-Jan-2024 03:51:06 JST
    > https://huggingface.co/microsoft/phi-2/commit/7e10f3ea09c0ebd373aebc73bc6e6ca58204628d

    What are they all doing? Is this all safeguarding against regulation? Make it all free and open before it can be banned?
    In conversation Sunday, 07-Jan-2024 03:51:06 JST from fediffusion.art permalink

    Attachments

    1. Upload 3 files · microsoft/phi-2 at 7e10f3e
      We’re on a journey to advance and democratize artificial intelligence through open source and open science.
  11. lain (lain@fediffusion.art)'s status on Tuesday, 02-Jan-2024 21:03:41 JST
    • Anime Wong
    • Guizzy
    • kaiaskutes
    Whoops! new year happened and I missed the end of the vote!

    We have three winners this time with a three-way tie of 6 votes each, @guizzy, @kaiaskutes and @Elliptica, congratulations! Thank you all for participating, let's get some good generations going in 2024!

    RT: https://fediffusion.art/objects/a5b51368-1518-4795-bbdc-ae254baae4ea
    In conversation Tuesday, 02-Jan-2024 21:03:41 JST from fediffusion.art permalink

    Attachments


    1. https://fediffusion.art/media/7cc35c55b31088fb13db0b03252bde727c916f13d651e256b4dec53d6e382cd5.webp

    2. https://fediffusion.art/media/4dab70b26ecd4d76e699013b8e544e7508dc61fee570bfbbdf3bee2f3178210b.png

    3. https://fediffusion.art/media/8341dd69cd8bfa08bf3c72e7f20b8460340332f505bc4b4f787c07084e03b412.png

  12. lain (lain@fediffusion.art)'s status on Sunday, 24-Dec-2023 03:31:02 JST
    in reply to
    • lhl
    @lhl ??? How does this work?
    In conversation Sunday, 24-Dec-2023 03:31:02 JST from fediffusion.art permalink
  13. lain (lain@fediffusion.art)'s status on Wednesday, 20-Dec-2023 07:36:17 JST
    > I expect the only people who are nonplussed by the power of LLMs are those with a soft spot for occultism of some sort—those who think words are magical. Let me explain.
    > Let me repeat: there is so much abstract structure in our language—the patterns are so overwhelmingly clear, consistent, and objective—that by mindlessly figuring out the probability of one symbol following another, a machine can effectively reason better than the average person for a large number of cases.

    https://steve-patterson.com/why-language-machines-do-not-have-souls/
    In conversation Wednesday, 20-Dec-2023 07:36:17 JST from fediffusion.art permalink

    Attachments

    1. Why Language Machines do not have Souls
      from Steve Patterson
      It’s been nine months since GPT4 was released. I’m still trying to make sense of things. There’s a dearth of level-headed analysis out there. Most people’s analysis seems to be framed by science fiction novels, or they are still using frameworks inherited from the pre-GPT world, which did not anticipate the success of LLMs. Even …
  14. lain (lain@fediffusion.art)'s status on Wednesday, 20-Dec-2023 07:36:15 JST
    in reply to
    • Ruru! 🦉
    @lonelyowl i mean, yeah, after seeing a machine do it, it's "no surprise", but it sure was a surprise before it happened
    In conversation Wednesday, 20-Dec-2023 07:36:15 JST from fediffusion.art permalink
  15. lain (lain@fediffusion.art)'s status on Monday, 18-Dec-2023 21:52:47 JST
    Reminder that we still have a CONTEST going on! Because I'm traveling, the deadline is extended to FRIDAY, the 22nd.

    Get your entries in before it's too late!

    RT: https://fediffusion.art/objects/a597ce7e-5ebf-40ec-8242-d15241ebdd2a
    In conversation Monday, 18-Dec-2023 21:52:47 JST from fediffusion.art permalink

  16. lain (lain@fediffusion.art)'s status on Sunday, 10-Dec-2023 02:28:38 JST
    in reply to
    • Sexy Moon
    • lain
    please contribute with your resurrected 4090 power, @Moon
    In conversation Sunday, 10-Dec-2023 02:28:38 JST from fediffusion.art permalink
  17. lain (lain@fediffusion.art)'s status on Saturday, 09-Dec-2023 20:12:29 JST
    NEW CONTEST!

    Once again we're doing a week-long AI image creation contest! This time the topic is:

    STORY ILLUSTRATIONS

    Ever read a story and imagined what the scene would look like? Well, now you can show it to all of us! Pick a scene from any story or novel you like and create an image of it. Please tell us which story you are taking inspiration from!

    The voting will start one week from now, so get your entries in before that.

    Here's an example: A scene from the Yasutaka Tsutsui story "Standing Woman".
    In conversation Saturday, 09-Dec-2023 20:12:29 JST from fediffusion.art permalink

    Attachments


    1. https://fediffusion.art/media/911f94c10ae19df62d92711687ffb893383b01e2ca674e9a20b545839b817881.png
  18. lain (lain@fediffusion.art)'s status on Wednesday, 06-Dec-2023 01:21:15 JST
    if a 30gb thinking file revealed mitsu's pregnancy i'll literally chuckle
    In conversation Wednesday, 06-Dec-2023 01:21:15 JST from fediffusion.art permalink
  19. lain (lain@fediffusion.art)'s status on Wednesday, 06-Dec-2023 01:21:14 JST
    in reply to
    • Guizzy
    @guizzy better than going to a doctor in canada, i heard...
    In conversation Wednesday, 06-Dec-2023 01:21:14 JST from fediffusion.art permalink
  20. lain (lain@fediffusion.art)'s status on Saturday, 02-Dec-2023 02:05:15 JST
    in reply to
    • ロミンちゃん
    @romin they are releasing so many AI tools all the time, it's crazy.
    In conversation Saturday, 02-Dec-2023 02:05:15 JST from fediffusion.art permalink
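
The abstract quoted in the first notice above describes "erasing" a single direction from a model's residual stream activations, and "adding" that direction back in. Read as plain linear algebra, that could look like the sketch below. This is only an illustration under assumptions: the function names, the scaling parameter `alpha`, and the toy vectors are hypothetical, and the abstract does not say how the paper finds the direction or where in the model it intervenes.

```python
import math


def unit(r):
    """Return r scaled to unit length."""
    norm = math.sqrt(sum(c * c for c in r))
    return [c / norm for c in r]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def ablate_direction(x, r):
    """Remove the component of activation x that lies along direction r."""
    r_hat = unit(r)
    coeff = dot(x, r_hat)  # how much of x points along r
    return [xi - coeff * ri for xi, ri in zip(x, r_hat)]


def add_direction(x, r, alpha):
    """Shift activation x by alpha units along direction r (alpha is an assumed knob)."""
    r_hat = unit(r)
    return [xi + alpha * ri for xi, ri in zip(x, r_hat)]


# Toy 3-dimensional "activation" and "direction", not real model data.
x = [1.0, 2.0, 3.0]
r = [0.0, 0.0, 2.0]

erased = ablate_direction(x, r)
boosted = add_direction(x, r, 4.0)
```

After ablation, the result has no component left along `r`, while every component orthogonal to `r` is unchanged. A single direction is a one-dimensional subspace, so this is one way to read the abstract's claim that the behavior is mediated by a one-dimensional subspace.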

User: lain
Tags: (None)
Following: 0
Followers: 0
Groups: 0

Statistics
  User ID: 203505
  Member since: 23 Oct 2023
  Notices: 80
  Daily average: 0


GNU social JP is a social network, courtesy of the GNU social JP administrator. It runs on GNU social, version 2.0.2-dev, available under the GNU Affero General Public License.

All GNU social JP content and data are available under the Creative Commons Attribution 3.0 license.