GNU social JP
GNU social JP is a Japanese GNU social server.
Conversation

Notices

  1. Eaton (eaton@phire.place)'s status on Sunday, 15-Sep-2024 13:51:49 JST

    So, I’m doing some automated comparison testing with various publicly available LLMs: classifying posts in a subreddit based on a fixed list of flair categories, and seeing how well different models do.

    It's hit or miss in many cases, but about 1 out of 8 posts just makes certain models WIG OUT. Instead of responding with the name of a set category, phi3.5 started regurgitating the summary of a paper on gene polymorphism in dopamine receptors. Another responded with a snippet of Python.

    • Eaton (eaton@phire.place)'s status on Sunday, 15-Sep-2024 14:02:26 JST

      phi3.5 seems PARTICULARLY terrible at sticking to basic instructions: where most of the other models confine their answers to the fixed set of categories I jammed into the system prompt, phi keeps making up new ones and arguing that my supplied list doesn't have precise enough categories
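
One way to catch the off-list answers described above is to check each model reply against the fixed category list before counting it. This is a minimal sketch, not the setup used in the thread: the `CATEGORIES` values and the `parse_category` helper are hypothetical, since the actual flair list is never given.

```python
import re

# Hypothetical flair list; the thread does not name the real categories.
CATEGORIES = ["Question", "Discussion", "News", "Showcase", "Meta"]


def normalize(text):
    """Lowercase and collapse punctuation/whitespace so 'News.' matches 'news'."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def parse_category(response, categories=CATEGORIES):
    """Return the matching category, or None when the reply is off-list."""
    lookup = {normalize(c): c for c in categories}
    return lookup.get(normalize(response))


if __name__ == "__main__":
    print(parse_category(" News.\n"))
    print(parse_category("Gene polymorphism in dopamine receptors"))
```

Exact matching after normalization is deliberately strict: a reply that invents a new category or argues about the list comes back as `None` and can be logged as an instruction failure rather than silently mapped to the nearest label.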

    • Tim Carmody (tim@phire.place)'s status on Monday, 16-Sep-2024 06:07:34 JST

      @eaton have you tried deepseek 2.5? I've found that to be quite capable, although I haven't tested your use case

    • Eaton (eaton@phire.place)'s status on Monday, 16-Sep-2024 08:31:57 JST
      in reply to Tim Carmody

      @tim I’ll probably give it a shot — my suspicion is that it’ll take a lot more tweaking of the prompts and settling on one model to optimize for, as what I’m trying to do (get it to pick one of ten specific answers from a fixed list) is definitely working against the grain

    • Eaton (eaton@phire.place)'s status on Monday, 16-Sep-2024 08:35:07 JST

      FWIW llama3.1 and gemma2 have been doing much better — not getting all the right answers, but sticking to the instructions well.
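
The distinction above, between getting the right answer and sticking to the instructions, can be tracked as two separate numbers per model. A rough sketch, assuming predictions have already been cleaned into plain label strings; `score_model` and the sample labels are hypothetical and no real results from the thread are reproduced here.

```python
def score_model(predictions, gold, categories):
    """Return adherence (share of replies on the list) and accuracy (share correct)."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must be the same length")
    if not gold:
        return {"adherence": 0.0, "accuracy": 0.0}
    allowed = set(categories)
    on_list = sum(1 for p in predictions if p in allowed)
    correct = sum(1 for p, g in zip(predictions, gold) if p == g)
    return {"adherence": on_list / len(gold), "accuracy": correct / len(gold)}


if __name__ == "__main__":
    # Made-up labels purely to show the call shape.
    cats = ["Question", "Discussion", "News", "Meta"]
    preds = ["News", "Rant", "Question", "News"]
    truth = ["News", "Meta", "Discussion", "News"]
    print(score_model(preds, truth, cats))
```

A model can score high on adherence and still be middling on accuracy, which is the pattern reported for llama3.1 and gemma2.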

    • Eaton (eaton@phire.place)'s status on Monday, 16-Sep-2024 08:36:12 JST
      in reply to Karen McGrane

      Not sure if @karenmcgrane has mentioned what we’re up to, but it’s basically demoing a range of techniques for classification work, from naive NLP classifiers to vector proximity scoring to “prompt the LLM and see what it says”, and talking through the kind of work that’s necessary to make the techniques “work well” for folks doing IA work
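
For readers unfamiliar with vector proximity scoring, the idea can be sketched in a few lines: turn the post and a description of each category into vectors, then pick the category whose vector is closest. The sketch below uses plain word counts and cosine similarity as a stand-in; a real setup would more likely use learned embeddings. The function names and example descriptions are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words vector: word -> count (a stand-in for a real embedding)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def classify(post, examples):
    """Return (best_category, scores) given a dict of category -> description."""
    post_vec = vectorize(post)
    scores = {cat: cosine(post_vec, vectorize(desc)) for cat, desc in examples.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Made-up category descriptions for illustration only.
    examples = {
        "Question": "how do i fix this help question",
        "News": "new release announced today news",
    }
    print(classify("how do i install this", examples)[0])
```

Unlike the prompt-based approach, this always returns one of the supplied categories, so instruction-following is not an issue; the trade-off is that quality depends entirely on how well the vectors capture meaning.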

    • Eaton (eaton@phire.place)'s status on Monday, 16-Sep-2024 08:37:29 JST

      As a result, what we’re doing probably looks more like crude fine-tuning than “prompting” lol

GNU social JP is a social network, courtesy of the GNU social JP administrator. It runs on GNU social, version 2.0.2-dev, available under the GNU Affero General Public License.

All GNU social JP content and data are available under the Creative Commons Attribution 3.0 license.