Conversation

Eaton (eaton@phire.place)'s status on Sunday, 15-Sep-2024 13:51:49 JST Eaton

So, I’m doing some automated comparison testing with various publicly available LLMs: classifying posts in a subreddit based on a fixed list of flair categories, and seeing how well different models do.
It's hit or miss in many cases, but about 1 out of 8 posts just makes certain models WIG OUT. Instead of responding with the name of a set category, phi3.5 started regurgitating the summary of a paper on gene polymorphism in dopamine receptors. Another responded with a snippet of Python.
In conversation about 9 months ago from phire.place permalink
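One way to measure the "wigging out" described above is to check each raw model reply against the fixed list and count how often it lands on it. The following is a minimal sketch, not the setup used in the thread: the category names, `parse_category`, and `adherence_rate` are all hypothetical, since the post does not name the actual flair list or tooling.

```python
import re
from typing import List, Optional

# Hypothetical flair list; the thread does not name the real categories.
CATEGORIES = ["Question", "Discussion", "News", "Meta"]


def normalize(text: str) -> str:
    """Lowercase the text and collapse anything that is not a letter or digit."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


# Map the normalized form of each category back to its canonical spelling.
_LOOKUP = {normalize(name): name for name in CATEGORIES}


def parse_category(response: str) -> Optional[str]:
    """Return the matching category, or None when the reply is off-list."""
    return _LOOKUP.get(normalize(response))


def adherence_rate(responses: List[str]) -> float:
    """Fraction of replies that name exactly one category from the list."""
    if not responses:
        return 0.0
    valid = sum(1 for reply in responses if parse_category(reply) is not None)
    return valid / len(responses)
```

Tracking the adherence rate separately from accuracy keeps "picked the wrong category" apart from "did not pick a category at all", which is the distinction the posts in this thread draw between models.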
- Eaton (eaton@phire.place)'s status on Sunday, 15-Sep-2024 14:02:26 JST Eaton
  
  phi3.5 seems PARTICULARLY terrible at sticking to basic instructions: where most of the other models confine their answers to the fixed set of categories I jammed into the system prompt, phi keeps making up new ones and arguing that my supplied list doesn't have precise enough categories
  
- Tim Carmody (tim@phire.place)'s status on Monday, 16-Sep-2024 06:07:34 JST Tim Carmody
  
  @eaton have you tried deepseek 2.5? I've found that to be quite capable, although I haven't tested your use case
  
- Eaton (eaton@phire.place)'s status on Monday, 16-Sep-2024 08:31:57 JST Eaton
  in reply to
  - Tim Carmody
  @tim I’ll probably give it a shot. My suspicion is that it’ll take a lot more tweaking of the prompts, and settling on one model to optimize for, since what I’m trying to do (get it to pick one of ten specific answers from a fixed list) is definitely working against the grain.
  
- Eaton (eaton@phire.place)'s status on Monday, 16-Sep-2024 08:35:07 JST Eaton
  
  FWIW llama3.1 and gemma2 have been doing much better — not getting all the right answers, but sticking to the instructions well.
  
- Eaton (eaton@phire.place)'s status on Monday, 16-Sep-2024 08:36:12 JST Eaton
  in reply to
  - Karen McGrane
  Not sure if @karenmcgrane has mentioned what we’re up to, but it’s basically demoing a range of techniques for classification work, from naive NLP classifiers to vector proximity scoring to “prompt the LLM and see what it says”, and talking through the kind of work that’s necessary to make the techniques “work well” for folks doing IA work.
  
- Eaton (eaton@phire.place)'s status on Monday, 16-Sep-2024 08:37:29 JST Eaton
  
  As a result, what we’re doing probably looks more like crude fine-tuning than “prompting” lol
  
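For comparison with the prompting approach, the "vector proximity scoring" mentioned in the thread can be sketched roughly as follows. This is an illustration under loud assumptions: a plain bag-of-words count vector stands in for whatever embedding the demo actually uses, and the category names, their descriptions, and the `embed`, `cosine`, and `classify` helpers are all hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, Tuple

# Hypothetical categories, each described by a few representative words.
CATEGORY_DESCRIPTIONS: Dict[str, str] = {
    "Question": "how do i help question asking advice",
    "News": "announcement release news update published",
    "Showcase": "i built made my project show feedback",
}


def embed(text: str) -> Counter:
    """Stand-in embedding: a word-count vector (a real setup would use a model)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(post: str) -> Tuple[str, float]:
    """Return the closest category and its similarity score."""
    post_vec = embed(post)
    scores = {
        name: cosine(post_vec, embed(description))
        for name, description in CATEGORY_DESCRIPTIONS.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Unlike a free-text LLM reply, this kind of scorer can only ever return a label from the fixed list, so it cannot invent new categories; the trade-off is that it still returns a label when nothing matches well, which makes a score threshold worth considering.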
