GNU social JP
Conversation

Notices

  1. Paul Cantrell (inthehands@hachyderm.io)'s status on Tuesday, 10-Feb-2026 01:30:34 JST

    LLMs have no model of correctness, only typicality. So:

    “How much does it matter if it’s wrong?”

    It’s astonishing how frequently both providers and users of LLM-based services fail to ask this basic question — which I think has a fairly obvious answer in this case, one that the research bears out.

    (Repliers, NB: Research that confirms the seemingly obvious is useful and important, and “I already knew that” is not information that anyone is interested in except you.)

    1/ https://www.404media.co/chatbots-health-medical-advice-study/

    In conversation about a month ago from hachyderm.io

    Attachments

    1. Chatbots Make Terrible Doctors, New Study Finds
      from @samleecole
      Chatbots provided incorrect, conflicting medical advice, researchers found: “Despite all the hype, AI just isn't ready to take on the role of the physician.”
    • Paul Cantrell (inthehands@hachyderm.io)'s status on Tuesday, 10-Feb-2026 01:32:29 JST

      Despite the obviousness of the larger conclusion (“LLMs don’t give accurate medical advice”), this passage is…if not surprising, exactly, at least really really interesting.

      2/

      In conversation about a month ago

      Attachments


      1. https://media.hachyderm.io/media_attachments/files/116/041/626/055/665/046/original/fd66607f26441bce.png
    • Paul Cantrell (inthehands@hachyderm.io)'s status on Tuesday, 10-Feb-2026 01:35:36 JST

      There’s a lesson here, perhaps, about the tangled relationship between what is •typical• and what is •correct•, and what it is that LLMs actually do:

      When medical professionals ask medical questions in technical medical language, the answers they get are typically correct.

      When non-professionals ask medical questions in a perhaps medically ill-formed vernacular mode, the answers they get are typically wrong.

      The LLM readily models both of these things. It has no notion of correctness in either case; correctness is simply more statistically typical in one register than in the other. (A toy sketch after the thread makes this concrete.)

      3/

      In conversation about a month ago
    • Brian Marick (marick@mstdn.social)'s status on Tuesday, 10-Feb-2026 06:39:23 JST

      @inthehands An aside. When people used to ask Dawn whether it wasn’t hard to treat animals because “they can’t tell you what’s wrong,” she’d answer that they also can’t lie about it. She thought the latter probably outweighed the former.

      In conversation about a month ago
    • Paul Cantrell (inthehands@hachyderm.io)'s status on Tuesday, 10-Feb-2026 06:39:39 JST

      RE: https://girlcock.club/@miss_rodent/116041738842160668

      This is another, crisper way of saying what I meant by the previous post: if it sounds like a medical textbook, you’re more likely to get a diagnosis; if it sounds like a tweet, you’re more likely to get a shitpost.

      The tone, vocabulary, and style of the question change the likelihood that the answer is correct.

      4/

      In conversation about a month ago

      Attachments

      1. V (@miss_rodent@girlcock.club)
        from V
        @inthehands@hachyderm.io This result makes sense - they generate *statistically likely* text based on a prompt, and the stolen words of basically the entire internet and several libraries worth of books. If the prompt is such that the text it generates is statistically-likely to be correct - the language used closely aligns with a medical textbook, diagnostic manual, etc. - it's more likely to generate text based on sources like that. If it sounds like a tweet, you're more likely to get a shitpost.
    • Paul Cantrell (inthehands@hachyderm.io)'s status on Tuesday, 10-Feb-2026 06:39:53 JST
      in reply to Brian Marick

      @marick
      That’s profound.

      (Though also: I know that guinea pigs can be notoriously difficult to diagnose because, as prey animals, they’re very good at hiding that they have a problem!)

      In conversation about a month ago
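
A toy sketch of the point in 3/ and 4/: a sampler optimizes typicality, and correctness rides along only when the prompt's register makes correct text the typical continuation. The registers, answers, and probabilities below are all invented for illustration; none of them come from the study or the thread.

import random

# Invented conditional distributions standing in for what a model absorbs
# from its training data: P(answer | register of the prompt). Note that
# "correct" appears nowhere in the sampling rule itself.
CONTINUATIONS = {
    "clinical jargon":  [("textbook answer", 0.8), ("plausible-sounding error", 0.2)],
    "vernacular tweet": [("plausible-sounding error", 0.7), ("textbook answer", 0.3)],
}

def sample_answer(register: str) -> str:
    """Pick an answer weighted purely by how typical it is for this register."""
    answers, weights = zip(*CONTINUATIONS[register])
    return random.choices(answers, weights=weights, k=1)[0]

if __name__ == "__main__":
    for register in CONTINUATIONS:
        tally = {"textbook answer": 0, "plausible-sounding error": 0}
        for _ in range(10_000):
            tally[sample_answer(register)] += 1
        print(f"{register}: {tally}")

The same sampling rule yields mostly correct answers in one register and mostly wrong ones in the other; nothing in the code ever checks correctness, which is the thread's point.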

