Notices by Nick Byrd (byrdnick@nerdculture.de)

Nick Byrd (byrdnick@nerdculture.de)'s status on Friday, 18-Apr-2025 09:54:52 JST

Most #LLMs over-generalized scientific results beyond the original articles
...even when explicitly prompted for accuracy!
The #AI was 5x worse than humans, on average!
Newer models were the worst. 🤦‍♂️
🔓 Accepted in #RoyalSociety Open #Science: https://doi.org/10.48550/arXiv.2504.00025
Attachments
1. Figure 2. Forest plot (based on Table 1) displaying odds ratios (OR) and their 95% confidence intervals for comparisons between LLM-generated summaries, original texts, and human-written summaries (NEJM JW). The plot shows the likelihood of generalized (vs. restricted) conclusions in LLM summaries compared to the corresponding reference texts. Higher ORs reflect stronger overgeneralization tendency. The vertical line at OR = 1 represents no difference from the reference text, indicating the benchmark for fully faithful LLM summaries. Comparisons where error bars overlap this line are not statistically significant.
  https://nerdculture.de/system/media_attachments/files/114/354/538/763/194/441/original/2761776820dc6283.png
2. Figure 3. Comparisons between the raw proportions of scientific articles and human-authored as well as LLM-generated article summaries that contain generalized conclusions, overall algorithmic overgeneralizations, and specific algorithmic overgeneralizations, presented by text source and test condition. Error bars represent standard errors.
  https://nerdculture.de/system/media_attachments/files/114/354/538/766/411/596/original/fe1aa31155bac050.png
3. Generalization Bias in Large Language Model Summarization of Scientific Research (arxiv.org)
  
  Artificial intelligence chatbots driven by large language models (LLMs) have the potential to increase public science literacy and support scientific research, as they can quickly summarize complex scientific information in accessible terms. However, when summarizing scientific texts, LLMs may omit details that limit the scope of research conclusions, leading to generalizations of results broader than warranted by the original study. We tested 10 prominent LLMs, including ChatGPT-4o, ChatGPT-4.5, DeepSeek, LLaMA 3.3 70B, and Claude 3.7 Sonnet, comparing 4900 LLM-generated summaries to their original scientific texts. Even when explicitly prompted for accuracy, most LLMs produced broader generalizations of scientific results than those in the original texts, with DeepSeek, ChatGPT-4o, and LLaMA 3.3 70B overgeneralizing in 26 to 73% of cases. In a direct comparison of LLM-generated and human-authored science summaries, LLM summaries were nearly five times more likely to contain broad generalizations (OR = 4.85, 95% CI [3.06, 7.70]). Notably, newer models tended to perform worse in generalization accuracy than earlier ones. Our results indicate a strong bias in many widely used LLMs towards overgeneralizing scientific conclusions, posing a significant risk of large-scale misinterpretations of research findings. We highlight potential mitigation strategies, including lowering LLM temperature settings and benchmarking LLMs for generalization accuracy.
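
The reported odds ratio and interval can be sanity-checked with a few lines of arithmetic. The sketch below assumes the 95% CI is a Wald-type interval that is symmetric on the log-odds scale; the abstract does not say how the interval was computed, so treat that as an assumption. The function names are illustrative and do not come from the paper.

```python
import math

# Two-sided 95% quantile of the standard normal distribution.
Z_95 = 1.959964


def implied_or(ci_low: float, ci_high: float) -> float:
    """Point estimate implied by a log-symmetric CI (geometric mean of the bounds)."""
    return math.exp((math.log(ci_low) + math.log(ci_high)) / 2)


def implied_log_se(ci_low: float, ci_high: float, z: float = Z_95) -> float:
    """Standard error of log(OR) implied by the CI width (assumes a Wald interval)."""
    return (math.log(ci_high) - math.log(ci_low)) / (2 * z)


def excludes_no_difference(ci_low: float, ci_high: float) -> bool:
    """True when the interval does not contain OR = 1 (the 'no difference' line)."""
    return ci_low > 1.0 or ci_high < 1.0


# Values quoted in the abstract: OR = 4.85, 95% CI [3.06, 7.70].
reported_or, ci_low, ci_high = 4.85, 3.06, 7.70

print("implied OR:", round(implied_or(ci_low, ci_high), 2))
print("implied SE of log(OR):", round(implied_log_se(ci_low, ci_high), 3))
print("interval excludes OR = 1:", excludes_no_difference(ci_low, ci_high))
```

If the log-symmetry assumption holds, the geometric mean of the bounds should land close to the reported OR, and an interval that sits entirely above 1 corresponds to the "error bars do not overlap the line" reading described in the Figure 2 caption.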
Nick Byrd (byrdnick@nerdculture.de)'s status on Sunday, 30-Mar-2025 19:17:33 JST

Overheard at a conference about #AI in #Medicine:
Speaker: "I hear neurologists prefer we say that generative AI systems 'confabulate' and not that they 'hallucinate'."
Neurologist [shouting from the back of the room]: "CORRECT!"
#psychiatry #neuroscience #sciComm #edu
Attachments
1. Chase Parsons, DO, MBI Chief Medical Information Officer Boston Children's Hospital
  https://nerdculture.de/system/media_attachments/files/114/235/056/717/858/390/original/854e0b7962e7abf9.jpeg
Nick Byrd (byrdnick@nerdculture.de)'s status on Wednesday, 05-Feb-2025 04:48:31 JST
Alright nerds,
What are the *easiest* methods to #repost or #crosspost my #Mastodon posts to #BlueSky (or vice versa)?
In other words, how can I make my BlueSky account (@byrdnick.com) post whatever I post to this Mastodon account (@ByrdNick)? (Or vice versa?)
#socialMedia #webhosting #API

Nick Byrd (byrdnick@nerdculture.de)'s status on Wednesday, 05-Feb-2025 04:48:27 JST
in reply to

BlueSky Crossposter™©® worked (after plenty of troubleshooting and some recoding): https://nerdculture.de/@ByrdNick/113454337286905203
Attachments
1. Nick Byrd, Ph.D. (@ByrdNick@nerdculture.de)
  
  from Nick Byrd, Ph.D.
  
  I'm trying out "Bluesky Crossposter™©® developed by denvitadrogen": https://github.com/Linus2punkt0/bluesky-crossposter/tree/main If you're seeing this post show up somewhere other than @bsky.app, then I got it working. Otherwise 😒
Nick Byrd (byrdnick@nerdculture.de)'s status on Wednesday, 05-Feb-2025 04:48:23 JST
in reply to

After #BlueSky Crossposter 👆 stopped working for me, I found #Fedica, which has been crossposting to nearly 10 platforms (for free!):
https://nerdculture.de/@ByrdNick/113483550862956198

Nick Byrd (byrdnick@nerdculture.de)'s status on Thursday, 14-Sep-2023 23:27:03 JST

Remember that "...WEIRDest people in the world" paper?
Now #xPhi has one: Of "171 experimental philosophy studies [from] 2017 [to] 2023 [including one of mine] most ...tested only Western populations but generalized beyond them without justification."
Incentives may be part of the issue: "studies with broader conclusions ...had higher citation impact."
https://doi.org/10.1017/psa.2023.109
#xPhi #PsychMethods #Culture #Demography #PhilSci