Public
- Public
- Network
- Groups
- Featured
- Popular
- People

Figure 2. Forest plot (based on Table 1) displaying odds ratios (OR) and their 95% confidence intervals for comparisons between LLM-generated summaries, original texts, and human-written summaries (NEJM JW). The plot shows the likelihood of generalized (vs. restricted) conclusions in LLM summaries compared to the corresponding reference texts. Higher ORs reflect stronger overgeneralization tendency. The vertical line at OR = 1 represents no difference from the reference text, indicating the benchmark for fully faithful LLM summaries. Comparisons where error bars overlap this line are not statistically significant.

Download link

Figure 2. Forest plot (based on Table 1) displaying odds ratios (OR) and their 95% confidence intervals for comparisons between LLM-generated summaries, original texts, and human-written summaries (NEJM JW). The plot shows the likelihood of generalized (vs. restricted) conclusions in LLM summaries compared to the corresponding reference texts. Higher ORs reflect stronger overgeneralization tendency. The vertical line at OR = 1 represents no difference from the reference text, indicating the benchmark for fully faithful LLM summaries. Comparisons where error bars overlap this line are not statistically significant.
https://nerdculture.de/system/media_attachments/files/114/354/538/763/194/441/original/2761776820dc6283.png

Notices where this attachment appears

Embed this notice
Nick Byrd (byrdnick@nerdculture.de)'s status on Friday, 18-Apr-2025 09:54:52 JST Nick Byrd

Most #LLMs over-generalized scientific results beyond the original articles
...even when explicitly prompted for accuracy!
The #AI was 5x worse than humans, on average!
Newer models were the worst.🤦♂️
🔓 Accepted in #RoyalSociety Open #Science: https://doi.org/10.48550/arXiv.2504.00025

In conversation about a month ago from nerdculture.de permalink