GNU social JP
GNU social JP is a Japanese GNU social server.

Figure 2. Forest plot (based on Table 1) displaying odds ratios (OR) and their 95% confidence intervals for comparisons between LLM-generated summaries, original texts, and human-written summaries (NEJM JW). The plot shows the likelihood of generalized (vs. restricted) conclusions in LLM summaries compared to the corresponding reference texts. Higher ORs reflect stronger overgeneralization tendency. The vertical line at OR = 1 represents no difference from the reference text, indicating the benchmark for fully faithful LLM summaries. Comparisons where error bars overlap this line are not statistically significant.
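
The reading rule in the caption (an OR above 1 means more generalization, and an interval that overlaps 1 is not significant) can be illustrated with a minimal sketch. It assumes a 2x2 table of counts and a Wald interval on the log odds ratio. The caption does not say which method the study used, and the function names and any counts passed in are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    Assumed (hypothetical) layout:
      a = generalized conclusions in LLM summaries
      b = restricted conclusions in LLM summaries
      c = generalized conclusions in the reference texts
      d = restricted conclusions in the reference texts
    z = 1.96 gives an approximate 95% interval.
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Wald approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    return odds_ratio, lower, upper


def overlaps_no_difference(lower, upper):
    """True when the interval includes OR = 1 (not statistically significant)."""
    return lower <= 1.0 <= upper
```

Calling `odds_ratio_ci` with made-up counts and passing the resulting bounds to `overlaps_no_difference` mirrors checking whether an error bar in the forest plot crosses the vertical line at OR = 1.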

Download link

https://nerdculture.de/system/media_attachments/files/114/354/538/763/194/441/original/2761776820dc6283.png

Notices where this attachment appears

  1. Nick Byrd (byrdnick@nerdculture.de)'s status on Friday, 18-Apr-2025 09:54:52 JST

    Most #LLMs over-generalized scientific results beyond the original articles

    ...even when explicitly prompted for accuracy!

    The #AI was 5x worse than humans, on average!

    Newer models were the worst.🤦‍♂️

    🔓 Accepted in #RoyalSociety Open #Science: https://doi.org/10.48550/arXiv.2504.00025


GNU social JP is a social network, courtesy of the GNU social JP administrator. It runs on GNU social, version 2.0.2-dev, available under the GNU Affero General Public License.

All GNU social JP content and data are available under the Creative Commons Attribution 3.0 license.