GNU social JP
GNU social JP is a Japanese GNU social server.

Conversation

Notices

  1. Tim Chambers (tchambers@indieweb.social)'s status on Thursday, 20-Jun-2024 05:39:33 JST

    Question for the Fediverse hive mind. Is there any evidence that the #AI models' hallucination rates are getting ANY better over time?

    I'm wondering if a 5 to 15% hallucination rate may just be the nature of the beast with LLMs and an unsolvable problem.

    In conversation about a year ago from indieweb.social
    • Jörn Franke (jornfranke@mastodon.online)'s status on Thursday, 20-Jun-2024 06:14:24 JST
      in reply to Tim Chambers

      @tchambers there is no specific hallucination rate of an LLM. An LLM's errors change depending on the context and what you ask. It is impossible to measure a specific rate - it is a nonsense measure, as it always depends on the context.

      In conversation about a year ago
    • Tim Chambers (tchambers@indieweb.social)'s status on Thursday, 20-Jun-2024 06:14:24 JST
      in reply to Jörn Franke

      @jornfranke I saw this effort that at least tried to create a means to do so: https://huggingface.co/blog/leaderboard-hallucinations

      In conversation about a year ago

      Attachments

      1. The Hallucinations Leaderboard, an Open Effort to Measure Hallucinations in Large Language Models (huggingface.co); a sketch of how such a rate is typically computed follows the conversation.
    • Tim Chambers (tchambers@indieweb.social)'s status on Thursday, 20-Jun-2024 06:50:35 JST
      in reply to Jörn Franke

      @jornfranke Maybe that technique, with the BIG assumption that they’re open about the training data, would be the best way to test this?

      In conversation about a year ago
    • Jörn Franke (jornfranke@mastodon.online)'s status on Thursday, 20-Jun-2024 06:50:36 JST
      in reply to Tim Chambers

      @tchambers well this is based on datasets the LLMs are actually trained on... It is unrealistic that users will ask the same questions. Then, depending on the context given with the prompt, that can be completely different. Also the safety guards put in place by some LLMs may obfuscate the results for "publicly known datasets" in either direction. The leaderboard has the same issues as LLMs: nobody can verify that the leaderboard delivers meaningful results.

      In conversation about a year ago
    • Tim Chambers (tchambers@indieweb.social)'s status on Thursday, 20-Jun-2024 19:27:30 JST
      in reply to Erlend Sogge Heggen

      @erlend great find, thanks!

      In conversation about a year ago
    • Erlend Sogge Heggen (erlend@writing.exchange)'s status on Thursday, 20-Jun-2024 19:27:31 JST
      in reply to Tim Chambers

      @tchambers there’s much more evidence to the contrary:

      https://garymarcus.substack.com/p/evidence-that-llms-are-reaching-a

      https://garymarcus.substack.com/p/facing-facts

      In conversation about a year ago
    • Tim Chambers (tchambers@indieweb.social)'s status on Thursday, 20-Jun-2024 19:58:53 JST

      Looking through this too: https://arxiv.org/abs/2401.01313

      #AI

      In conversation about a year ago

      Attachments

      1. A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models (arxiv.org); a minimal sketch of the retrieval-augmented approach the survey lists first follows the conversation.
        As Large Language Models (LLMs) continue to advance in their ability to write human-like text, a key challenge remains around their tendency to hallucinate generating content that appears factual but is ungrounded. This issue of hallucination is arguably the biggest hindrance to safely deploying these powerful LLMs into real-world production systems that impact people's lives. The journey toward widespread adoption of LLMs in practical settings heavily relies on addressing and mitigating hallucinations. Unlike traditional AI systems focused on limited tasks, LLMs have been exposed to vast amounts of online text data during training. While this allows them to display impressive language fluency, it also means they are capable of extrapolating information from the biases in training data, misinterpreting ambiguous prompts, or modifying the information to align superficially with the input. This becomes hugely alarming when we rely on language generation capabilities for sensitive applications, such as summarizing medical records, financial analysis reports, etc. This paper presents a comprehensive survey of over 32 techniques developed to mitigate hallucination in LLMs. Notable among these are Retrieval Augmented Generation (Lewis et al, 2021), Knowledge Retrieval (Varshney et al,2023), CoNLI (Lei et al, 2023), and CoVe (Dhuliawala et al, 2023). Furthermore, we introduce a detailed taxonomy categorizing these methods based on various parameters, such as dataset utilization, common tasks, feedback mechanisms, and retriever types. This classification helps distinguish the diverse approaches specifically designed to tackle hallucination issues in LLMs. Additionally, we analyze the challenges and limitations inherent in these techniques, providing a solid foundation for future research in addressing hallucinations and related phenomena within the realm of LLMs.
    • Tim Chambers (tchambers@indieweb.social)'s status on Thursday, 20-Jun-2024 20:01:05 JST

      And looking over this: https://arxiv.org/html/2406.04175v1 #AI

      In conversation about a year ago

      Attachments

      1. Confabulation: The Surprising Value of Large Language Model Hallucinations (arxiv.org)
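
A hallucination "rate" like the one reported by the Hallucinations Leaderboard linked above is usually the share of generated answers that some judge (an entailment model, a string match against a reference, or a human rater) marks as unsupported. A minimal Python sketch of that computation follows; generate_answer and is_supported are hypothetical stand-ins for the model under test and the judge, not the leaderboard's actual pipeline.

    # Sketch of a benchmark-style hallucination rate.
    # generate_answer and is_supported are hypothetical placeholders,
    # not the Hallucinations Leaderboard's real evaluation code.
    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class Example:
        question: str
        reference: str  # source passage the answer must be grounded in

    def hallucination_rate(
        generate_answer: Callable[[str], str],     # model under test
        is_supported: Callable[[str, str], bool],  # the judge
        dataset: list[Example],
    ) -> float:
        """Fraction of answers the judge marks as NOT supported by the reference."""
        if not dataset:
            return 0.0
        unsupported = sum(
            not is_supported(generate_answer(ex.question), ex.reference)
            for ex in dataset
        )
        return unsupported / len(dataset)

Swapping in a different dataset or a different is_supported judge gives the same model a different number, which is the objection raised upthread: the rate characterises a (model, benchmark, judge) triple rather than the model alone.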
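
Of the mitigation techniques the survey linked above names, Retrieval Augmented Generation is the most widely deployed: retrieve passages relevant to the question and ask the model to answer only from them. The sketch below assumes hypothetical embed (text to vector) and llm (prompt to text) callables and shows the general shape, not the exact pipeline of Lewis et al. 2021.

    # Minimal retrieval-augmented generation sketch.
    # embed(text) -> vector and llm(prompt) -> str are hypothetical callables.
    import numpy as np

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def retrieve(question: str, corpus: list[str], embed, k: int = 3) -> list[str]:
        """Return the k passages whose embeddings are closest to the question's."""
        q = embed(question)
        return sorted(corpus, key=lambda p: cosine(q, embed(p)), reverse=True)[:k]

    def answer_with_rag(question: str, corpus: list[str], embed, llm) -> str:
        """Ground the answer by prepending retrieved passages to the prompt."""
        context = "\n\n".join(retrieve(question, corpus, embed))
        prompt = (
            "Answer using only the context below. "
            "If the context does not contain the answer, say you do not know.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
        )
        return llm(prompt)

The grounding is only as strong as the retrieval and the prompt instruction; other families in the survey, such as post-generation verification (e.g. CoVe), check or revise a draft answer instead of constraining it up front.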

