17 votes

Detecting hallucinations in large language models using semantic entropy

4 comments

  1. skybrian
    Link

    Here's the abstract:

    Large language model (LLM) systems, such as ChatGPT or Gemini, can show impressive reasoning and question-answering capabilities but often ‘hallucinate’ false outputs and unsubstantiated answers. Answering unreliably or without the necessary information prevents adoption in diverse fields, with problems including fabrication of legal precedents or untrue facts in news articles and even posing a risk to human life in medical domains such as radiology. Encouraging truthfulness through supervision or reinforcement has been only partially successful. Researchers need a general method for detecting hallucinations in LLMs that works even with new and unseen questions to which humans might not know the answer. Here we develop new methods grounded in statistics, proposing entropy-based uncertainty estimators for LLMs to detect a subset of hallucinations—confabulations—which are arbitrary and incorrect generations. Our method addresses the fact that one idea can be expressed in many ways by computing uncertainty at the level of meaning rather than specific sequences of words. Our method works across datasets and tasks without a priori knowledge of the task, requires no task-specific data and robustly generalizes to new tasks not seen before. By detecting when a prompt is likely to produce a confabulation, our method helps users understand when they must take extra care with LLMs and opens up new possibilities for using LLMs that are otherwise prevented by their unreliability.

    They generate an answer multiple times and automatically check whether the answers are consistent (using more LLM calls), to see whether the answer depends on the random number generator.

    That sounds expensive, but if it works, it's progress, chipping away at the problem.
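    Roughly, a minimal sketch of the idea as I read the abstract (not the paper's exact estimator) might look like this; the `generate` and `entails` callables stand in for whatever LLM calls you'd plug in:

    ```python
    import math
    from typing import Callable


    def semantic_entropy(
        prompt: str,
        generate: Callable[[str], str],       # samples one answer at temperature > 0
        entails: Callable[[str, str], bool],  # asks an LLM whether answer a entails answer b
        n_samples: int = 10,
    ) -> float:
        """Estimate semantic entropy for a prompt by clustering sampled answers by meaning."""
        answers = [generate(prompt) for _ in range(n_samples)]

        # Cluster answers by meaning: two answers share a cluster if they
        # entail each other in both directions (checked with extra LLM calls).
        clusters: list[list[str]] = []
        for ans in answers:
            for cluster in clusters:
                rep = cluster[0]
                if entails(ans, rep) and entails(rep, ans):
                    cluster.append(ans)
                    break
            else:
                clusters.append([ans])

        # Entropy of the empirical distribution over meaning clusters. High
        # entropy means the samples disagree in meaning, i.e. the output is
        # mostly determined by the sampling randomness.
        probs = [len(c) / n_samples for c in clusters]
        return -sum(p * math.log(p) for p in probs)
    ```

    The point of clustering at the level of meaning rather than exact wording is that "Paris" and "It's Paris." should count as the same answer.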

    Our method explicitly does not directly address situations in which LLMs are confidently wrong because they have been trained with objectives that systematically produce dangerous behaviour, cause systematic reasoning errors or are systematically misleading the user. We believe that these represent different underlying mechanisms—despite similar ‘symptoms’—and need to be handled separately.

    7 votes
  2. [3]
    cfabbro
    Link
    Related article on the paper, from Ars Technica: https://arstechnica.com/ai/2024/06/researchers-describe-how-to-tell-if-chatgpt-is-confabulating/
    Tildes Topic on the article:...
    7 votes
    1. [2]
      skybrian
      Link Parent

      Oops! Thanks.

      3 votes
      1. cfabbro
        Link Parent

        How dare you not notice and remember every topic ever posted to Tildes! ;)
        Only reason I remember it is because I'm the one that tagged it. :P

        4 votes