[Please summarize Claude E. Shannon's "A Short History of Searching" (1948).]

    Added September 28, 2023 11:18 PM (PDT)

    It appears that my attempts to stop the search systems from adopting these hallucinated claims have failed. I shared screenshots on Twitter of various search systems, newly queried with my Claude Shannon hallucination test. The screenshots show these systems highlighting an LLM response, returning multiple LLM response pages in the results, or citing my own page as evidence for such a paper. I ran those tests after briefly testing the newly released Cohere RAG.

    Added October 01, 2023 12:57 AM (PDT)

    I noticed today that the URL Inspection tool in Google's Search Console flagged a missing field in my schema:
    Missing field "itemReviewed"
    This is a non-critical issue. Items with these issues are valid, but could be presented with more features or be optimized for more relevant queries
    Hoping to find better ways to discuss problematic outputs from LLMs, I went back to Google's Fact Check Markup Tool and added the four URLs that I have for the generated false claims. I then updated the schema on this page (see the page source or, for ease of use, this gist that shows the two variants).
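    For readers unfamiliar with fact-check markup, here is a minimal sketch of a schema.org ClaimReview object that includes the `itemReviewed` field Search Console reported as missing. It assumes `itemReviewed` holds a `Claim` whose `appearance` entries point to the pages where the false claim was published. The helper `build_claim_review`, all URLs, and the rating values are hypothetical placeholders. This is not the markup actually used on this page.

```python
import json


def build_claim_review(page_url: str, claim_text: str, appearance_urls: list) -> dict:
    """Build a schema.org ClaimReview dict (hypothetical helper, placeholder values)."""
    return {
        "@context": "https://schema.org",
        "@type": "ClaimReview",
        "url": page_url,  # page that publishes the fact check
        "claimReviewed": claim_text,
        # itemReviewed describes the Claim itself and where it appeared.
        # This is the field Search Console flagged as missing.
        "itemReviewed": {
            "@type": "Claim",
            "appearance": [
                {"@type": "CreativeWork", "url": u} for u in appearance_urls
            ],
        },
        # Assumed rating scale: 1 (false) to 5 (true), plus a textual label.
        "reviewRating": {
            "@type": "Rating",
            "ratingValue": 1,
            "bestRating": 5,
            "worstRating": 1,
            "alternateName": "False",
        },
    }


# Placeholder URLs standing in for pages that host the generated false claim.
review = build_claim_review(
    "https://example.com/shannon-test",
    'Claude E. Shannon wrote "A Short History of Searching" (1948).',
    ["https://example.com/llm-response-1", "https://example.com/llm-response-2"],
)

# Emit JSON that could go inside a <script type="application/ld+json"> tag.
print(json.dumps(review, indent=2))
```

    Building the object in code makes it easy to regenerate the markup as more URLs with the false claim turn up. The output can then be checked with Google's own validation tools before publishing.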

    Added October 06, 2023 10:59 AM (PDT)

    An October 5 article by Will Knight in Wired, "Chatbot Hallucinations Are Poisoning Web Search," discusses my Claude Shannon "hallucination" test.

    A round-up here: Can you write about examples of LLM hallucination without poisoning the web?

    The comment below prompted me to run a single-query test for hallucination across various tools. Results varied. Google's Bard and the base models of OpenAI's ChatGPT and others failed to spot the imaginary reference. You.com, Perplexity AI, Phind, and ChatGPT-4 were more successful.

    I continue to be impressed by Phind's performance outside of coding questions (their headline is "The AI search engine for developers").

    @anthonymoser via Bluesky on Jul 4, 2023

    I'm imagining an instructor somewhere making a syllabus with chat gpt, assigning reading from books that don't exist

    But the students don't notice, because they are asking chat gpt to summarize the book or write the essay

  • I generally think addressing hallucination of this second sort (summarizing fake papers) is low-hanging fruit. The remedies seem straightforward (though not free), and the incentives appear to be well-aligned.
  • But I was surprised at how poorly ChatGPT performed on a simplistic mock-up of the student prompt here. Results from other tools were also pretty disappointing.
  • Granted, models may perform worse if the title itself were hallucinated. It is likely that the author-and-title pair tested below falls somewhat within their hallucinatory space, whereas other titles may not. For instance, ChatGPT correctly noted that neither Stephen Hawking nor Plato had written a piece by that title.
  • See also

    Added October 01, 2023 12:57 AM (PDT):

    ChatGPT [ Please summarize Claude E. Shannon's "A Short History of Searching" (1948). ]
    A ChatGPT.GPT-4[Please summarize Claude E. Shannon's "A Short History of Searching" (1948).] search. Screenshot taken with GoFullPage (distortions possible) at: 2023-07-04 23:06:36
    I conducted a follow-on test today and ChatGPT 3.5 still failed:
    "A Short History of Searching" is an influential paper written by Claude E. Shannon in 1948. In this paper, Shannon provides a historical overview of searching techniques and the development of information retrieval systems.

    Note: Andi does not hallucinate the contents of such a paper.
    Andi [ Please summarize Claude E. Shannon's "A Short History of Searching" (1948). ]
    An Andi[Please summarize Claude E. Shannon's "A Short History of Searching" (1948).] search. Screenshot taken with GoFullPage (distortions possible) at: 2023-07-04 23:32:24

    Bard [ Please summarize Claude E. Shannon's "A Short History of Searching" (1948). ]
    A Bard[Please summarize Claude E. Shannon's "A Short History of Searching" (1948).] search. Screenshot taken with GoFullPage (distortions possible) at: 2023-07-04 23:16:40

    Note: Perplexity AI takes the paper title at face value and only briefly hallucinates its contents before expanding on other work. (In a follow-on test, run after querying Perplexity AI's Copilot to account for my misordered test of You.com and You.com's GPT-4 version, it does better at indicating that the reference may be imaginary: Claude E. Shannon's "A Short History of Searching" is not mentioned in the search results....)
    Perplexity AI [ Please summarize Claude E. Shannon's "A Short History of Searching" (1948). ]
    A Perplexity AI[Please summarize Claude E. Shannon's "A Short History of Searching" (1948).] search. Screenshot taken with GoFullPage (distortions possible) at: 2023-07-04 23:15:29

    Inflection AI Pi [ Please summarize Claude E. Shannon's "A Short History of Searching" (1948). ]
    An Inflection AI Pi[Please summarize Claude E. Shannon's "A Short History of Searching" (1948).] search. Screenshot taken with GoFullPage (distortions possible) at: 2023-07-04 23:35:49 [screenshot manually trimmed to remove excess blank space]

    Yes, even the namesake model struggles here.

    via Quora's Poe

    Claude Instant [ Please summarize Claude E. Shannon's "A Short History of Searching" (1948). ]
    A Claude Instant[Please summarize Claude E. Shannon's "A Short History of Searching" (1948).] search. Screenshot taken with GoFullPage (distortions possible) at: 2023-07-04 23:35:16 [screenshot manually trimmed to remove excess blank space]

    ✅ Note: I messed up this test. The timestamp for the base-model search on You.com is _after_ my search on the GPT-4 model. It is possible that their base model draws on a database of previous responses from the better model.
    You.com [ Please summarize Claude E. Shannon's "A Short History of Searching" (1948). ]
    A You.com[Please summarize Claude E. Shannon's "A Short History of Searching" (1948).] search. Screenshot taken with GoFullPage (distortions possible) at: 2023-07-05 11:22:19

    Note: I believe GPT-4 was selected when I submitted the query, but I am not sure, since the model may be able to be toggled mid-conversation.
    You.com.GPT-4 [ Please summarize Claude E. Shannon's "A Short History of Searching" (1948). ]
    A You.com.GPT-4[Please summarize Claude E. Shannon's "A Short History of Searching" (1948).] search. Screenshot taken with GoFullPage (distortions possible) at: 2023-07-04 23:14:49

    Note: This omits the Copilot interaction in which I was told and asked: "It seems there might be a confusion with the title of the paper. Can you please confirm the correct title of the paper by Claude E. Shannon you are looking for?" I responded with the imaginary title again.
    Perplexity AI.Copilot [ Please summarize Claude E. Shannon's "A Short History of Searching" (1948). ]
    A Perplexity AI.Copilot[Please summarize Claude E. Shannon's "A Short History of Searching" (1948).] search. Screenshot taken with GoFullPage (distortions possible) at: 2023-07-04 23:39:13

    Phind [ Please summarize Claude E. Shannon's "A Short History of Searching" (1948). ]
    A Phind[Please summarize Claude E. Shannon's "A Short History of Searching" (1948).] search. Screenshot taken with GoFullPage (distortions possible) at: 2023-07-04 23:37:20

    ChatGPT.GPT-4 [ Please summarize Claude E. Shannon's "A Short History of Searching" (1948). ]
    A ChatGPT.GPT-4[Please summarize Claude E. Shannon's "A Short History of Searching" (1948).] search. Screenshot taken with GoFullPage (distortions possible) at: 2023-07-05 11:16:03