One of the questions in Carlini’s ‘A GPT-4 Capability Forecasting Challenge’ highlights the difficulty of ‘grounding’. It is also an opportunity to think about how addressing such concerns may require interface adaptations, perhaps including ones that encourage us to doubt more (Lindemann, 2023).
While GPT-4 seemingly gets the question wrong even by its own logic, the question as posted by Carlini is, perhaps surprisingly, not well enough specified. You can search for the underlying fact on various platforms (in ROT13, I searched [Xrvgu Heona “frpbaq nyohz”]) and find multiple answers to the same question. Searching Google and glancing at the snippets in the different results shows them all, including reputable specialist sites, waffling between a couple of answers without a hint of hesitation or humility.
Google’s SGE (which I had to click to trigger the generation) agrees with Carlini when the query is keyword-focused, but gives an answer matching a direct claim from Wikipedia when the query is phrased in more natural language.
Google’s featured snippet, which sometimes appears, shows a list that would suggest the Wikipedia answer but highlights Carlini’s answer.
Google’s Bard, Perplexity AI, Phind, and You.com all provide the Wikipedia answer. Metaphor autoprompts the natural language query into a prompt that includes the Wikipedia answer.
Andi, prompted in natural language (as it invites: “The more detailed your question, the better I can help 😄”), refers to Wolfram Alpha and provides a third answer. A refined prompt to GPT-4 yields this same answer.
If you are interested in trying the forecasting game yourself, you may want to avoid the spoiler below.
Question: What is the second song on tom cruise’s second wife’s second husband’s second album?
Answer: Who Wouldn’t Wanna Be Me
Resolution Criteria: The logic does not have to say Tom Cruise -> Nicole Kidman -> Keith Urban -> Golden Road -> Who Wouldn’t Wanna Be Me. But it does have to get the name right.
Searching [Keith Urban “second album”] and [What was Keith Urban’s second album?] returns The Ranch, Keith Urban II, and Golden Road as answers.
Wikipedia: Golden Road (album): “Golden Road is the third studio album by Australian country music singer Keith Urban.”
Wikipedia: Keith Urban (1999 album): “Keith Urban (also known as Keith Urban II) is the second studio album by Australian country music artist Keith Urban.”
Wikipedia: The Ranch (album): “The Ranch is the only album by the country music group The Ranch, fronted by Keith Urban. It was released by Capitol Nashville in 1997, and re-released in 2004 with two bonus tracks and re-titled Keith Urban in The Ranch.”
Wikipedia: Keith Urban (1991 album): “Keith Urban is the debut studio album by New Zealand-born country music singer Keith Urban. It was released only in Australia in 1991.”
Haider, J., & Sundin, O. (2019). Invisible search and online search engines: The ubiquity of search in everyday life. Routledge. https://doi.org/10.4324/9780429448546
Lindemann, N. F. (2023, August). Sealed knowledges: A critical approach to the usage of LLMs as search engines. Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society. https://doi.org/10.1145/3600211.3604737
Liu, N. F., Zhang, T., & Liang, P. (2023). Evaluating verifiability in generative search engines. https://doi.org/10.48550/arXiv.2304.09848
Lurie, E., & Mulligan, D. K. (2021). Searching for representation: A sociotechnical audit of googling for members of U.S. Congress. https://arxiv.org/abs/2109.07012
Mulligan, D. K., & Griffin, D. (2018). Rescripting search to respect the right to truth. The Georgetown Law Technology Review, 2(2), 557–584. https://georgetownlawtechreview.org/rescripting-search-to-respect-the-right-to-truth/GLTR-07-2018/