Planning CSUIRx

Added October 16, 2023 10:41 AM (PDT)

This project has been rebranded and moved to SearchRights.org.

This post introduces a planned comparative evaluation of new web search systems with a focus on dimensions around integrating with users’ curiosity-engagement, question-generation, response validation, search repair, search sharing, complaint and feedback, and other concerns at the core of my research.

The new attention to search, sparked by widespread interest in OpenAI’s ChatGPT, has pushed many to develop new ways to search.

This evaluation compares different search systems, with a particular eye towards finding and shining some light on systems that are opening up search in new ways.

Draft 1.

Motivation:

Many people are now seeing that search is much more than ten blue links, much more than one company. We have a big chance to really change search for the better. Will we?

Source: Hire me.

I have a wide range of criteria by which to provide some marks and remarks on these systems. I’ll need to narrow them down and gradually work through them. I’m not thinking about them as goals, requirements, or desired specifications, and some may even contradict one another. For some of the criteria I will provide citations as reference or support. Some criteria are drawn from examples in previous search systems (including shuttered, speculative, and experimental systems). My goal here is not simply to do an accounting of searching today, but to get some sense of where we might want search to go.

Sample

Here is a sample of a few questions as applied to several web search systems.

CSUIRx: Comparative Search User Integration Rx

as of September 07, 2023 03:07 PM (PDT)
| | Andi Search | Metaphor | Perplexity AI | Phind | You.com |
| --- | --- | --- | --- | --- | --- |
| Example searches. Purpose: Help users practice new ways of imagining and formulating queries. Research suggests that effective prompting of LLMs can be challenging (Zamfirescu-Pereira et al., 2023) and query formulation even in mainstream web search engines is complicated (Tripodi, 2018). | | | | | |
| Are there default example searches? | Yes. Ex. [history of the birkin bag] [What is the Mandalorian?] [write a review of the best michelin star chefs in the world] [Go to apps, eg ‘go aws console’] | Yes. Ex. [companies working on fusion] [blog posts to learn how LLMs work] [best restaurants in San Francisco] [fun concerts in San Francisco] | Partial. Only in the mobile app. Listed as ‘Popular’, ex. [examples of preventive medicine practices] [what are the most popular podcast categories] [breathtaking adventure books about Africa] [how does skiplagging work] | Yes. Listed as ‘Explore’, ex. [Why we shouldn’t use useMemo() in React everywhere] [The 2005 Japanese PS2 videogame Ikusagami renders thousands of enemies on screen. How can I achieve this in Godot 4?] [Why are there so many potholes in SF and can I get money if my car is broken by one] [How do I conditionally import components in Astro?] [Compare and contrast H100 availability across Lambda Labs and AWS] | Yes. Ex. [Check today’s weather] [Find cost-effective vacations] [Shop for running shoes] |
| Do default example searches have contextually relevant explanation on sourcing? | No | No | Partial. Just the word “Popular” | No | No |
| Is there a “searchable repository of examples” (Zamfirescu-Pereira et al., 2023)? Zamfirescu-Pereira et al. (2023) point to a “prompt book” for DALL·E. Other examples, also in the image generation domain, include Lexica (marketing itself as “The Stable Diffusion search engine”). | No | Partial. `# show-and-tell` channel in the Discord channel: “Show off what you’ve found with Metaphor!” | Partial. (1) `# 💗 \| sharing` channel in the Discord channel: “Share cool results that you got with Perplexity! Please keep everything safe and friendly.” (2) A Discover page shares 30 example queries. | Partial. `# 🤯impressive-results` channel in the Discord channel | No? `# create-and-share` channel in the Discord channel appears focused on image generation rather than searching with You.com All or YouChat. |
| Standards & Openness | | | | | |
| Can you search by URL? | No | Yes. Example: [What is an LLM?] | Yes. Example: [What is an LLM?] | Yes. Example: [What is an LLM?] | Yes. Example: [What is an LLM?] |
| Is there an API? | No | Yes. Example: [What is an LLM?] | No. Example: [What is an LLM?] | No | No. “You.com does not offer an API for search or chat at this time. However, we are considering creating one in the future. If you would like to be notified when this happens, fill out the form here: https://about.you.com/api/” (source) |
| Sharing searching | | | | | |
| Can you share a link to the results or conversation? | No | No | Yes | Yes. Requires sign-in. | Yes. Requires sign-in. |
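
To make the “Can you search by URL?” question above concrete, here is a minimal sketch of what searching by URL usually means: the query is encoded into a link that can be bookmarked, shared, or generated by other tools. The endpoint `https://search.example/search` and the parameter name `q` are placeholders I am assuming for illustration; they are not the documented URL scheme of any system in the table.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint and parameter name, used only for illustration.
# Each real system would need its own (verified) URL template.
HYPOTHETICAL_ENDPOINT = "https://search.example/search"


def build_search_url(query: str,
                     endpoint: str = HYPOTHETICAL_ENDPOINT,
                     param: str = "q") -> str:
    """Encode a query into a shareable search URL."""
    return f"{endpoint}?{urlencode({param: query})}"


def read_query(url: str, param: str = "q") -> Optional[str]:
    """Recover the query text from a search URL, if present."""
    values = parse_qs(urlsplit(url).query).get(param)
    return values[0] if values else None


url = build_search_url("What is an LLM?")
print(url)              # the link a searcher could share or bookmark
print(read_query(url))  # the original query, recovered from the link
```

A system that supports this kind of link lets a search be launched from a browser shortcut, a note, or another tool, which is part of why the question sits under Standards & Openness.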

Initial Search Systems

In this initial set of reviews, I’m focusing on these search engines, listed alphabetically:

Andi Search (andisearch.com)
Metaphor (metaphor.systems)
Perplexity AI (perplexity.ai)
Phind (phind.com)
You.com (you.com)

These are my initial examples of new approaches to searching in generative web search systems. I may provide some contextualizing comments about other systems, like the explicit search-focused tools from Google and Microsoft, and chat-based systems like ChatGPT, Anthropic’s Claude, etc., that support search and search-like interactions.

To the extent that they support public-facing search, I will also be examining newer search libraries and services (including RAG frameworks), like the offerings from LangChain, from LlamaIndex, and Weaviate’s Verba, with comparison to (the also adapting) existing tools like those from Algolia and Elasticsearch.

Broad Criteria

My criteria are broad. I’m focused on concerns my research and training best prepare me to engage with. These are broadly questions related to the explicit and implicit articulation of the search system, the interactions around queries and results, the ability to share the burden of search, and the formalized methods of complaint. I’ll do some explicit evaluations of atomic performance related to “hallucination” or “groundedness”, but my focus is more on how people perceive and perform-with tool outputs than the outputs themselves. How are the searchers ushered into their searches? What do they see as searchable? How can they engage with search results (or responses)? Are they expected to vet the responses for hallucinations? How is automation bias addressed? What post-search activities are supported by the search system itself?

I’ll ask about features or uses that might perhaps be refused or reimagined, while situating this period of search amidst a longer history of search. There are important concerns about misleading results, sources of training and reference data, oversight, and the future of work. I’m very much developing these reviews to acknowledge that these systems will keep changing. Where there are very important concerns that I am less well-versed in, like accessibility, I will leverage other resources.

@danielsgriffin via Twitter on Sep 3, 2023

What searches can we avoid doing? What newly / more easily think to do? What make newly possible? What can be slower or faster? Seamful? Viscid? Vetted? Doubtful & deliberate? Ephemeral or persistent? Memorable? Public? Shared? Surfing or blazing? Embedded? Loosely coupled? Fun?

Scoping

This is not intended to be an introductory guide to these systems, but is focused on making sense of what new search tools are providing and what they might become. These reviews may be useful to heavy users, developers, and others looking to understand changes in system support for various searching practices.

I will largely be looking at systems for web search, including those more focused on particular subject areas. These reviews will not (yet at least) engage with new search systems in the following areas, important though they are:

academic search
- Examples:
  - Consensus (consensus.app)
  - Elicit (elicit.org)
- See:
  - Michael Gusenbauer’s call for independent audits (author copy)
  - Aaron Tay’s “categorization of interesting new academic discovery tools”
enterprise search
- Examples:
  - Glean (glean.com)
  - Vectara (vectara.com)
personal knowledge management
- Examples:
  - Klu (klu.so)
  - Rewind (rewind.ai)

I will also not be focused on in-editor code generation tools (like GitHub’s Copilot) and writing tools (like Lex.page) that replace or subsume some searching tasks.

I will not be very focused on metrics related to speed, unless a difference is very noticeable in frequent use.

I am concerned about questions of bias, but here only insofar as these systems are markedly different from the prior problems found in search.

I am not focused on explainability or transparency of these systems, though some questions will definitely engage with those topics. I will be more focused on examining questions around seamfulness, tractability, and traceability. I will be thinking about how practical algorithmic knowledge (Cotter, 2022) is built up and valued.

I’m less focused on responding to or rehashing arguments about “model collapse” than on looking at how these search tools and their users imagine supporting or working towards unsealing knowledge, whether through articulations that help users doubt & dig deeper, providing multiple drafts, or RAG adaptations.

The most important work would be work looking at how these search systems and tools are imagined and used (or not) by other people. I am not looking at that right now, but I will look at aspects of the systems identified publicly by different users or others.

Acknowledgements

I’ve long been inspired by the work at Ranking Digital Rights, and wondered what a related approach might help us think about in relation to our curiosity, questions, ignorance, doubts, and claims-making. (Of course digital rights are heavily implicated in the design and operations of search engines and in search itself.) Many of my thoughts around this were developed in conversations with Emma Lurie while we worked on our paper on Google’s Search Liaison (2022). I’ve thought also of her comparison of platform research API requirements (Lurie, 2023) while thinking through this. It was also a help to write up and share The Need for ChainForge-like Tools in Evaluating Generative Web Search Platforms. I’ve also been inspired by recently seeing Search Smart, which looks at the academic search domain with largely different criteria of evaluation. And thanks to Bill Chambers for a recent nudge.

References

Cotter, K. (2022). Practical knowledge of algorithms: The case of BreadTube. New Media & Society, 1–20. https://doi.org/10.1177/14614448221081802 [cotter2022practical]

Griffin, D., & Lurie, E. (2022). Search quality complaints and imaginary repair: Control in articulations of Google Search. New Media & Society, 0(0), 14614448221136505. https://doi.org/10.1177/14614448221136505 [griffin2022search]

Lurie, E. (2023). Comparing platform research API requirements. Tech Policy Press. https://techpolicy.press/comparing-platform-research-api-requirements/ [lurie2023comparing]

Tripodi, F. (2018). Searching for alternative facts: Analyzing scriptural inference in conservative news practices. Data & Society. https://datasociety.net/output/searching-for-alternative-facts/ [tripodi2018searching]

Zamfirescu-Pereira, J. D., Wong, R. Y., Hartmann, B., & Yang, Q. (2023). Why Johnny can’t prompt: How non-AI experts try (and fail) to design LLM prompts. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3544548.3581388 [zamfirescu-pereira2023johnny]