Are prompts—& queries—not Lipschitz?

@zacharylipton via Twitter on Aug 3, 2023

Prompts are not Lipschitz. There are no “small” changes to prompts. Seemingly minor tweaks can yield shocking jolts in model behavior. Any change in a prompt-based method requires a complete rerun of evaluation, both automatic and human. For now, this is the way.
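For reference, the term comes from analysis. A function $f$ between metric spaces $(X, d_X)$ and $(Y, d_Y)$ is Lipschitz continuous if there is a constant $K \ge 0$ that bounds how far outputs can move relative to inputs:

$$
d_Y\big(f(x), f(x')\big) \le K \, d_X(x, x') \quad \text{for all } x, x' \in X
$$

Applying this to prompts is loose. It assumes some distance $d_X$ between prompts (say, an edit distance) and some distance $d_Y$ between model behaviors, and the tweet specifies neither. Read informally, the claim is that no usefully small $K$ exists.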

This sensitivity to small changes is sometimes, but not always, true of search queries as well (on various search tools, including Google). It depends heavily on the type of change and on the corpus of relevant material available. Perhaps _Lipschitz_ is a somewhat useful phrasing here? How have people described this in the search evaluation or [search audits](/docs/2023/07/05/search-audits.html) literature? @tripodi2018searching [p. 4]:

I document the way in which simple syntax differences can create and reinforce ideological biases in newsgathering.

and (p. 33):

Applying scriptural inference (or not doing so) creates **a dramatic difference in search results from otherwise similar queries**, creating an opportunity for partial, partisan, narratives to persist.

But elsewhere [@hora2021googling, p. 10]:

We found that developers’ queries typically include references to key contexts, are short, and tend to omit functional words; **minor changes to queries do not largely affect the search results**; and top search results are dominated by Stack Overflow and YouTube.

[emphases added]
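One rough way to compare claims like the two above is to compute, for a pair of queries, how much the results changed relative to how much the query changed. The sketch below is only an illustration, not a method from any of the cited papers. The function names are hypothetical, the query distance is one minus `difflib`'s similarity ratio, and the result distance is a Jaccard distance over the top-k results (which ignores rank). Both distance choices are assumptions; a real audit would likely want a rank-aware measure.

```python
from difflib import SequenceMatcher


def query_distance(q1: str, q2: str) -> float:
    """Distance in [0, 1] between two queries (1 minus difflib's similarity ratio)."""
    return 1.0 - SequenceMatcher(None, q1.lower(), q2.lower()).ratio()


def result_distance(results1: list[str], results2: list[str], k: int = 10) -> float:
    """Jaccard distance in [0, 1] between the top-k result sets (rank is ignored)."""
    top1, top2 = set(results1[:k]), set(results2[:k])
    if not top1 and not top2:
        return 0.0  # two empty result sets are treated as identical
    return 1.0 - len(top1 & top2) / len(top1 | top2)


def sensitivity(q1: str, results1: list[str],
                q2: str, results2: list[str], k: int = 10) -> float:
    """Change in results divided by change in query.

    Large values mean a small tweak to the query moved the results a lot.
    """
    dq = query_distance(q1, q2)
    if dq == 0.0:
        raise ValueError("queries are identical; the ratio is undefined")
    return result_distance(results1, results2, k) / dq


if __name__ == "__main__":
    # Made-up result lists, for illustration only (not real search data).
    results_a = ["a.example", "b.example", "c.example", "d.example"]
    results_b = ["a.example", "b.example", "e.example", "f.example"]
    print(sensitivity("cat", results_a, "cats", results_b))
```

Collected over many query pairs, a ratio like this would show whether large jumps cluster around particular kinds of changes or particular corpora.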

See also the slight variations in search terms audited by @lurie2021searching_facctrec. They used "a survey to identify realistic query formulations" (p. 3):

We asked participants to search for the name of the congressional representative, record the name of their representative, and then record all of the queries they searched to find the name of their representative.

And found variation:

The breadth of queries generated by users illustrates the lack of a common search strategy among participants. While some users searched by state name, others searched with some combination of county, place (e.g. town, city, community), (5-digit) zip code, congressional district, state name, and state abbreviation.

These questions also connect to "user input bias" [@trielli2018defining, p. 1]:

any activity from the user that has an impact in how the algorithm retrieves information.

They write further (*ibid.*):

investigations into search engine biases must take into account the construction of the query in a search. But a deeper understanding is needed of why users choose the search terms they use and how impactful those decisions are. For instance, what makes a user search for "gun rights" versus "gun control"? And what is the impact of those decisions?

You can also see this in the difference between the results for Google [ latina teens ] and Google [ google latina teens ] (and [**still today!**](http://web.archive.org/screenshot/https://www.google.com/search?q=google+latina+teen); note: explicit language).

Various audits from [my students](/courses/s2023-LB322B.html) have also looked at slight changes, including simply choosing whether or not to pluralize animal names.