conventional search engines are limited to retrieving pre-existing webpages?

    May 19th, 2023

    This is a short post about Liu et al.’s “Evaluating Verifiability in Generative Search Engines” (2023). https://doi.org/10.48550/arXiv.2304.09848 [liu2023evaluating].

    Is it true that “conventional search engines are limited to retrieving pre-existing webpages”1?

    While this is a minor comment in a much larger paper, we can benefit from discussing conventional search engines, these massive sociotechnical systems, as extending far beyond a single moment in time and, explicitly, as incorporating the actions of others. (This is particularly so as we describe these systems alongside appraisals of alternative approaches to web searching.) The substance of responses to information needs by conventional search engines, like Google, is constantly in flux, both through internal changes and through reshaping by external participants motivated to be found (and so responding to various incentives).

    Saying that “conventional search engines are limited to retrieving pre-existing webpages” dismisses the key ways that search engines shape the existence of those web pages (and new ones) in the first place. Since Introna and Nissenbaum’s early critique (2000), we’ve seen that the various indexing and ranking choices (including moderation), page design choices, and articulations from the major search engines, along with the development of their business models, shape the searching practices of searchers, the behavior of advertisers, the tactics and strategies of search engine optimizers, and the individual and market-driven choices underlying the production and availability of content and experiences on the web. Ignoring the temporal arrangements in the web search ecosystem may lead us to both (1) ignore the vast role that the dominant conventional search engine plays in shaping which webpages exist at any particular point in time, and so the shape of the web, and (2) develop a ‘blinkered’ perspective on the choices and actions of other participants ready (or not) to perform for (or against) these new approaches to search.

    There are also direct elements provided on the search engine results pages (SERPs) that are not limited to the retrieval of “pre-existing webpages”. Conventional search engines have long presented features on their SERPs that are extracted and aggregated from those pre-existing webpages or from databases not available on the web (in the Knowledge Panels, for instance, and other “rich features” like image galleries). Google’s “Featured snippets” also selectively extract from existing websites (and clearly motivated the creation of content aimed to be found as such) and have themselves featured information that has been identified numerous times as being misleading.2

    Google has additionally, since as early as 2012,3 dynamically edited/generated (Google now says: “automatically determine[d]”) both the “Text link” (title of a search result) and the “snippet” (or page summary)4. This practice has not always been appreciated by the SEO community (see multiple responses from Google5), leading to feedback (on the generated titles) including comments saying that in some situations it “seems to be misleading for the user”, and calling it:

    As well as issuing complaints like: “what they are changing is information supplied by others, that represents others, that may be damaging the income of others, and/or potentially putting others at risk (regulatory requirements etc.).”

    Another feature of conventional SERPs is suggested searches, whether in the autocomplete dropdown while typing a query, in a suggested revision for an alternate spelling, in a suggested reformulation that puts quotation marks around a particular word, or in various elements interspersed on the search page. Google currently provides a “People also ask” or “Others want to know” section and a “Related searches” section.

    In addition to generating rich features and the text descriptions of the web pages, conventional search engines respond to queries by providing a ranking (itself treated as information) of selected search results, which shapes the perceptions of the results by different searchers, and by dynamically inserting sponsored links (with text links and snippets and only a small text label to indicate that they are paid for). All of this is generated on a SERP with particular fonts, colors, sizing, and placement. Together, these various aspects of the SERP are factors in whether searchers are satisfied, left wanting, or effectively misled.6


    Footnotes

    1. Liu et al. (2023)’s opening line, “Generative search engines fulfill user information needs by directly generating responses to input queries, along with in-line citations.”, carries a footnote: “In contrast, conventional search engines are limited to retrieving pre-existing webpages.” (This is all I’ve read of the paper so far, beyond the abstract.)↩︎

    2. See re seizures (Griffin & Lurie, 2022, Raji et al., 2022), re voting information (Lurie & Mulligan, 2021), etc.↩︎

    3. Far (2012): “we have algorithms that generate alternative titles to make it easier for our users to recognize relevant pages”↩︎

    4. “[A]utomatically determine” language is found in Google (n.d.-b) and Google (n.d.-a). “Text link” and “snippet” are preferred terms from Google (n.d.-c).↩︎

    5. (Sullivan, 2021a, 2021b)↩︎

    6. A key question, though not the only one, when considering seemingly false or unfounded information on a SERP is whether it is “likely to mislead”. Such information “may not be misleading searchers of the results of search as some inaccurate results likely trigger further information seeking rather than belief in an inaccurate answer” (Lurie & Mulligan, 2021). That said, time costs are still relevant.↩︎

    References

    Far, P. (2012). Better page titles in search results. Google Search Central; Blog post. https://developers.google.com/search/blog/2012/01/better-page-titles-in-search-results [far2012better]

    Google. (n.d.-a). Control your snippets in search results. Google Search Central; Webpage. https://developers.google.com/search/docs/appearance/snippet [google_snippet]

    Google. (n.d.-c). Visual elements gallery of Google Search. Google Search Central; Webpage. https://developers.google.com/search/docs/appearance/visual-elements-gallery [google_visual]

    Griffin, D., & Lurie, E. (2022). Search quality complaints and imaginary repair: Control in articulations of Google Search. New Media & Society, 0(0), 14614448221136505. https://doi.org/10.1177/14614448221136505 [griffin2022search]

    Introna, L. D., & Nissenbaum, H. (2000). Shaping the web: Why the politics of search engines matters. The Information Society, 16(3), 169–185. https://doi.org/10.1080/01972240050133634 [introna2000shaping]

    Liu, N. F., Zhang, T., & Liang, P. (2023). Evaluating verifiability in generative search engines. https://doi.org/10.48550/arXiv.2304.09848 [liu2023evaluating]

    Lurie, E., & Mulligan, D. K. (2021). Searching for representation: A sociotechnical audit of googling for members of U.S. Congress. https://arxiv.org/abs/2109.07012 [lurie2021searching_facctrec]

    Raji, I. D., Kumar, I. E., Horowitz, A., & Selbst, A. (2022, June). The fallacy of AI functionality. 2022 ACM Conference on Fairness, Accountability, and Transparency. https://doi.org/10.1145/3531146.3533158 [raji2022fallacy]

    Sullivan, D. (2021a). An update to how we generate web page titles. Google Search Central; Blog post. https://developers.google.com/search/blog/2021/08/update-to-generating-page-titles [sullivan2021update]

    Sullivan, D. (2021b). More information on how Google generates titles for web page results. Google Search Central; Blog post. https://developers.google.com/search/blog/2021/09/more-info-about-titles [sullivan2021more]