tooling to support people in making hands-on and open evaluations of search

    Tomorrow I’m attending the Union Square Ventures AI Roundtable #1: AI and Search. I’m looking forward to a dynamic conversation, and I’m also using it as a forcing function to write down what I’m zeroing in on: developing tooling for user exploration and evaluation of search systems to support a strong new search ecosystem.

    Building on my prior research I am very focused on developing shared tooling and other resources to support people in making hands-on and open evaluations of search systems and responses (particularly to public interest search topics). We need this sort of tooling to better inform individual and shared search choices, including for refusing, resisting, repairing, and reimagining search practices and tools. Such tooling might surface distinctions and options and let subject matter experts, community members, and individuals develop (and perhaps share) their own evaluations.

    I have been shifting my research statement to engage with this and looking for how to make it happen, whether in academia, with foundation support, in a company, or as something new. I am working on this so that we might be better able to advocate and design for the appropriate role and shape of search in our work, lives, and society.1

    There is a lot of related work on the evaluation of various types of systems, benchmarking, audits, complaints, etc. to build with, but that work is not narrowly aimed at facilitating open evaluation of the performance of new web search tools on public interest search topics or at supporting effective voice and choice in search.

    This project is intended to complement existing reporting, benchmarking, and auditing efforts while focusing on helping people develop their own sense of what different tools can, can’t, and could possibly do.

    This can be a framework and service that supports individual evaluations, collaborative evaluations, and requests-for-evaluations from peers, experts, and public-benefit search quality raters.
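    To make the shape of such a framework concrete, here is a minimal sketch of a possible data model for it. Everything in this sketch is hypothetical illustration, not a specification: the class names, fields, and rating scale are my own assumptions about what an evaluation record and a request-for-evaluation might carry.

    ```python
    from dataclasses import dataclass, field
    from datetime import datetime, timezone

    # All names and fields below are hypothetical illustrations.

    @dataclass
    class SearchEvaluation:
        """One person's assessment of one system's response to one query."""
        query: str              # the public interest search topic
        system: str             # which search tool produced the response
        response_excerpt: str   # what the system returned
        rating: int             # e.g. 1 (harmful/wrong) to 5 (excellent)
        rationale: str          # the "why" -- the heart of a hands-on evaluation
        evaluator_role: str     # "subject matter expert", "community member", ...
        created: datetime = field(
            default_factory=lambda: datetime.now(timezone.utc))

    @dataclass
    class EvaluationRequest:
        """A request-for-evaluation issued to peers, experts, or raters."""
        query: str
        reason: str
        responses: list = field(default_factory=list)

        def add(self, evaluation: SearchEvaluation) -> None:
            """Collect a completed evaluation against this request."""
            self.responses.append(evaluation)

    # Usage: a community member answers a request about a health topic.
    req = EvaluationRequest(query="wildfire smoke safety",
                            reason="conflicting advice online")
    req.add(SearchEvaluation(
        query=req.query,
        system="ExampleEngine",
        response_excerpt="N95 masks filter fine particulates...",
        rating=4,
        rationale="Accurate and cites public health sources",
        evaluator_role="community member",
    ))
    ```

    The key design idea the sketch tries to show is that the rationale travels with the rating, so shared evaluations carry the reasoning that peers and raters can respond to, not just a score.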

    I imagine such tooling could be used by an agency or non-profit to issue public complaint and to refine their own content and search work. Or by individuals to decide on which new tool to start to use, or to continue refusing. Or by content creators to push for better attribution or shared funding models, or develop their own systems. Or by RAG builders to demonstrate their improvements.

    Searchers, publishers, journalists, SEOs, activists, and academics have long made complaints about and to the dominant search system, and much of that is deflected and/or improvements are made that strengthen its position. We now have a chance to package our evaluations, both the good and the bad that we find in search results and responses, as a broadly shared resource that might advance search in the public interest in multiple ways.


    Below I try to roughly connect some of the paths that led me here.

    Background

    Philosophy undergrad. Intelligence analyst in the US army. Professional degree program: Master of Information Management and Systems at UC Berkeley 2016. Continued into the PhD program in Information Science. Started focusing on search practices, perceptions, and platforms in 2017. Dissertation research examined the seeming success of workplace web searching by data engineers.

    Earlier

    This is rooted in:

    Spring 2023

    • The introduction of ChatGPT clearly helped many people see that search could be different. As it seemed there was an opportunity to influence the shape of search to come, I looked at making a role for myself in industry and started exploring generative web search systems.
    • I’ve been reflecting on the course I taught at Michigan State University last spring on Understanding Change in Web Search and what my students taught me. I’ve been thinking particularly about how we implicitly and explicitly make search quality evaluations and how we might do well to share more of these and solicit feedback from others as we strive to develop our search practices and identify what we want from search. (Coming out of my dissertation research (and following the lead of Haider & Sundin (2019)) I believe it is desperately important that we talk more about search.)

    Summer & Fall 2023

    December 2023

    January 2024


    Footnotes

    1. See Hendry & Efthimiadis (2008, p. 277).↩︎

    2. While it does not compare search results or responses, LMSYS Org now has ‘online’ models in its Chatbot Arena; see the Jan 29 announcement. Current ‘online’ models are from Perplexity AI and Google’s Bard.↩︎

    References

    Cifor, M., Garcia, P., Cowan, T. L., Rault, J., Sutherland, T., Chan, A., Rode, J., Hoffmann, A. L., Salehi, N., & Nakamura, L. (2019). Feminist data manifest-no. https://www.manifestno.com/

    Haider, J., & Rödl, M. (2023). Google search and the creation of ignorance: The case of the climate crisis. Big Data & Society, 10(1), 205395172311589. https://doi.org/10.1177/20539517231158997

    Haider, J., & Sundin, O. (2019). Invisible search and online search engines: The ubiquity of search in everyday life. Routledge. https://doi.org/10.4324/9780429448546

    Hendry, D. G., & Efthimiadis, E. N. (2008). Conceptual models for search engines. In A. Spink & M. Zimmer (Eds.), Web search: Multidisciplinary perspectives (pp. 277–307). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-540-75829-7_15

    Introna, L. D., & Nissenbaum, H. (2000). Shaping the web: Why the politics of search engines matters. The Information Society, 16(3), 169–185. https://doi.org/10.1080/01972240050133634

    Lurie, E., & Mulligan, D. K. (2021). Searching for representation: A sociotechnical audit of googling for members of U.S. Congress. https://arxiv.org/abs/2109.07012

    Mager, A. (2018). Internet governance as joint effort: (Re)ordering search engines at the intersection of global and local cultures. New Media & Society, 20(10), 3657–3677. https://doi.org/10.1177/1461444818757204

    Mager, A., & Katzenbach, C. (2021). Future imaginaries in the making and governing of digital technology: Multiple, contested, commodified. New Media & Society, 23(2), 223–236. https://doi.org/10.1177/1461444820929321

    Meisner, C., Duffy, B. E., & Ziewitz, M. (2022). The labor of search engine evaluation: Making algorithms more human or humans more algorithmic? New Media & Society, 0(0), 14614448211063860. https://doi.org/10.1177/14614448211063860

    Narayanan, D., & De Cremer, D. (2022). “Google told me so!” On the bent testimony of search engine algorithms. Philosophy & Technology, 35(2), E4512. https://doi.org/10.1007/s13347-022-00521-7

    Noble, S. U. (2018). Algorithms of oppression: How search engines reinforce racism. New York University Press. https://nyupress.org/9781479837243/algorithms-of-oppression/

    Shah, C., & Bender, E. M. (2022, March). Situating search. ACM SIGIR Conference on Human Information Interaction and Retrieval. https://doi.org/10.1145/3498366.3505816

    Tripodi, F. (2018). Searching for alternative facts: Analyzing scriptural inference in conservative news practices. Data & Society. https://datasociety.net/output/searching-for-alternative-facts/

    Vaidhyanathan, S. (2011). The googlization of everything: (And why we should worry). University of California Press. https://doi.org/10.1525/9780520948693

    Van Couvering, E. J. (2007). Is relevance relevant? Market, science, and war: Discourses of search engine quality. Journal of Computer-Mediated Communication, 12(3), 866–887. https://doi.org/10.1111/j.1083-6101.2007.00354.x

    Ziewitz, M. (2019). Rethinking gaming: The ethical work of optimization in web search engines. Social Studies of Science, 49(5), 707–731. https://doi.org/10.1177/0306312719865607