Situating Web Searching in Data Engineering: Admissions, Extensions, Repairs, and Ownership

    January 6th, 2023

    See danielsgriffin.com/diss for the full text in HTML.

    citation

    Griffin D. (2022) Situating Web Searching in Data Engineering: Admissions, Extensions, Repairs, and Ownership. Ph.D. dissertation. Advisors: Deirdre K. Mulligan and Steven Weber. University of California, Berkeley. 2022. [direct PDF link]

    abstract

    When does web search work? There is a significant amount of research showing where and how web search seems to fail. Researchers identify various contributing causes of web search breakdowns: the for-profit orientation of advertising-driven companies, racial capitalism, the agonistic playing field with search engine optimizers and others trying to game the algorithm, or perhaps ‘user error’. Suggestions for making web search work for more people more of the time include: regulations aimed at competition or the design of the search interface; changing the conception of, metrics for, and evaluation of relevance; allowing subjects of search queries some space of their own on the results pages to speak back; proposals for public search engines; and better-informed users of search.

    I take a different tack. Rather than focusing on identifying and remediating points of failure, I seek to learn from successful searchers how they make search work. So, I look to data engineers. I closely examine the use of web search in the work practices of data engineering, a highly technical, competitive, and fast-changing area. Data engineers are heavily reliant on general-purpose web search. They use it all the time and it seems to work for them. The practical success I report is not determined by some solid ‘gold standard’ metrics or objective standpoint, but by how they have embraced web search and present it as useful and, more importantly, essential to their work. It is success for their purposes: in gradations, located in practice, and relative to alternatives.

    Through interviews and document analysis informed by digital ethnography, I use theories from situated learning and sociotechnical systems to explore how and why search works for data engineers. I draw from feminist science and technology studies, the sociology of expertise, situated learning theory, and organizational sociology to explore and position my four core findings.

    First, I find that personal knowledge of the technical mechanisms of search plays a limited role in data engineers’ successful searching. Exploring why and how web search works for data engineers allowed me to probe the role of knowledge about the mechanisms of search. Contrary to dominant literature that views individual ignorance of search mechanisms as contributing to failed searches and search literacy as a necessary, if independently insufficient, path towards mitigating search failures and the harms to which they contribute, I find little evidence that data engineers’ personal knowledge of the mechanisms of search contributes to their successful use of it.

    Data engineers receive little formal on-the-job training or mentoring on how to use web search successfully, and they describe web search as a solitary practice. The absence of formal training is surprising given the profession’s admitted heavy reliance on web search. In addition to the absence of formal training, data engineers report little discussion about search practices or collaboration in searching and some discomfort with their heavy reliance. However, I find one form of talk about search, what I call “search confessions”—statements, often hyperbolic, about one’s reliance on web search—to be pervasive and a key way in which the community of data engineers legitimates its heavy reliance on web search and develops and expresses shared norms about how to use search well.

    Second, rather than personal knowledge, I find that occupational, professional, and technical components of their work practices contribute to their successful use of search. Expertise embedded in these components of data engineers’ web search practices improves two key search processes: query generation and results evaluation. The work practices of data engineers also decouple the immediate effects of searching from organizational action.

    To extend the description of successful web search practices, I address how data engineers confront search failure. I look at how they turn to ask colleagues questions when web searching fails and find them performing repair. These sites of coordination and collaboration post-search failure also provide opportunities for broader knowledge sharing and a space to legitimate their work and expertise, both individually and as a profession.

    If it is normal and acceptable to rely so heavily on search, it may be a surprise that there is so little talk about searching. Data engineers regularly present search as an individual responsibility—they search by themselves or on their own and desire to keep their searching private. This individual responsibility exemplifies the extent to which the firms employing data engineers do not use data from web search activity to better know and control the search practices. My findings did not reveal technology-enabled management of web search practices. I analyze the absence of firm management of search and the solitary and secretive search practices as a product of organizational reliance on data engineers to flexibly learn on the fly. The privacy of search generally protects the resources (time, attention, and reputation) of individual data engineers to pursue the distributed searching and learning on the fly they are tasked with.

    In the conclusion, I advance two further arguments before developing provocations grounded in the key takeaways. While web searching for data engineering is generally put to successful use, I show how the effective use of web search is supported by and limited to a dependence on the knowledge of others and how uneven access to community norms and knowledge limits who is effective. The key takeaways center on how web search in data engineering is continually re-legitimated; is extended beyond the search box and the search results page; did not hinge on personal knowledge of the technical mechanisms of web search; and is entangled with notions of responsibility, credit and blame, for knowledge; and on how the intentional application of technique to influence search activity did not make an appearance.

    Being ‘better-informed search users’ for data engineers means being situated in practices around search with embedded expertise and reinforced values that support their uses of web search. For the data engineers I talked with, organizational and occupational factors including the structures of the technology, workplace interactions, and norms—all well outside of and stretching well before and after the moments of typing a query into a search box or reviewing a search results page—make search work.

    summaries

    I’ve written a few shorter summaries for various audiences. Sharing three here:

    My dissertation research seeks to understand how data engineers make generally successful use of general-purpose web search for their work. But I show how several elements of their work practices do not promote an inclusive learning environment, particularly for women data engineers and other marginalized newcomers. These include the informal means of legitimating web search as appropriate, an individualistic approach to the obligation to know (including heightened standards for women), and how keeping search practices so hidden may favor those already in power.

    My dissertation research looked at how data engineers think about and use web search in/at/for work. I describe how they successfully make use of web search through legitimating it as appropriate for their work, how search is extended across occupational, professional, and technical components of their work practices (facilitating query formulation and evaluation of search results), and how they repair search failures. I show how individually-held knowledge of the mechanisms of search is not necessary for some successful uses of search and how contextual factors can provide some defense against search automation bias. I also consider how and why the data engineers seek to keep their searching private, how they curiously do not make use of data on their own searches, and the consequences of responsibility for web search being seemingly assigned solely to individuals.

    My dissertation looks at the web search practices of data engineers through interviews and digital ethnography. I use legitimate peripheral participation (Lave & Wenger, 1991) and Handoff (Goldenfein et al., 2020; Mulligan & Nissenbaum, 2020) to look at how data engineers learn to successfully make use of web search as data engineers. I find practices that legitimate the reliance on web search in their work, despite the limited instruction and search talk. Rather than making search work by their understanding of the mechanisms of search engines, their searching is supported by it being extended across occupational, professional, and technical components of their work practices that guide their selection of search queries and structure their evaluations of search results. I also find practices of search repair, careful performances that are used to validate reliance on web search and legitimize engineers while filling in where web search is not sufficient alone. But I also find search constructed as solitary and the sole responsibility of individuals. I consider how the hiddenness of search and assignment of responsibility may limit the effectiveness of searching and of inclusive learning in data engineering work.

    corrigenda

    2023-01-06 01:21: Alongside the submitted official version, I will maintain a corrected version of the dissertation along with a list of corrections. Please do contact me if you spot an error or something otherwise deserving of repair. (I also will maintain a list of things to fix or iron out in future work. Both are distinct from Outtakes, some of which I hope to share or identify.) (1) I am aware of an issue with some in-text citations where they compile to square brackets rather than parens. (2) I also need to amend my Acknowledgements (‘A very big thank you to Alex Hughes for reaching out over the summer about reading what I’d written and offering to be a sounding board. Our conversation was incredibly encouraging and formative.’): A warning to all would-be diss-filers: writing the bulk of your acknowledgements in the final stressful days is ill-advised.

    References

    Goldenfein, J., Mulligan, D. K., Nissenbaum, H., & Ju, W. (2020). Through the handoff lens: Competing visions of autonomous futures. Berkeley Technology Law Journal, 35(IR), 835. https://doi.org/10.15779/Z38CR5ND0J [goldenfein2020through]

    Lave, J., & Wenger, E. (1991). Situated learning: Legitimate peripheral participation. Cambridge University Press. https://www.cambridge.org/highereducation/books/situated-learning/6915ABD21C8E4619F750A4D4ACA616CD#overview [lave1991situated]

    Mulligan, D. K., & Nissenbaum, H. (2020). The concept of handoff as a model for ethical analysis and design. Oxford University Press. https://doi.org/10.1093/oxfordhb/9780190067397.013.15 [mulligan2020concept]