2. Methods and Methodologies

    tags: diss
    December 16th, 2022

    Where does one go to see web searching happen, to see web searching at work? One could try to peer into tools used for indexing, the algorithms for ranking, the work of the humans rating the quality of the search results, or the design of the SERP. One could try to gain access to search logs, recording the inputs, SERPs, and the clicks. One could use a browser extension or similar software to monitor not only the activity on the search page but also the interactions and time spent on pages after that. One could observe, in person or using virtual tools, as people go about work which might sometimes include searching. One could create exercises in a lab, track search and browsing activity, watch people, and probe them with questions. One could look at what people who are trying to be found on search engines say and do. One could sit a group of people down together to talk about how they use search. Or send out surveys. Or use identified harms, failures, complaints, or other signs of breakdown as an opportunity to see more of search in an infrastructural inversion (Bowker, 1994).

    The answer depends on how one initially imagines web searching. The above examples all provide some access to only a slice of web searching activity. Depending on what one wants to know about web searching, that may be enough. From the outset I took web searching to be a situated practice and wanted to understand how and why people search the ways they do. This is still only a slice, but the sort of web searching that I was interested in involved people using web search engines. People search the web. These people have histories and contexts of action that shape how they imagine and use the search engines. People search the web within other, interconnected, practices. I took that “baseline conceptual identity” and made choices that defined (or constructed) my object of study (Marcus, 1995, pp. 105–106).

    People sometimes use secondary machines to input queries or retrieve results for web searches on search engines. Examples include the WebSearcher tool built by search researcher Ronald Robertson (Robertson, 2022; used in Lurie & Mulligan (2021)) and SerpApi, a company that provides APIs (application programming interfaces) for scraping results pages from a variety of search engines (used in Zade et al. (2022)). There are libraries for different programming languages10 and various tools for searching within the command line, such as googler11. There are plugins for text editors12 and integrated development environments (IDEs)13. Or the secondary machine might be something as simple as a web browser that supports searching a highlighted portion of text through a right-click context menu.
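    To make the shape of such scripted searching concrete, the sketch below shows what querying a search engine through a SerpApi-style scraping API might look like in Python. It is a minimal illustration only: the endpoint path, parameter names, and response field are assumptions for the sake of the example, not documented API details.

    ```python
    # Minimal sketch of a "secondary machine" for web search: a script that
    # queries a search engine through a SerpApi-style scraping API.
    # The endpoint URL, parameter names, and the "organic_results" field
    # are illustrative assumptions; consult the provider's documentation.
    import requests

    API_KEY = "YOUR_API_KEY"  # hypothetical credential

    def search(query: str, engine: str = "google") -> list[dict]:
        """Send a query to the (assumed) search API and return result entries."""
        response = requests.get(
            "https://serpapi.com/search",  # assumed endpoint
            params={"q": query, "engine": engine, "api_key": API_KEY},
            timeout=10,
        )
        response.raise_for_status()
        payload = response.json()
        # Assumed field name for the list of ranked organic results.
        return payload.get("organic_results", [])

    if __name__ == "__main__":
        for result in search("python requests timeout")[:5]:
            print(result.get("title"), "->", result.get("link"))
    ```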

    Web search practices are not monolithic14. “Just google it” refers to any number of routines used in many contexts. I sought to better understand web searching, with its variations in practice and effect, by looking at the constituent components of the larger sociotechnical practice and their configurations. To do this I relied on two analytical frameworks: “Handoff” (Goldenfein et al., 2020, Mulligan & Nissenbaum, 2020) and legitimate peripheral participation (Lave & Wenger, 1991).

    The Handoff analytical lens

    “Handoff is a lens for political and ethical analysis of sociotechnical systems” (Goldenfein et al., 2020). I use the theoretical framework of ‘Handoff’ to guide my methodological approach and as an analytical lens. Handoff, or the “Handoff Model,” was developed for “analyzing the political and ethical contours of performing a function with different configurations of human and technical actors” (Goldenfein et al., 2020). It “disrupts the idea that if components of a system are modular in functional terms, replacing one with another will leave ethical and political dimensions intact” (Mulligan & Nissenbaum, 2020).

    What does a system do? How? So what? The Handoff analytic looks at sociotechnical systems. The functions of such a system (what it does) are distributed across different actors or components15 (how it does it). These components include people, software, physical objects, laws, and norms. These act on or engage each other in the performance of some function. These different modes of engaging may include force or perceptions of affordance16. A Handoff is where a function of a system is shifted from one component to another.17 The system is transformed even if it may be said to achieve the same functional goals. Some transformations may alter the achievement or protection of important values. The Handoff analytic can be used to focus attention on the components and their modes of acting on each other, for “exposing aspects of systems that change in the wake of such replacements,” and for illuminating those aspects and changes “that may be relevant to the configuration of values embodied in the resulting systems” (Mulligan & Nissenbaum, 2020). The Handoff analytic helps identify the redistribution of functions when sociotechnical systems are reconfigured. As Mulligan & Nissenbaum (2020) write:

    It may be that transformed systems embody more positive values, but it may be that replaced components, even performing purportedly the same task, lead to a degradation— such as, dissipated accountability, diminished responsibility, displacement of human autonomy, or acute threats to privacy.

    I have been thinking with this lens since summer meetings in 2017 and 2019 as part of the Handoff team18. My initial search research (Mulligan & Griffin, 2018) was shaped by conversations in that first meeting. In the second meeting our conversations took on another search topic, the changed configurations of scholarly systems for search and evaluation, in what became Goldenfein & Griffin (2022). While the latest formulation is in Mulligan & Nissenbaum (2020), I regularly referred also to an earlier working paper, Goldenfein et al. (2019) (presented at the WeRobot conference), which exposed the consequences of different configurations of imagined larger sociotechnical futures of autonomous vehicles.19 I also looked to its use by Nick Doty, examining Handoffs in HTTPS and Do Not Track, and by Richmond Wong, showing design workbooks as prompts for reflections on Handoffs and investigating responsibility in user experience professionals’ “values work” practices, in their respective dissertations (Doty, 2020, Wong, 2020). I also taught the framework in a class on Technology and Delegation (with Deirdre Mulligan) in 2019 and in a Values Tools Workshop (for students in the School of Information to work through identifying implications for potential social values & ethics in their final projects) in early 2020.

    Data engineers search the web at work within and across sociotechnical systems—they are also components of their firms and the field of data engineering. The various web search engines are sociotechnical systems, as is the World Wide Web they index. Data engineers use an assortment of tools within their work. Web search itself is a component of systems that people direct to, or hope might, achieve or maintain societal values such as privacy, autonomy, and responsibility. Van Couvering (2007) critiques web search engines for not advancing (and even degrading) values—or “quality goals”—such as objectivity, fairness, diversity, and representation (see also Noble, 2018). Tripodi (2022b) implicates web search engines as being manipulated by powerful actors spreading propaganda. Goldenfein & Griffin (2022) argues that the introduction of Google Scholar into the larger systems of scholarly search and evaluation disrupts academic autonomy. Some have suggested web search engines should pursue, or try to pursue, “societal relevance” rather than system or user relevance (Haider & Sundin, 2019, Sundin et al., 2021). Ochigame (2020) argues for the pursuit of liberatory values. (This list of values and accomplishments or disruptions is not exhaustive.)

    I use the Handoff analytic in my research to look at how people, and other components, perform web search, and to “scrutinize” (Goldenfein et al., 2020) alternative configurations. I narrowly look at how data engineers make web search work for them, focused on the functions that support them in accomplishing their work. There are values implicated in that, like responsibility and autonomy. I also look at dignity (how data engineers are treated, how they feel doing this work) and diversity (who web search is made to work for).

    The Handoff analytic informed my research design and analysis. I do not develop a comparative study of the transformation from data engineering search work prior to web search engines to after. I use the analytic to explore searching with general-purpose web search engines, with different data engineers engaged within slightly different configurations. But I retained an appreciation for how the introduction of web search was a significant transformation. My document analysis disclosed how engineers and coders in the past searched with more emphasis on reference manuals, mailing lists, and circulars. Retired engineers would tell of their reliance on man pages20 or the large volumes stored in their workplaces. While I did not focus my interviews on these reconfigurations, the shifts suggested by such changes—implicating values of responsibility, community, security, and learning—stayed on my mind. I used hints of this broader shift to defamiliarize web search for myself (Bell et al., 2005) and to help me look for the various components, functions, and implicated values. How might we think about the fact that employees are going to a general pool of resources and bringing code and ideas back to the firm without that errand being closely monitored or managed? Wouldn’t these firms want to know about information entering the organization?21

    The analytic also shaped my findings. I will not try to present the findings anew here, only very high-level connections. The first analytical chapter looks at how data engineers make it clear to each other that they are allowed and expected to rely on web search. This is communicated despite the limited access people have to others’ searching. How are perceptions of affordance constructed? The second looks at how component actors engage with one another in revealing what they do not know. These engagements are conducted in a performance where people attempt to communicate that they have taken appropriate responsibility and that they are prepared to receive and effectively make use of information from others. These engagements often take place in forums that permit searching and publicity, attributes that shape perceptions of possibility and punishment. The finding in the third chapter—that the functional achievements of search are the product of sociotechnical accomplishments—stems from a direct application of Handoff to the descriptions of the various components and engagements provided by the research participants. The final analytical chapter deals directly with imagined or missing alternative configurations of search.

    The Handoff analytic provides a complement to the situated learning framework and the legitimate peripheral participation analytic presented in Lave & Wenger (1991)(see next section), providing concepts to advance analytical applications of the latter towards learning in or about sociotechnical systems.

    The LPP analytic perspective

    The legitimate peripheral participation (LPP) analytic, developed by Lave and Wenger in an attempt to “rescue the idea of apprenticeship” [emphasis in the original] and to “clarify the concept of situated learning,” holds that “learning is an integral and inseparable aspect of social practice” (1991, p. 31).22

    Lave and Wenger draw heavily on studies of apprenticeship, and pointedly not on formal schooling23, in understanding and explicating situated learning. A chapter titled “Midwives, Tailors, Quartermasters, Butchers, Nondrinking Alcoholics” presents five ethnographic studies of apprenticeship. They use these cases to explicate and demonstrate the LPP analytic.

    They argue that “learning is not merely situated in practice,” but is “an integral part of generative social practice in the lived-in world,” and the goal of their book, “the central preoccupation,” is to present “legitimate peripheral participation” as not just a descriptor but an analytic (p. 35). LPP engages with “belonging” (p. 35), “relations of power” (p. 36), and “peripherality in both negative and positive senses” (p. 37).

    I had many times discussed LPP in classes (particularly with Paul Duguid), read articles making use of it, watched it presented in teaching training workshops, and referred to it in my own mentorship.24 In a manner the analytic itself would predict, my understanding of it increased immeasurably as I worked to apply it within this research. I looked closely, for example, at the applications made by others, like the books and dissertations of former PhD students from my program (Mathew & Cheshire, 2018, Takhteyev, 2012). Only once I started writing, engaging with Beane’s (2019) critique of LPP (see also his dissertation, Beane (2017)), and reading and listening closely to the language in Lave and Wenger’s book could I say that I had learned how to apply it.

    Methods

    To explore the web searching of data engineers with those two lenses, I developed a multi-sited (Marcus, 1995), networked ethnographic study. I used semi-structured in-depth interviews, subject to interpretive analysis, as well as document analysis. Conducting multi-sited interviews allowed me to talk with people in data engineering or adjacent roles in a variety of companies. Multi-sited document analysis allowed me to observe people discuss or document web searches or web searching in many places online. I started collecting and analyzing documents in December 2018. I started interviewing in June 2021.

    Multi-sited and networked

    I “followed” (Marcus, 1995), or traced, data engineer web search practices. I followed data engineers as they moved through different roles and reconfigurations of coder web search. I set out to follow the things—the web search engines themselves, perpetually redesigned in code and reconfigured in the practices of searches; the search queries as refined and rejected; the search results as justified or used to justify. I followed the people as they learned of, used, and reflected on the tools they used, the search queries, and the search engines. I followed the conflicts where data engineers wielded web search as a weapon or wondered whether its use was a sign of weakness.

    I drew on tactics from those who study algorithms and who see the performance of the algorithms as involving much more than the code itself (Bucher, 2017, Christin, 2017, Introna, 2016). One difficulty with studying algorithms in the wild is that the code itself can be hard to see. My focus was not on the code or the software or the algorithms, but on practices of web search. Like algorithms, web search can also be hard to see. Web search is sometimes mundane and forgotten, opaque and invisible (Haider & Sundin, 2019, Sundin et al., 2017). So I made use of what Christin (2017)25 calls “a somewhat oblique strategy” and describes as “refraction ethnography”. If we picture light refracted by a prism, we can imagine we might learn something about the prism by looking at the light. In Christin’s case, we might learn something about the algorithms by looking at how organizations and people act around them. This is very similar to “scavenging ethnography”, useful for studying “obscure objects” and finding where they “manifest” (Seaver, 2017, pp. 6–7). Seaver writes that “the scavenger replicates the partiality of ordinary conditions of knowing—everyone is figuring out their world by piecing together heterogeneous clues.” I followed, scavenging for tracks, the “technological artifacts” of web search and the material-discursive practice of coder web search “as they circulate[d] through institutional sites and epistemic networks”. Christin “focuse[d] on how algorithms are refracted through organizational forms and work practices” in web journalism and criminal justice. I attended to how the various algorithms imagined and implicated in coder web search are refracted not only through the work practices of data engineers, but also as forms that organize their labor (as structures for the work practices and aids in navigating both capital and collective configurations of coder web search).

    Burrell (2009) provides steps for field construction that support such following—following the people and things through refractions. I identified “entry points” while seeking to “maintain[] a concentrated engagement with the research topic” (pp. 190–191) and “consider[ed] multiple types of networks” (p. 191). Entry points included mentions of web search in relation to software engineering or coding work (and so many other mentions of web search). I’d find a popular tweet about coding work and search and read the replies, see links within to blog posts, and then find those blog posts discussed further on forums.26 Or the replies would have stock phrases that I could search in turn. I’d search these on Twitter, Reddit, or a web search engine. One could enter at any number of points and be sent around from one social media platform to a forum, then a Q&A website, followed by a blog, and back again. This was a “traveling through” (Burrell, 2012, pp. 32–33). These webs spanned decades, linking patterned jokes, questions, and articulations about search together. I had saved searches on the Twitter app on my phone. I would use idle time to search out new mentions of googling, though my Twitter feed—informed by my curation and constant engagement—also might offer the repetitive jokes and confessions of searching. Better still were my friends and colleagues, who would send my way the examples they saw.

    Another set of entry points were the questions and answers on websites like Stack Overflow, or the issues raised in GitHub repos or Slack27 workspaces for tools used by data engineers. I saw patterns very similar to those mentioned by my interviewees, or depicted in the screenshots they shared. So too the tutorials, blogged reflections, and formal training materials on tools and strategies for data engineers. I reviewed them for topics related to web search (and its alternatives) and to immerse myself in the language of the tools and the contexts my research participants discussed. These entry points were always partial and they “did not fully contain,” even when viewed in composite, “the social phenomena under study” (Burrell, 2012, p. 32)—though scavenging and refraction anticipate that. These entry points, though, provided access to the people I would talk to, the words and tools and jokes they used, and let me observe others like them interacting. These overlapping and interlinked networks did not let me observe or question data engineers as they input a query or made sense of a SERP, but they did let me see how web search in the work of data engineers is so much more than that.

    I also made space for the “[i]magined spaces” or “speculative imagination” depicted or disclosed by the words and actions of research participants and coders engaging publicly online (pp. 193–194). One such place is the imagined Google: Takhteyev (2007), in his ethnographic work on Brazilian software developers, noted that “the Google campus” “serve[d] as a source of tremendous symbolic power” and was “the single most important place in the imagination of the developers” (p. 5). With my training in scenario thinking, and following the examples in the presentations of the Handoff analytic (Goldenfein et al., 2020, Mulligan & Nissenbaum, 2020) and Noble (2018)’s comments on imaginings28, I attended closely to suggestions of alternative ways of searching or alternative designs of work or the search engine.

    In my field construction, I “lack[ed] the ‘deep hanging out’” (Dunbar-Hester, 2020, p. 25) and did not pursue “total immersion.” I did not gain access to proper sites of work to observe, or gain employment to participate fully in the work myself. But I followed the “driving logic” of fieldwork: “that we can gain analytic insight by inserting ourselves in the social milieu of those we seek to understand” (Coleman, 2012, p. 5). And over the four years since first collecting and analyzing the documents on coders searching the web, I have been engaged—in an ethnographically informed and digitally enabled way—in the host of activities imagined of fieldwork: “participating, watching, listening, recording, data collecting, interviewing, learning different languages, and asking many questions” (Coleman, 2012, p. 5).

    Documents

    Before starting my interviewing, I collected and analyzed documents related particularly to data engineers’ use of web search at work, including public social media, blogs (and comments), books, YouTube videos, podcasts, instructional materials, and policy guidance. In addition to the coding approach described below, I conducted situated document analysis, working to include consideration of production, consumption (and use), and content (see, e.g., Prior, 2003).

    I read through hundreds of pages of primary material from the public web related to the use of web search by coders broadly: original posts and commentary on personal blogs, Twitter, Quora, Reddit, Hacker News, and Stack Overflow addressing questions like: Am I a real programmer if I have to look everything up? Do expert coders search all the time? How did software engineers work before Google was invented? I also read discussions about the role of web search in Stack Overflow, and jokes and memes related to coder reliance on or expertise in web search (e.g., referring to software engineers as professional googlers). I created Twitter lists to watch how people who coded for work talked about their work generally.

    The material I gathered went back to 2003. I also looked at training materials, books (popular and academic histories, resources for coders), tutorials, and other introductions to programming, to familiarize myself with the community of practice and to look specifically for elements related to coder web search (like commentary on troubleshooting or discussion of norms around “reading the fine manual”). I also looked at research within computer science and empirical software engineering, going back decades, as it relates to searching and searching the web. And I read books and material about programming and software engineering, to familiarize myself with the work and craft and to review for connections to the subjects at hand. This includes books like Hunt & Thomas (1999) and Seibel (2009), and the more data-engineering-oriented Kleppmann (2017).

    During the interviewing I also joined multiple Slack workspaces for open source tools for data engineers and the sub-Reddit r/dataengineering (for background, not for analysis). In the course of the interviewing portion of this research I was also provided with, or referred to, documents by the research participants. Documents provided included training materials, screenshots of workplace messages, screenshots of code comments, tweets, blog posts, and news articles. Research participants shared these during the interviews, as well as months after.

    Interviewing

    I conducted 38 semi-structured in-depth interviews with 30 participants (see Appendix I. Research Participants). Some questions and topics were planned in advance (see Appendix II. Annotated Interview Guide), derived from approaches suggested by the literature and the developing research. I modified the interview guide over the course of the research. The interviews were conducted over phone or Zoom video calls. I took notes during the interviews and recorded and transcribed them.

    I treated “interviews as fieldwork”, as “part of the world in which research subjects live and make meaning” (Seaver, 2017, p. 8). My interview guide was directed towards identifying the values, functions, components, and larger contexts of web search, to guide my analysis with the Handoff analytic. The modes of engaging between two components, particularly when one component is a human (i.e., force and perceptions of constraint or affordance), can be made somewhat accessible through interpretative interviewing. Interviewing also allowed me to approach concerns of the LPP analytic: approval, belonging, participation, and peripherality. Pugh (2013) argues interpretative interviews can provide “different levels of information about people’s motivation, beliefs, meanings, feelings and practices” (p. 50). I noted and probed “display work” (where interviewees presented their best face); metaphors and jokes; laughter, silences, and non-verbal communication that show “what kind of things are uncomfortable, horrifying, emboldening, joyful”29; and dissonances (pp. 50–51). I attended to, questioned, encouraged, and prompted “specific examples” (p. 50).

    Key changes I made to the initial versions of the interview guide were to add questions at the start and end. I asked an “initial reaction question” at the start of most of my interviews, asking participants how they initially reacted to hearing about this topic of research. This was a very fruitful question: people noted surprise or appreciation, stated they had never thought of the topic before, or described the extent to which they used web search in their work. I was then able to refer back to that initial reaction later in the conversations. I also added questions at the end, asking whether they had any final questions for me and asking for any final reflections. These worked as long as I asked them with enough time left or the interviewee granted an extension. The first was fruitful in revealing their concerns or interests. This sometimes shaped my understanding of the preceding conversation and also sometimes challenged my understanding of search at work itself. The second, asking for final reflections, was useful for that as well, but also often revealed an appreciation for the chance to talk and reflect on this work practice. I used those sorts of comments in subsequent recruitment efforts and referred to them in analyzing consequences of the often solitary and silent search practices.

    Sampling

    My sampling for research participants was driven by choices around transferability, the Handoff and LPP analytical frameworks, and pragmatic concerns. To gain tractability when shifting to interviews, I narrowed my focus to a subset of those who write code for work: data engineers. I selected this site for four key reasons. First, it seemed likely to include people who were relatively technically sophisticated and so particularly capable of appreciating the technical mechanisms of web search. Second, it appeared to be a particularly dynamic field that requires a significant amount of learning on the fly (Avnoon, 2021, Kotamraju, 2002) and so is perhaps even more heavily reliant on search than ‘coding work’ is generally. Data engineers are involved in building and maintaining tools sometimes used in efforts to control, replace, or surveil other people, practices, and tools. So, third, I saw them as perhaps more likely to have sophisticated understandings of the underlying technologies and an appreciation, perhaps, for the uses and misuses of data and automation built on or around it30. In that, fourth, data engineers would likely also be capable of redesigning or refashioning their tools and practices in the face of perceived constraints or affordances (Bailey & Leonardi, 2015, Leonardi, 2011, Vertesi, 2019).

    While following trails to find data engineers, I generally passed over attempting to recruit those working in the most prominent technology companies, sometimes grouped as FAANG (Facebook, Apple, Amazon, Netflix, and Google). It seemed likely that they might have resources and constraints very different from those of most data engineers, and research overly focused on their web search practices might have limited transferability (or at least developing and demonstrating transferability may have been harder). Concerns about transferability also led me to interview only data engineers who worked in teams with other data engineers, and only people in the United States.

    The Handoff analytic informed my sampling. I recruited interview participants who worked in different contexts so I might explore the configurations of components, the system functions, and emergent values in different sorts of organizations. I looked for people working in different industries, at different levels, and on different types of teams. I looked at data engineering broadly, not constraining my search to people using particular tools or platforms. Some interviewees did not formally have data engineer as their title, though their work included tasks described as data engineering. Some interviewees worked with more established technology and others were tasked to work with the newest tools. I also spoke with people in relatively adjacent roles: a developer of open source software used by data engineers, a developer advocate for open source software used by data engineers, and a site reliability engineer.

    The LPP analytic drove me to sample both people new to data engineering and full members. I looked for people who had responsibility for interviewing, hiring, on-boarding, and managing. I looked for people who had transitioned to data engineering from distinct work (like software engineering or data analysis) and for people whose formal training prepared them to work as data engineers directly out of college.

    I interviewed people from across the United States: east coast, west coast, and in between. These data engineers worked on internally-facing online analytical processing (OLAP) systems for use cases like business analytics, and on customer-facing online transaction processing (OLTP) production systems for providing access to documents, serving advertising, or making machine learning recommendations. They worked in apparel, computing technology, enterprise software, entertainment, financial services, fitness, healthcare, media streaming, online marketplaces, open source software, retail, social media, web analytics, and web publishing. From the principal and staff level to recent hires, these data engineers were individual contributors, managed teams, and served as the technical leads for projects.

    The variance, or “de facto comparison dimensions” (Marcus, 1995), was emergent. Team size and history, organizational experience with contemporary data stacks (the tools and platforms used in the production of a data service), and competitive work environments provided sources of variance that made refraction (Christin, 2017, 2020b) more visible. Large and small teams with short and long histories, companies expanding into new-to-them data spaces versus those extending prior successes, and competition were not the variables of central interest, but they seemed associated with social dynamics and workplace relationships that contributed to concrete examples shared by the interviewees. This variance, or perhaps these extremities, seemed to make workplace web searching, at least in the interviews, less mundane (Sundin et al., 2017) or invisible (Haider & Sundin, 2019).

    The Handoff analytic helped illuminate how the larger systems of web search in the work of data engineering engaged with different people. I was particularly interested in identifying potentially harmful patterns in these practices (including how people respond with resistance, repair, or routing around). Prior work shows that women in the coding professions are often mistreated or pushed out: research has traced how ‘coding work’ came to be presented, perceived, and performed as masculine work in the 20th century (Abbate, 2012, Ensmenger, 2010, Hicks, 2017), a pattern that persists (Dunbar-Hester, 2020, Misa, 2010). My document analysis, prior to starting my interviewing, suggested there was a high degree of secrecy surrounding search, so I knew it was important to hear the accounts of women to gain a fuller understanding of data engineering web search practices and their implications. I interviewed women data engineers at nearly double their representation in the field (36.6% to 18.5%, according to one demographics report31). This helped me identify and document an environment within web search practices in data engineering sometimes filled with anticipations of shaming and unequal judgment, particularly visible to women and people new to the field.

    I failed in my attempts to recruit black data engineers, a limitation of this study. Talking with people from, and situated within, multiple marginalized groups may have allowed me to better understand how they both identified and responded to harmful practices. I did not focus on other demographic characteristics unless they became salient in the course of an interview. Most of the data engineers I interviewed presented as white or South Asian.

    There were also pragmatic concerns—identifying and contacting data engineers. To provide practical (as well as analytical) scoping for my research, I planned to interview only people in the United States (this was also in my IRB protocol). The multi-sited networked design of the study also meant that I was often looking for all sorts of “entry points”, unable to plan or predict access (Burrell, 2009, pp. 190–191). This meant there were many data engineers I could not contact for interviews, even when I had identified contact information for them or they had reached out to me directly on LinkedIn.

    Recruitment

    Research participant recruitment started with snowball recruitment among friends, classmates, and my extended network. I also reached out directly to people discussing data engineering online.

    I posted messages in channels on the UC Berkeley School of Information’s internal Slack workspace32, my Twitter account, my LinkedIn account, a sub-Reddit for data engineering33 (first contacting the moderators for permission), and my personal website34. I received multiple inquiries referencing the LinkedIn post, though none specifying Reddit or Twitter. I also sent direct messages to people on Twitter where that was an option and I was not able to locate an email address. For immersive purposes and for recruitment, I ‘followed’ publicly engaged data engineers and those in adjacent roles on Twitter, including making a Twitter list. I found data engineers to reach out to on social media websites, forums, and blogs—traversing links through Hacker News, GitHub, and LinkedIn to find email addresses or homepages with contact forms.

    Coding

    I made use of some methods informed by grounded theory in efforts to ground my observations, analysis, and representations. While not taking on all of its prescriptions or descriptions (e.g., the notion of “data reduction” gained from applying metadata to other data), I used “flexible coding” (Deterding & Waters, 2018). I coded written records of my encounters with informants, transcripts of interviews, and collected documents, with codes derived in part inductively but also according to expectations and anticipations from prior research and sense-making of the research area. (Note that while the goal was to develop an emic account of coder web search practice, the deductive codes assisted me in attending to my expectations and improved my ability to identify sources of changes in my thinking.) These codes also included my “own emotional reactions to the respondent’s narrative”, where I recorded not only surprise but disgust and heartache (Pugh, 2013, p. 51, fn. 5).

    Memoing

    Throughout the research I wrote memos. I wrote memos on what I was confused by, surprised by, and excited by. Rather than developing a regular rhythm or routine for memoing, I scavenged (Seaver, 2017) for memos. I regularly turned stray remarks from others, or times I caught myself going on about my research to anyone, into opportunities to put my thoughts into tangible words on paper. Or I would start to type a question to a colleague and realize partway through that the question could also (or instead) be a memo. I would write memos while reflecting on meetings with advisors—trying to find the words, and so do the thinking, to better understand something I struggled to convey to them. In some of these memos I juxtaposed competing codes or claims from interviewees. As an example, one significant memo came from a sudden realization I had on a walk and scribbled in the Notes app on my phone: “They’ve black boxed everything! By-and-large the practices and the tool itself.” Tweets of mine, or many deleted tweet drafts, also provided seeds for fuller memoing. Some memos were born of my own search frustrations—a constant comparison across contexts that could be “contrasted, elaborated, and qualified” when juxtaposed with data engineering web searching (Orlikowski, 1993, p. 5)—as when I spent an hour disassembling and reassembling a pepper grinder to add peppercorns, only later realizing I could have popped the top off, which I might have known had I done a web search, the name of the grinder clearly visible. Or when I bought the wrong size screws for hanging drywall. Many of these memos started from notes taken on runs—runs where I listened to books or articles35. My wife put up with me sometimes stopping mid-conversation to jot down the stub of something for a future memo, or writing it herself if we were driving. I could not force myself to write memos36, and I could not stop myself from writing them once a thought had lodged.

    A significant breakthrough in my sense of my study occurred when one of my interviewees asked if I would talk to their organization about my initial findings. This led to a flurry of activity as I hurriedly repackaged my initial memos and summaries of interviews into tentative findings. I then sat with the raw materials of my presentation, the starts of different framings and outlines. I reflected on how I felt in the presentation, claiming or hinting at different tentative findings. This forced me back to the interviews, to make codes I had felt but had not yet recorded. Soon thereafter—I had conducted 25 interviews at this point—I reached “meaning saturation” (Burrell, 2009, p. 194), at least within the scope of work possible for this research project.

    Surprises

    I attended carefully to surprises, noting them in interview notes, memos, and scavenged memos from off-the-cuff responses to questions from advisors and colleagues. Surprises noted include:

    • lack of criticisms of the search engine, minimal sharing about search practices or search stories
    • no bespoke code to aid searching (“gap-bridging” (Bailey et al., 2010)), missing strong norms of reciprocity
    • no hints of surveillance of web search, nor even references to one’s own search history
    • near ubiquity of the use of Slack workspaces
    • hyperbole in confessional statements of heavy reliance on web search (mirroring material found on the web, but surprising in the moment for the repeated strength of exaggeration in interviews)
    • vast variety in revealed knowledge of the mechanisms of web search
    • data engineers saying they had never thought about search or had taken it for granted—search as invisible or mundane (Haider & Sundin, 2019, Sundin et al., 2017)—the data engineers were so reliant on search that I had imagined they would see it as their “topic, or difficulty” (Star, 1999) rather than taken-for-granted infrastructure

    I created a document to track when I changed my perspective on what my data meant. These were my shifts or pivots, a change in direction due to a realization or recognition of something previously unseen, something like a surprise. As I combed through prior memos I began to write mini memos of those shifts I had not explicitly recorded. Memos began to be a way for me to understand my prior shifts in emphases or framing. I could see where sometimes my understanding of a theme or chapter completely changed. These shifts were not small. For example, at first I did not see the knowledge embedded in the practices. I did not see where legitimate peripheral participation might exist in these solitary and secretive practices. And just a month before filing, my thoughts about the privacy of searching shifted from a focus on the benefits to recognizing a need to be clear about potential harm. Some of these shifts or pivots were from doing the work of writing the drafts—the “activity of writing”, the “physical act of writing” itself, was “a process of discovery [ . . . ] and surprise” (Lofland & Lofland, 1995). Some shifts and pivots were from challenges raised by peers, advisors, or research participants. The intentional processes of memoing—including reviewing, synthesizing and building on my memos—and working to explain my findings to others helped me to re-examine the data and see where it was showing me something different from what I had originally seen.

    Member checks

    Starting in the summer of 2022, I reached out for member checks to some of the data engineers I had previously interviewed and to new participants. Also called “participant or respondent validation” (Birt et al., 2016), member checks are for “communicative validation of data and interpretations with members of the fields under study” (Flick, 2009). For both groups I conducted a form of “synthesized member checking”, which Birt et al. (2016) write “addresses the co-constructed nature of knowledge by providing participants with the opportunity to engage with, and add to, interview and interpreted data, several months after their semi-structured interview.” I shared my analysis and findings with the interviewees, including referencing the prior interview and particular places I was planning on using their words, as applicable. For new participants I provided an overview of my research as I did with initial interviews, asked my initial reaction question, and asked only a few orientation questions before presenting my findings.

    I conducted member checks with seven prior participants (and exchanged only emails with another) and five new participants. I emailed the participants and used scheduling software37 to allow them to select to talk with me for different lengths of time, from 15 minutes to one hour. I did two 30-minute member checks with one participant. One of the new participants was someone I had reached out to in the previous round but from whom I had never received a reply. Another new participant I reached out to because they had tweeted about something that had just come up in another member check. Only one other new participant was not on my original list of potential interviewees.

    My emails inviting participants included a brief description of what I was writing in my dissertation drafts and a single tagline for each of the four analytical chapters (as developed at the time of the emails).38 I opened the interviews with a high-level discussion of where I was in my research process (initial interviews conducted, coding and memoing, followed by iterating on drafts of chapters, and now turning to member checks) and a description of the goals for the member checks. I told them it was similar to an interview, requesting consent to take notes and record. New participants were also given the same consent-for-research forms. For some of the participants I mentioned specific things they had said in the initial interviews that I was quoting and discussing in my text, sometimes at the start but mostly interspersed through our conversation.

    I wrote a guide for myself on how to introduce the member checks. While I often paraphrased in the actual conversation, I had written the following: “I want to get feedback (including pushback & resistance), ideally feedback in your own words (rather than yes/no) that might refine, extend, or challenge what I’m saying. This is co-constructed research, so I’ll be upfront: my dream is sorta like in improv: I say something and you say ‘yes, and’, ‘yes, but’ OR ‘no, actually’.” I also suggested high-level questions for them to consider:

    • Is the analysis believable/credible?
    • Anything feel off? Or exciting?
    • Is the analysis useful or meaningful to you?

    After the first two member checks I made sure to add this as well: “Please do challenge me, in my last member check someone raised a question that helped me identify a key connection I hadn’t seen at all before.” I presented the member check in this manner with the goal of the participants seeing themselves fit to provide “commentary, correction, and elaboration” (Orlikowski, 1993, p. 10).

    I generally discussed my tentative findings in the order I’d arranged the chapters in my drafts at the time of the interview (for most of the interviewees that was STRUCTURING, LEARNING, SHARED, GAPS).39

    As I told the member check interviewees I would, I pulled material from our conversations into the dissertation. Sometimes they provided descriptions of experiences or reflections that filled gaps or reinforced my findings. I also note and comment in the chapters that follow where they provided pushback. Sometimes that pushback identified things I might be better off not discussing, or needed to discuss with more care. I also highlight spots where comments from participants shifted my understanding of my research.

    Positionality

    Before presenting my data and findings, I will describe some of my positionality. With this I “seek to acknowledge [my] situatedness and make it an explicit component of the research process” (Christin, 2020b, p. 13). This provides an overview of my background as related to my questions and sites of research, as well as a place for me and the reader to prepare to qualify or critique my standing for different claims I will make in the subsequent chapters. I had early exposure to computers, but I could not name them, unlike the developers that Coleman (2012) spoke with, who “would almost without fail volunteer the name and model number of the specific device” (p. 28). I was very attentive to approaches to teaching and systems of education. My father’s parting words were almost always, “have fun.” I learned that “creativity and play are socially motivated and socially learned” (Ames, 2019, p. 32) and of their role in learning, collaboration, and teamwork. I participated in a wide range of tutoring, mentoring, apprenticeship, and immersive training in social environments that ranged from caustic to liberatory. Experience with searching and teaching search—trying to search alone, managing teams of intelligence analysts reliant on search, searching as a data analyst, and teaching new programmers—shaped my attention and the comparative analysis across domains and situations for searching. I will focus on experiences related to searching, learning to use software systems or programming languages, and training (particularly that outside of formal schooling).

    I remember being excited when my oldest sister mentioned HotBot, a search engine launched in 1996. It seemed to me so much better at returning relevant results than the mishmash of search engines and search aggregators that I used at the time. But those were better, from my perspective as a child exploring, than the confines of AOL search, itself better than Microsoft Encarta. I was fortunate to have access to the family computer in our dining room starting sometime in elementary school. The first web search that I remember was searching [marines], and then being whisked away by my mother when the search engine, perhaps Juno, returned gay porn. We also made regular use of the local library, so much so that I had my library card memorized for placing holds on books from home.40 I vividly recall two older boys, at a birthday party sometime before middle school, trying to one-up each other while talking about how they knew how to set up a website. I was curious but didn’t really understand. I was homeschooled until fourth grade, then in a co-op classroom for two years, before joining an “innovative” public middle school (where everyone had to take band). A close friend showed me how he’d learned to display scrolling text in his computer class in middle school, but no such class was offered at my school. I do not recall any formal lessons on searching the web, though I did take a keyboarding class in high school. After two years at the local public high school I switched to a private K-12 school, before spending my entire senior year taking classes at the community college.

    In undergrad I slowly learned more about the internet, the World Wide Web, and the tools supporting their use. I set up a short-lived blog. After my first year I made a small static website as a parting gift to my dormmates—just a grid of names and photos. I thought I would create a website where students could share and collaborate on notes. I even pitched it to my undergrad advisor—a history professor. I got so far as setting up a website with MediaWiki and spamming classmates anonymously with links to my class notes. I installed a Linux operating system on my laptop but could not figure out how to connect to campus WiFi. My best friend and roommate was a math and computer science major but I was focusing on history, political science, and philosophy. I also studied New Testament Greek, and thought of possibly becoming a pastor. I’d signed up for a programming class one semester but changed my mind before classes started. I worked as a “junior office manager” in a small architecture firm my first summer after starting college. I made coffee, checked the mail, answered the phone, organized the office bookshelf, and convinced my boss I could learn enough by searching the web to make updates to the firm’s website. My second summer I did an immersion program in Arabic. Both summers, and for a couple months after graduating from college, I also worked at the local McDonalds.

    I joined the U.S. Army after college. I enlisted to be an all-source intelligence analyst, under the impression that I could take my philosophy degree overseas and make a difference.41 “All-source” meant my role was to contribute to planning and synthesis of intelligence collected through a variety of intelligence sources, including imagery, geospatial, signals, and human intelligence debriefing and interrogation.42 That involved a lot of searching. Beyond training in intelligence analysis—including in searching various systems—I also learned how to operate weapons, jump out of airplanes, rappel out of helicopters, drive a 2 1/2 ton truck, and run in formation while singing cadences.

    While I was deployed in Iraq, my lieutenant required that at the end of every day I share the list of searches that I had done. These, on a secure network, were generally searches in Query Tree, an application on the U.S. Army’s multi-billion dollar Distributed Common Ground System-Army (DCGS-A), but also searches of reports and briefings stored on Microsoft SharePoint and OneNote. One reason for monitoring searches was that, despite Boolean search operators, searching Arabic names transliterated into English was fraught. While I could reference a gazetteer to identify preferred spellings, names were not controlled. During that deployment I learned the basics of Microsoft Access and how to edit VBA scripts in Microsoft Excel, creating systems to store the results of my searches because the servers running DCGS-A regularly ran slow or had to be restarted.

    Over the years I was promoted and became responsible for the work of a team of analysts. I taught them rudimentary lessons in searching the web and the multiple secure networks and databases we used. My teaching responsibility here mostly involved one-on-one and small team instruction and mentorship. As an intelligence analyst I taught (or trained) my soldiers geography, history, software systems, analysis techniques, and public speaking. They joined the army with an array of educational backgrounds, from a GED to a college degree. On the side I thought I should learn to program. I started basic introductory tutorials for Rails, Lisp, and Python but did not get far. My unit tested out Palantir, which definitely felt fancy and fast, even if it made excessive assumptions about the searcher’s goal and the underlying data. I later deployed to Afghanistan, endlessly searching while generally safe in an air conditioned tent. I realized we’d been misinterpreting aggregated location data that Palantir provided (it functioned differently than our standard systems) and convinced our shop to stop using that data, though I got nowhere in my attempts to convince their engineers over email to make changes. I used other tools as well. I could read enough Python to adapt some automation behavior in ArcGIS, a geographic information system that we used. I recorded macros in Microsoft Word to help me prepare documents in particular formats.

    I finished my enlistment and applied to graduate schools to study collaborative learning, perhaps with new tools. I’d had fellow analysts search themselves into a frenzy of overblown fear or search just enough to be able to overclaim some copy-and-paste expertise. I’d had superiors think we could just pick up and use our systems without regular training. Even with training our tools could be frustrating to use and unreliable. My philosophy degree only got me so far.

    After applying to graduate schools, I took a job as a data analyst at a charter school in New York. I helped prepare testing materials, process test scores, develop a file naming convention, and review data analysis software for potential adoption, and I fumbled around using jQuery to adapt Microsoft SharePoint. During that time I found out I had been accepted to the UC Berkeley School of Information. I received an email from the admissions coordinator, saying “the admissions committee does have concerns about your preparation in programming [ . . . ] Before the beginning of the Fall semester you are expected to take at least one introductory programming course [ . . . ].” So I took an “Introduction to Programming with C++” course at Borough of Manhattan Community College.

    I took a short “Python Boot Camp” intensive before my first semester started. Then I jumped into programming, taking an applied natural language processing class my first semester. I have used Python regularly since43, writing class projects or utilities for personal use. This document itself was produced with the aid of several scripting tools I’ve written and many code-related web searches. I then taught the “Python Boot Camp” for incoming graduate students for three years. This was a three-week intensive summer class for incoming master’s and PhD students who were either completely new to programming or new to Python. The use of “boot camp” was amusing to me. I made sure I did not replicate the sort of embarrassment, harassment, or other punishments that came with some of my military training. (Though I had to be pushed by students to stop using a phrase from the military: I would often—intentionally—use “Too easy” to refer to things that were actually quite hard. It was supposed to motivate people, but I suppose only in the right contexts and in relation with people well-situated within those contexts. Part of the reason it didn’t land in the class was that I had students who had never programmed and who weren’t members of groups generally expected or welcomed in coding communities.) In classes I also worked in R and JavaScript. I was a professional master’s student before joining the PhD program. My master’s final project team built an automatic question generation system, and I contributed to the Python codebase. I wrote Python code to pull tweets from the Twitter API for Burrell et al. (2019) and Griffin & Lurie (2022).

    Bibliography

    Abbate, J. (2012). Recoding gender: Women’s changing participation in computing (History of Computing). The MIT Press. https://mitpress.mit.edu/9780262304535/ [abbate2012recoding]

    Ames, M. (2019). The charisma machine: The life, death, and legacy of One Laptop per Child. The MIT Press. [ames2019charisma]

    Avnoon, N. (2021). Data scientists’ identity work: Omnivorous symbolic boundaries in skills acquisition. Work, Employment and Society, 0(0), 0950017020977306. https://doi.org/10.1177/0950017020977306 [avnoon2021data]

    Bailey, D. E., & Leonardi, P. M. (2015). Technology choices: Why occupations differ in their embrace of new technology. MIT Press. http://www.jstor.org/stable/j.ctt17kk9d4 [bailey2015technology]

    Bailey, D. E., Leonardi, P. M., & Chong, J. (2010). Minding the gaps: Understanding technology interdependence and coordination in knowledge work. Organization Science, 21(3), 713–730. https://doi.org/10.1287/orsc.1090.0473 [bailey2010minding]

    Barley, S. R. (1986). Technology as an occasion for structuring: Evidence from observations of CT scanners and the social order of radiology departments. Administrative Science Quarterly, 31(1), 78–108. http://www.jstor.org/stable/2392767 [barley1986technology]

    Beane, M. (2017). Operating in the shadows: The productive deviance needed to make robotic surgery work [PhD thesis, MIT]. [beane2017operating]

    Beane, M. (2019). Shadow learning: Building robotic surgical skill when approved means fail. Administrative Science Quarterly, 64(1), 87–123. https://doi.org/10.1177/0001839217751692 [beane2019shadow]

    Bechky, B. A. (2003). Object lessons: Workplace artifacts as representations of occupational jurisdiction. American Journal of Sociology, 109(3), 720–752. [bechky2003object]

    Bell, G., Blythe, M., & Sengers, P. (2005). Making by making strange: Defamiliarization and the design of domestic technologies. ACM Transactions on Computer-Human Interaction (TOCHI), 12(2), 149–173. https://doi.org/10.1145/1067860.1067862 [bell2005making]

    Birt, L., Scott, S., Cavers, D., Campbell, C., & Walter, F. (2016). Member checking: A tool to enhance trustworthiness or merely a nod to validation? Qualitative Health Research, 26(13), 1802–1811. https://doi.org/10.1177/1049732316654870 [birt2016member]

    Bowker, G. C. (1994). Science on the run: Information management and industrial geophysics at Schlumberger, 1920–1940 (Inside Technology). The MIT Press. https://sicm.mitpress.mit.edu/books/science-run [bowker1994science]

    Bucher, T. (2017). The algorithmic imaginary: Exploring the ordinary affects of Facebook algorithms. Information, Communication & Society, 20(1), 30–44. https://doi.org/10.1080/1369118X.2016.1154086 [bucher2017algorithmic]

    Burrell, J. (2009). The field site as a network: A strategy for locating ethnographic research. Field Methods, 21(2), 181–199. https://doi.org/10.1177/1525822X08329699 [burrell2009field]

    Burrell, J. (2012). Invisible users: Youth in the internet cafés of urban Ghana (Acting with Technology). The MIT Press. [burrell2012invisible]

    Burrell, J., Kahn, Z., Jonas, A., & Griffin, D. (2019). When users control the algorithms: Values expressed in practices on Twitter. Proceedings of the ACM on Human-Computer Interaction, 3(CSCW). https://doi.org/10.1145/3359240 [burrell2019control]

    Christin, A. (2017). Algorithms in practice: Comparing web journalism and criminal justice. Big Data & Society, 4(2), 1–12. https://doi.org/10.1177/2053951717718855 [christin2017algorithms]

    Christin, A. (2018). Counting clicks: Quantification and variation in web journalism in the United States and France. American Journal of Sociology, 123(5), 1382–1415. [christin2018counting]

    Christin, A. (2020a). Metrics at work: Journalism and the contested meaning of algorithms. Princeton University Press. [christin2020metrics]

    Christin, A. (2020b). The ethnographer and the algorithm: Beyond the black box. Theory and Society. https://doi.org/10.1007/s11186-020-09411-3 [christin2020ethnographer]

    Coleman, E. G. (2012). Coding freedom: The ethics and aesthetics of hacking. Princeton University Press. https://gabriellacoleman.org/Coleman-Coding-Freedom.pdf [coleman2012coding]

    Deterding, N. M., & Waters, M. C. (2018). Flexible coding of in-depth interviews: A twenty-first-century approach. Sociological Methods & Research, 0049124118799377. https://doi.org/10.1177/0049124118799377 [deterding2018flexible]

    Doty, N. (2020). Enacting privacy in internet standards [Ph.D. dissertation, University of California, Berkeley]. https://npdoty.name/enacting-privacy/ [doty2020enacting]

    Dunbar-Hester, C. (2020). Hacking diversity: The politics of inclusion in open technology cultures (1st ed.). Princeton University Press. https://press.princeton.edu/books/hardcover/9780691182070/hacking-diversity [dunbar2020hacking]

    Ensmenger, N. (2010). The computer boys take over: Computers, programmers, and the politics of technical expertise. The MIT Press. [ensmenger2010computer]

    Flick, U. (2009). An introduction to qualitative research. SAGE Publications Ltd. [flick2009introduction4]

    Goldenfein, J., & Griffin, D. (2022). Google Scholar – platforming the scholarly economy. Internet Policy Review, 11(3), 117. https://doi.org/10.14763/2022.3.1671 [goldenfein2022platforming]

    Goldenfein, J., Mulligan, D. K., Nissenbaum, H., & Ju, W. (2020). Through the handoff lens: Competing visions of autonomous futures. Berkeley Technology Law Journal, 35(IR), 835. https://doi.org/10.15779/Z38CR5ND0J [goldenfein2020through]

    Goldenfein, J., Mulligan, D., & Nissenbaum, H. (2019). Through the handoff lens: Are autonomous vehicles no-win for users. https://pdfs.semanticscholar.org/341d/a18649eb9627fe29d4baf28fb4ee7d3eafa3.pdf [goldenfein2019through_draft]

    Griffin, D., & Lurie, E. (2022). Search quality complaints and imaginary repair: Control in articulations of Google Search. New Media & Society, 0(0), 14614448221136505. https://doi.org/10.1177/14614448221136505 [griffin2022search]

    Haider, J., & Sundin, O. (2019). Invisible search and online search engines: The ubiquity of search in everyday life. Routledge. https://doi.org/10.4324/9780429448546 [haider2019invisible]

    Hanselman, S. (2013). Am I really a developer or just a good googler? https://www.hanselman.com/blog/am-i-really-a-developer-or-just-a-good-googler [hanselman2013really]

    Hanselman, S. (2020). Me: 30 years writing software for money. https://twitter.com/shanselman/status/1289804628310110209 [hanselman2020me_tweet]

    Haraway, D. J. (1988). Situated knowledges: The science question in feminism and the privilege of partial perspective. Feminist Studies, 14, 205–224. [haraway1988situated]

    Hicks, M. (2017). Programmed inequality: How Britain discarded women technologists and lost its edge in computing. MIT Press. [hicks2017programmed]

    Hill Collins, P. (2002). Black feminist thought: Knowledge, consciousness, and the politics of empowerment (Rev. 10th anniversary ed.). Routledge. [collins2002black]

    Hunt, A., & Thomas, D. (1999). The pragmatic programmer: From journeyman to master. Addison-Wesley Professional. [hunt1999pragmatic]

    Introna, L. D. (2016). Algorithms, governance, and governmentality. Science, Technology, & Human Values, 41(1), 17–49. https://doi.org/10.1177/0162243915587360 [introna2016algorithms]

    Kleppmann, M. (2017). Designing data-intensive applications: The big ideas behind reliable, scalable, and maintainable systems. O’Reilly Media. [kleppmann2017designing]

    Kotamraju, N. P. (2002). Keeping up: Web design skill and the reinvented worker. Information, Communication & Society, 5(1), 1–26. https://doi.org/10.1080/13691180110117631 [kotamraju2002keeping]

    Lave, J., & Wenger, E. (1991). Situated learning: Legitimate peripheral participation. Cambridge University Press. https://www.cambridge.org/highereducation/books/situated-learning/6915ABD21C8E4619F750A4D4ACA616CD#overview [lave1991situated]

    Leonardi, P. M. (2011). When flexible routines meet flexible technologies: Affordance, constraint, and the imbrication of human and material agencies. MIS Quarterly, 35(1), 147–167. http://www.jstor.org/stable/23043493 [leonardi2011flexible]

    Lofland, J., & Lofland, L. H. (1995). Analyzing social settings. Wadsworth Publishing Company. [lofland1995analyzing]

    Lurie, E., & Mulligan, D. K. (2021). Searching for representation: A sociotechnical audit of googling for members of U.S. Congress (Working paper). https://emmalurie.github.io/docs/preprint-searching.pdf [lurie2021searching_draft]

    Marcus, G. E. (1995). Ethnography in/of the world system: The emergence of multi-sited ethnography. Annual Review of Anthropology, 24, 95–117. http://www.jstor.org/stable/2155931 [marcus1995ethnography]

    Mathew, A. J., & Cheshire, C. (2018). A fragmented whole: Cooperation and learning in the practice of information security. Center for Long-Term Cybersecurity, UC Berkeley. [mathew2018fragmented]

    Mills, C. (2008). White ignorance. In R. N. Proctor & L. Schiebinger (Eds.), Agnotology: The making and unmaking of ignorance. Stanford University Press. [mills2008white]

    Misa, T. J. (Ed.). (2010). Gender codes: Why women are leaving computing (1st ed.). Wiley-IEEE Computer Society Press. libgen.li/file.php?md5=e4a6849b080f4a078c79777d347726c2 [misa2010gender]

    Mulligan, D. K., & Griffin, D. (2018). Rescripting search to respect the right to truth. The Georgetown Law Technology Review, 2(2), 557–584. https://georgetownlawtechreview.org/rescripting-search-to-respect-the-right-to-truth/GLTR-07-2018/ [mulligan2018rescripting]

    Mulligan, D. K., & Nissenbaum, H. (2020). The concept of handoff as a model for ethical analysis and design. Oxford University Press. https://doi.org/10.1093/oxfordhb/9780190067397.013.15 [mulligan2020concept]

    Noble, S. U. (2018). Algorithms of oppression: How search engines reinforce racism. New York University Press. https://nyupress.org/9781479837243/algorithms-of-oppression/ [noble2018algorithms]

    Ochigame, R. (2020). Informatics of the oppressed. Logic. https://logicmag.io/care/informatics-of-the-oppressed/ [ochigame2020informatics]

    Orlikowski, W. J. (1993). CASE tools as organizational change: Investigating incremental and radical changes in systems development. MIS Quarterly, 17(3), 309–340. https://doi.org/10.2307/249774 [orlikowski1993case]

    Orlikowski, W. J. (2007). Sociomaterial practices: Exploring technology at work. Organization Studies, 28(9), 1435–1448. https://doi.org/10.1177/0170840607081138 [orlikowski2007sociomaterial]

    Passi, S., & Barocas, S. (2019). Problem formulation and fairness. Proceedings of the Conference on Fairness, Accountability, and Transparency, 39–48. [passi2019problem]

    Passi, S., & Jackson, S. J. (2018). Trust in data science: Collaboration, translation, and accountability in corporate data science projects. Proceedings of the ACM on Human-Computer Interaction, 2(CSCW), 1–28. [passi2018trust]

    Passi, S., & Sengers, P. (2020). Making data science systems work. Big Data & Society, 7(2), 2053951720939605. [passi2020making]

    Prior, L. (2003). Using documents in social research. SAGE Publications Ltd. https://doi.org/10.4135/9780857020222 [prior2003using]

    Pugh, A. J. (2013). What good are interviews for thinking about culture? Demystifying interpretive analysis. American Journal of Cultural Sociology, 1(1), 42–68. https://doi.org/10.1057/ajcs.2012.4 [pugh2013good]

    Seaver, N. (2017). Algorithms as culture: Some tactics for the ethnography of algorithmic systems. Big Data & Society, 4(2), 1–12. https://doi.org/10.1177/2053951717738104 [seaver2017algorithms]

    Seibel, P. (2009). Coders at work: Reflections on the craft of programming. Apress. [seibel2009coders]

    Star, S. L. (1999). The ethnography of infrastructure. American Behavioral Scientist, 43(3), 377–391. [star1999ethnography]

    Sundin, O., Haider, J., Andersson, C., Carlsson, H., & Kjellberg, S. (2017). The search-ification of everyday life and the mundane-ification of search. Journal of Documentation. https://doi.org/10.1108/JD-06-2016-0081 [sundin2017search]

    Sundin, O., Lewandowski, D., & Haider, J. (2021). Whose relevance? Web search engines as multisided relevance machines. Journal of the Association for Information Science and Technology. https://doi.org/10.1002/asi.24570 [sundin2021relevance]

    Takhteyev, Y. (2007). Jeeks: Developers at the periphery of the software world. Annual Meeting of the American Sociological Association, New York, NY. https://pdfs.semanticscholar.org/1a98/c87ffd0b07344c5e0aae6e7e498a5c69da00.pdf [takhteyev2007jeeks]

    Takhteyev, Y. (2012). Coding places: Software practice in a South American city. The MIT Press. [takhteyev2012coding]

    Tattersall Wallin, E. (2021). Audiobook routines: Identifying everyday reading by listening practices amongst young adults. Journal of Documentation, 78(7), 266–281. https://doi.org/10.1108/JD-06-2021-0116 [tattersallwallin2021audiobook]

    Tripodi, F. (2022b). The propagandists’ playbook: How conservative elites manipulate search and threaten democracy. Yale University Press. https://yalebooks.yale.edu/book/9780300248944/the-propagandists-playbook/ [tripodi2022propagandists]

    Van Couvering, E. J. (2007). Is relevance relevant? Market, science, and war: Discourses of search engine quality. Journal of Computer-Mediated Communication, 12(3), 866–887. https://doi.org/10.1111/j.1083-6101.2007.00354.x [couvering2007relevance]

    Vertesi, J. (2019). From affordances to accomplishments: PowerPoint and Excel at NASA. In digitalSTS (pp. 369–392). Princeton University Press. https://doi.org/10.1515/9780691190600-026 [vertesi2019affordances]

    Wong, R. (2020). Values by design imaginaries: Exploring values work in UX practice [Ph.D. dissertation, University of California, Berkeley]. https://escholarship.org/uc/item/3vt3b1xf [wong2020values]

    Zade, H., Wack, M., Zhang, Y., Starbird, K., Calo, R., Young, J., & West, J. D. (2022). Auditing Google’s search headlines as a potential gateway to misleading content. Journal of Online Trust and Safety, 1(4). https://doi.org/10.54501/jots.v1i4.72 [zade2022auditing]


    10. Try searching [python “search results” title:google] on Stack Overflow, [python “search results” intitle:google site:stackoverflow.com] on Google, or [python “search results” duckduckgo site:stackoverflow.com] on DuckDuckGo. ↩︎

    11. See the archived repo on GitHub at https://github.com/jarun/googler. ↩︎

    12. For example, Emacs is a text editor initially developed in the 1970s and actively used today. The MELPA repository of Emacs packages includes tools such as fastdef (to “Insert terminology from Google top search results”), google (“Emacs interface to the Google API”), and google-this (“A set of functions and bindings to google under point” — including “the current error in the compilation buffer”). ↩︎

    13. Extensions are available to make general-purpose web search engines searchable from within Microsoft’s Visual Studio Code IDE. These extensions are built to open search results pages from a host of search engines: Baidu, Bing, Brave, DuckDuckGo, Ecosia, Google, Yandex, and You.com. ↩︎

    14. Dunbar-Hester (2020) notes the importance of remembering, at the outset of her research on diversity advocacy in open technology cultures, that the sites and groups she studies, and their activities, are not monolithic (p. 24). ↩︎

    15. While it can also appear shorthanded as the broader “component actors” or subsystem, Mulligan & Nissenbaum (2020) generally use component:

      We use the generic term component to apply to both human and nonhuman parts of the sociotechnical system. While the term component does not naturally apply to human actors, for our purposes it is important to be able to refer in like manner to human and nonhuman components of a system.

      ↩︎
    16. I take care to refer to ‘perceptions’ of affordance, pushing back on a tendency in some analyses to reify affordances as properties of the objects rather than accomplished in contexts and relations (see Leonardi (2011) and Vertesi (2019) for a review of the disjuncture between the development of affordances in psychology and its application in design). Leonardi (2011) draws on the terms “perceptions of constraint” and “perceptions of affordance”, writing that “Technologies have material properties, but those material properties afford different possibilities for action based on the contexts in which they are used.” I do not want to stop at labeling some attribute an affordance, but to locate “the networked conditions that make particular use cases possible” (Vertesi, 2019). Vertesi argues (p. 388) that:

      The notion that technologies might in and of themselves suggest, prompt, or require different ways of using them from human bodies or interlocutors neglects the richness and complexity that occurs when different groups take up technological tools to achieve local ends

      I refer to standpoint theory (Hill Collins, 2002), situated knowledges (Haraway, 1988), and white ignorance (Mills, 2008) when considering how relations and contexts interact with perceptions of constraint or affordance. ↩︎

    17. Goldenfein et al. (2020) provide a tight definition:

      We define “Handoff” as the following: given progressive, or competing, versions of a system (S1, S2) in which a particular system function (F) shifts from one type of actor (A) in S1 to another actor (B) in S2, we say that F was handed off from A to B. [emphases in original]

      ↩︎
    18. Funded by the U.S. National Science Foundation under INSPIRE grant SES-1537324. ↩︎

    19. This paper was later published as Goldenfein et al. (2020). ↩︎

    20. These “manual pages” provide access to documentation. An engineer could access the documentation directly on their computer by typing man [command, function, or file name] in their terminal. You may also be able to type man man in your terminal or command-line application to see the documentation for man itself, which is used to “format and display the on-line manual pages”. Or, for example, type man cal to reference documentation for the cal utility, which “displays a calendar and the date of Easter.” ↩︎

    21. A conversation with Andrew Reifers on January 1st, 2019. A later conversation with him, on June 6th, 2021, helped me consider the pursuit of expertise and quality in different systems or subsystems enrolled in data engineering work. I could draw on research looking at how expertise is performed or quality is produced (or not) in open source and in recruiting, interviewing, code reviews, etc. In my notes at the time I asked: Where are quality and expertise accounted for, maintained, or responsibilized in coder use of web search? It is clear to me now that these subsystems also engage or act on data engineering web search practices. ↩︎

    22. The LPP analytic is introduced more fully in the next chapter. ↩︎

    23. They discuss reasons, particularly in a section titled “With legitimate peripheral participation” (pp. 39–42), for “turning away from schooling” and “school-forged theories” of learning (p. 61). ↩︎

    24. Jean Lave ↩︎

    25. While she does not use the term in some of her subsequent work (Christin, 2018, 2020a), Christin expands on and reframes this approach in a later article that presents “studying refraction” as a “practical strateg[y] for ethnographic studies of algorithms in society”, calling it “Algorithmic refraction” (Christin, 2020b). She notes that it “partly overlaps” with methodological approaches like Barley’s “technology as an occasion for structuring” and Orlikowski’s approach to “sociomaterial practices” (p. 10; internal citations are to Barley (1986), Bechky (2003), and Orlikowski (2007)). She closes out her section on the strategy (p. 11):

      focusing on algorithmic refraction and treating algorithmic tools as prisms that both reflect and reconfigure social dynamics can serve as a useful strategy for ethnographers to bypass algorithmic opacity and tackle the complex chains of human and non-human interventions that together make up algorithmic systems.

      ↩︎
    26. For one cursory example, see Scott Hanselman, not a data engineer but a developer who works in Microsoft Developer Relations. A decade ago he tweeted a link to a blogpost of his titled “Am I really a developer or just a good googler?” (Hanselman, 2013). (This blog post is an anchor for many conversations about searching the web while working in code-related roles; it has been linked to throughout such conversations over the years.) More recently, he tweeted (Hanselman, 2020):

      Me: 30 years writing software for money. Also Me: Googles how to hide a div with a querySelector. Hang in there #codenewbies

      ↩︎
    27. Slack is a workplace messaging platform billed as a replacement for email. Every interviewee, except for those working at or with companies that designed and sold a competitor product, mentioned using Slack in the workplace or for interacting with external collaborators. It became the “professional networking site” for the UC Berkeley School of Information during my time there. ↩︎

    28. Noble (2018, pp. 181–182):

      Indeed, we can and should imagine search with a variety of other possibilities. [ . . . ] Such imaginings are helpful in an effort to denaturalize and reconceptualize [ . . . ]

      ↩︎
    29. Such as the moment an interviewee halted mid-sentence, appeared startled, and then urgently confirmed that I would not name their company. ↩︎

    30. See the work from Passi and colleagues, based on his ethnographic fieldwork with an industry data science team, on how “problems of trust and credibility are negotiated and managed” (Passi & Jackson, 2018, p. 2), “the everyday practice of problem formulation” (Passi & Barocas, 2019, p. 3), and “what work the system should do, how the system should work, and how to assess whether the system works” (Passi & Sengers, 2020, p. 2). ↩︎

    31. Zippia. Data Engineer Demographics and Statistics in the US. https://www.zippia.com/data-engineer-jobs/demographics/ ↩︎

    32. At the time of this writing there are over 4,000 members of the Slack workspace, with over 200 in the #data_engineering channel. ↩︎

    33. https://www.reddit.com/r/dataengineering/ ↩︎

    34. My website recruitment pitch is archived at Archive.org: https://web.archive.org/web/20211024103319/https://danielsgriffin.com/currently-recruiting-interview-participants-dissertation.html. ↩︎

    35. I used the VoiceDream Reader on my phone and computer for text-to-speech, for listening to articles or books (that were not prepared for audio consumption) as well as drafts of this dissertation. Listening is a form of reading (Tattersall Wallin, 2021). In future research I will attempt to get IRB approval for storing interview audio, password protected, on my phone in order to immerse myself in the material in the sort of everyday listening occasions that Tattersall Wallin describes. I will confess that I did also listen to interviews, on cordless headphones played from my laptop, sometimes while playing with our, at the time, 1-year-old. ↩︎

    36. Early in this research I wrote a small script that would create a new blank memo file for me to fill in at the start of every work day. I thought to set aside time at the start of the day to ensure I did my memoing. My morning memo. Eventually I went back and deleted hundreds of files that I had left empty. A minimal sketch of what such a script might have looked like follows.
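
      This is a hypothetical reconstruction, not the original script; the memo directory and filename convention are assumptions:

      ```python
      # Hypothetical reconstruction of the daily-memo script described in
      # this note; the memo directory and filename pattern are assumptions.
      import datetime
      import pathlib

      MEMO_DIR = pathlib.Path.home() / "memos"  # assumed location

      def create_daily_memo() -> pathlib.Path:
          """Create today's blank memo file, if it does not already exist."""
          MEMO_DIR.mkdir(parents=True, exist_ok=True)
          memo_path = MEMO_DIR / f"{datetime.date.today():%Y-%m-%d}-memo.md"
          if not memo_path.exists():
              memo_path.touch()  # left blank, to be filled in by hand
          return memo_path

      if __name__ == "__main__":
          print(create_daily_memo())
      ```

      ↩︎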

    37. Calendly. ↩︎

    38. I copied my early member check invitation emails to my website, linked to on my homepage and from my prior interview recruitment page. It is archived at Archive.org: https://web.archive.org/web/20221125080137/https://danielsgriffin.com/currently-conducting-member-checks.html ↩︎

    39. Here is a mapping from those draft chapters to this final version:

      ↩︎
    40. When I applied to work at the library while in high school, the librarian conducting the job interview noted my name and commented, “oh, you’re the one checking out all the science books.” I regularly walked out of the library with a stack of books much larger than I could read. ↩︎

    41. This is the same job that Chelsea Manning had while in the U.S. Army. Her unit replaced mine in Iraq. I talked and walked her, and other soldiers in her unit, through the workflows we had developed, including the search tools we used. ↩︎

    42. In a fashion visible also in jokes about programmers just copying and pasting code from the internet, people in more specialized roles would regularly joke that the all-source analyst’s job was just to copy-and-paste. ↩︎

    43. For a brief period I used the Emacs text editor extensively, including writing small Lisp snippets to change the behavior of the tool. ↩︎