
    4. Extending searching

    The cover of Daniel M. Russell’s book, The Joy of Search: A Google Insider’s Guide to Going Beyond the Basics, describes him as the “Senior Research Scientist for Search Quality and User Happiness at Google” (Russell, 2019). The marketing materials from MIT Press say that “readers will discover essential tools for effective online searches [emphasis added]” (The MIT Press, 2020). In the book he writes:

    Most of the searches that Google sees in a typical day are fairly straightforward. The goal is clear, and the results are pretty obvious and unambiguous.

    But a significant number of searches are not. Searchers might have a goal in mind, but they can’t figure out how to express it in a way that will give them what they want. Sometimes their query is precise, but they don’t know how to read and interpret the results. It drives me to distraction as a researcher because I know that the searcher is missing just one small but critical piece of information. We try to build as much as we can into the search algorithm, but people still need to know a bit about what the web is, and how search engines crawl, index, and respond to their queries. [emphases added]

    Data engineers are not explicitly taught much about doing web search as data engineers. Maybe extra lessons are unnecessary. Danny Sullivan, Google’s Search Liaison,54 has said, quoted in a Google blog post (Kutz, 2022):

    Today, it feels like people are born knowing how to search. You just type what you want into a magic box, and poof! It delivers results — no classes needed.55

    Are the search confessions and experience searching in school and everyday life enough for them to use search successfully as data engineers? Does engaging in search confessions provide enough participation to support the learning of these practices? Or do they have the essential tools or a critical piece of information that helps them? How is their technical knowledge and knowledge of their craft applied to make search work? Where is that knowledge?

    I interviewed people with hard-won technical knowledge about the design of data pipelines, the characteristics of databases, deterministic and probabilistic algorithms, and insight into the cost and performance tradeoffs in production systems. My interviewees possess deep insight into how information is moved around and processed by computers and might be said to “highly value information handling” (Teevan et al., 2004). And they use web search extensively. They rely on web search and it seems to perform well for the purpose to which they have put it. I thought I would talk with them and look with them at how they searched to see what lessons we might learn. Particularly, I wanted to look at the role of their knowledge of the mechanisms56 of web search in their search practices.

    But the data engineers didn’t talk about their knowledge of search technology. Moreover, in response to my questions and prompting, they did not suggest that a technical understanding of search itself played a key role in their searching successes. The practices described by my interviewees did not appear to be grounded in or driven by individual knowledge of search mechanisms. There were generally no references to crawling, indexing, ranking, advertising, or the parsing of queries. Search experiences were addressed with the search engine black-boxed and the results reified as the results.

    Instead, the data engineers talked about their web searches as heavily structured by practices and artifacts. They described various work practices and artifacts (occupational, professional, and technical components of their work) that I argue produce and maintain scaffolding for web searching. I found that the knowledge brought to bear to make their searching work is embedded in their work practices, and their work practices are extensions of web search. Here I will use the analytical frames from Handoff and LPP to show web search as extended beyond the moment of searching and into the broader work practices and artifacts. I will show how the knowledge of successful searching is embedded in those practices and artifacts. I argue those practices and artifacts provide the participation necessary for the situated learning of web search.

    I next discuss the sort of knowledge that might be anticipated, before showing what knowledge is in the work practices and artifacts of the data engineers.

    Knowledge to make search work

    How will you look for it, Socrates, when you do not know at all what it is? How will you aim to search for something you do not know at all? If you should meet with it, how will you know that this is the thing that you did not know?57

    Calls for search platform transparency or explainability and digital or search literacy are rarely accompanied by cases demonstrating the benefits for the searcher. The assumption that knowledge is the problem is accepted and left untested.58

    It would be beneficial to examine closely a site, such as that of data engineers, where people search extensively, seemingly effectively, and with technical knowledge of mechanisms similar to those in web search.

    It is important to look at this question because of its place in research and policy debates about the use of new technologies in society and the appropriate role for regulation or design in facilitating effective use of knowledge of those new technologies. Better understanding the role of knowledge of the mechanisms of web search may feed back into how we shape our practices and technologies to better achieve our goals.

    To set the stage for the material in this chapter I will first discuss the mechanisms of web search and the role that some researchers have given knowledge of the mechanisms. Then I will provide an overview of work on the use of web search by coders.

    So what about the mechanisms of web search must be understood for ordinary successful use? Introna & Nissenbaum (2000) describe three kinds of mechanisms of web search: market, regulatory, and technical mechanisms. Introna & Nissenbaum’s (2000) core focus is on the “market mechanism”, yet they broadly write of the “systematic mechanisms that drive search engines” and argue it would be a “bad idea” to “leave the shaping of search mechanisms to the marketplace” (p. 177). They mention regulatory mechanisms in a quote from McChesney (1997) on “the notion that the market is the only ‘democratic’ regulatory mechanism”. They also use “mechanisms” in quoting from Raboy (1998). Raboy’s quote presents the internet as new media and a role for public policy to promote a model that with “new mechanisms” might be “aimed at maximizing equitable access to services and the means of communication”. The technical mechanisms in Introna & Nissenbaum’s “brief and selective technical overview” are the “technical mechanisms” of crawling, indexing, ranking, and “human-mediated trading of prominence for a fee” (p. 181).

    In presenting these mechanisms, Introna & Nissenbaum (2000) were not explicitly discussing knowledge required for practical success in searching. Rather, their conclusions focused on policy and values in design directed towards addressing “the evident tendency of many of the leading search engines to give prominence to popular, wealthy, and powerful sites at the expense of others” and how that “undermines [ . . . ] the substantive vision of the Web as an inclusive democratic space” (p. 181). They did seem to indicate, though, a beneficial role for individual awareness of search mechanisms. They noted unfamiliarity and lack of awareness of “the systematic mechanisms that drive search engines”, and wrote “[s]uch awareness, we believe, would make a difference” (p. 177).59 Though they noted it was only a partial response,60 they demanded transparency (p. 181):

    full and truthful disclosure of the underlying rules (or algorithms) governing indexing, searching, and prioritizing, stated in a way that is meaningful to the majority of Web users. [ . . . ] We believe, on the whole, that informing users will be better than the status quo. [ . . . ] Disclosure is a step in the right direction because it would lead to a clearer grasp of what is at stake in selecting among the various search engines, which in turn should help seekers to make informed decisions about which search engines to use and trust. [emphasis added]

    Researchers from web search studies, critical algorithm studies (Mart, 2017), epistemology (Miller & Record, 2013), and misinformation studies (Tripodi, 2018) have suggested that searchers’ knowledge of the mechanisms of web search may help searchers better achieve their search goals. Distinct from that claim, some of those researchers additionally argue, alongside others, that such knowledge can first be put to use by researchers, regulators, journalists, and librarians and other educators, who may then act in their expert capacities to shape and advance successful web searching. For instance, Pasquale (2015) argues for “qualified transparency”, disclosure and audits by the FTC or another agency (p. 160-164).

    Some of these studies also point to information literacy as an ideal and an intervention in pursuit not of individual search goals evaluated by accuracy or relevance but of societal goals that embrace values such as neutrality or privacy (Dijck, 2010). Noble’s (2018) work, by contrast, is focused on algorithmic literacy for building alternative search engines (p. 25-26) and argues that reconceptualized (p. 133) or reimagined (p. 180) search engines might advance transparency at their core.61

    Some research appears to bundle individual search success with other goals or responsibilities. Sundin (2020), for instance, calls for “critical awareness” that supports the user of search to be “an informed and critical citizen” (pp. 365-376):

    How do you learn to use an information infrastructure such as search engines? Just as with electricity, which you can use by just switching on the lamp, many times you just have to type a word to get a result that is at least good enough. At the same time, the workings of search engines for a large number of sectors have dramatic consequences in society, such as for business, tourism, politics and schools. Again, you do not have to know very much in order to find out something about what you are looking for. But just as with electricity, if you want to be an informed and critical citizen this is not enough. [emphases added]

    Bhatt & MacKenzie (2019) looked at digital literacy, including search engine use, with “a social practice approach to literacy” (p. 304). They start from the position that literacy is “always associated with, and realised through, ‘social practices’ rather than a purely formally-schooled understanding of correct language” (p. 303) and that literacy is “always embedded within social activities, is socially situated, and mediated by material artefacts and networks” (p. 303). Their claim regarding the consequences arising from the lack of knowledge of mechanisms also bundles multiple effects (p. 305):

    Without knowing just how such platforms work, how to make sense of complex algorithms, or that data discrimination is a real social problem, students may not be the autonomous and agential learners and pursuers of knowledge they believe themselves to be.

    Two aspects of the mechanisms of web search that researchers have identified as important are the data collected and the inferences made by the search engine about the purpose of the search and the searchers. Warshaw et al. (2016), in work performed at Google, find “a substantial gap between what people believe companies are doing with their data, and the current reality of pervasive, automatic algorithms.” Successful use of web search, drawn most broadly, may require an awareness of the data collected and the inferences made about the searchers.

    Some researchers attribute individuals’ acceptance of and persistent belief in propaganda to searchers’ knowledge deficit. This concern is implicit in Tripodi’s writings. Tripodi (2018) describes a pattern in her interviewees that indicated “users do not have a consistent or accurate understanding of the mechanisms by which the [search engine] returns search results” (p. 28). Tripodi (2018) referred particularly to the role of the keywords in the search query (including how others act to spread and write content to match those keywords) and the ranking of results (and the interpretations of ‘top content’ by searchers). While the focus of Tripodi’s book (2022b), which builds on her prior research, is on how propagandists “wield the power of search” (p. 18) to pursue their ends, her description of her respondents’ knowledge implies a key role for knowledge deficits in the process: “few seems to understand how much keywords drive returns” (p. 103), “believed that top returns were more legitimate” (p. 109), and not realizing “returns are rooted in the search engine’s monetary interests” (p. 111).62

    I originally read Tripodi (2018) within the context of a surge of research (from, in part, critical algorithm studies (Seaver, 2019)) about political demands for transparency, explainability, and contestability in algorithms (Ananny & Crawford, 2018, Burrell, 2016). There was also considerable alarm about, and interest in, “fake news” or propaganda (Jack, 2017)—with “a resurgence of mis/disinfo studies” (Caplan & Bauer, 2022)—alongside arguments about the absence or failure (boyd, 2018) of media/digital/search literacy. Alongside Tripodi’s work there were others looking at the role of search engines in misinformation (Metaxa-Kakavouli & Torres-Echeverry, 2017) and problematic search results (Golebiewski & boyd, 2018). It was an open question whether platform transparency about the mechanisms of search might be needed to help users search more effectively.

    Google’s myth making complicates attempts to understand the mechanisms (Gillespie, 2014). Like many companies (Burrell, 2016), Google keeps the design and performance of its algorithms opaque or black-boxed (Noble, 2018). As Bilić (2016) argues, “Google employs powerful ideological engineering of neutrality and objectivity in order to keep the full context of its search engine hidden from everyday users.”

    In our paper on Google’s Holocaust problem (2018), Deirdre Mulligan and I repeat the same general theme in our analysis of a disconnect between how Google engineers and managers imagine and see their tool and how it is perceived and used by others. Our contention, though, was not that search users should adopt the perceptions of the search producer or that knowledge was the answer, but rather that the Google engineers and managers should recognize and remediate the consequences of those conceptions. The conceptions of the regular users were produced in part, we argued, by Google’s own myth making.

    Researchers have pointed out that not all search users can identify the distinction between paid-for results and so-called organic search results (Commission, 2013, Daly & Scardamaglia, 2017, Ofcom, 2022). Some research suggests some people are unaware of the commercial nature of Google Search. Safiya Noble has described talking with a librarian who had been under the impression that Google was a nonprofit organization (House of Lords, Select Committee on Democracy and Digital Technologies, 2020).

    The opacity or obfuscation from the company means only that users may know just a part of the algorithm, though that may be the more important part for some uses. Users develop practical knowledge of algorithms, “knowing how to accomplish X, Y, or Z within algorithmically mediated spaces” (Cotter, 2022), and may understand algorithms in ways inaccessible to the designers of the systems (Cotter, 2021), as “[n]ot even people on the ‘inside’ know everything that is going on” (Seaver, 2019).

    Use of web search by coders

    Early in my exploration of search, and then of coders’ use of web search, even before I narrowed to data engineers, it became clear to me that some of the concerns around getting ‘bad information’ from web searches may be obviated by the nature of their work. Prior work on the use of web search by coders shaped my attention to these concerns.

    Unlike searching in many domains, the coders would often, as a part of their work practice and an affordance of their work, quickly validate what they found in the search results and see for themselves. This validation step provides, in a limited way, some friction that may help coders rely on web search without becoming overly reliant on the search engine rankings.63 They could make a change to their code and then test it by trying to run their code to see if it ‘worked’. This wasn’t some special skeptical bent, but part of their work to get things working. Searches were only complete once they found some workable results.

    This capacity for, and practice of, testing the results in the standard use of the search tool (a capacity that is partial and relative) is described in the literature.

    Brandt et al. (2009) describe some of this: participants in their lab study, tasked with building an online chat room, would search for a tutorial and, finding one, “would often immediately begin experimenting with its code samples”.

    One participant explained, “there’s some stuff in [this code] that I don’t really know what it’s doing, but I’ll just try it and see what happens.” He copied four lines into his project, immediately removed two of the four, changed variable names and values, and tested. (p. 1592)

    But they also described such testing as not always happening immediately. They report that “[p]articipants typically trusted code found on the Web” and that errors or misapplication of code found on the web were then not noticed immediately, which complicated the coders’ remediation efforts (p. 1593).64

    In other domains the testing or trying of search results is done at first by the searcher in interaction with the various results and their presentation. If one isn’t sure or wants to know more, the quickest path is a further web search (or reading and browsing further in the search results pages or on the SERP; there are tools built into major search engines, like Google’s three-dots menu, that might provide some transparency about a page).

    But workable in the moment isn’t enough. There is other information that isn’t immediately testable by running it through a compiler or interpreter and seeing if it “works”. This is particularly the case for non-functional properties (normally including things like security, reliability, and scalability, but could also include considerations of harm both in the implementation and use of the code (Widder et al., 2022, p. 8)). There is research looking at insecure or unsafe code on Stack Overflow. Fischer et al. (2017) showed that of the 1.3 million Android applications on Google Play, 15% “contain vulnerable code snippets that were very likely copied from Stack Overflow” (p. 135). Subsequent work found that, on Stack Overflow questions related to Java security, “On average, insecure answers received more votes, comments, favorites, and views than secure answers” (Chen et al., 2019, p. 545). Firouzi et al. (2020) examined the use of the (potentially unsafe) unsafe keyword in C# code snippets on Stack Overflow.

    Seeds for queries and spaces for evaluation

    There are many mechanisms of web search at various layers and levels of interaction with the work of data engineers. Here I focus on the role of knowledge of such mechanisms as the mechanisms relate to two aspects of data engineering web search activity: how they choose search queries and how they evaluate search results.

    I asked Shawn, a Developer Advocate in open source data software and former data engineer, to talk about the thinking that went into writing error codes. He talked about gradient descent and the error message’s role.

    You can definitely search yourself into a— gradient descent , you’ve heard of that algorithm, in machine learning?— You can always find yourself at a very local minimum or maximum and actually you’re not even close to the solution. It is very much a gradient descent type of problem. ‘Oh, this looks like the right solution.’ This also comes with experience too, right? Having that context of knowing that this is not the correct solution even by just a weird gut feeling. So it is interesting how Google’s algorithms can only take an input and that input is very much shaped by the experience of the engineer on the other side. And the way that they take in the outputs of the result of that search will also take them in one direction or the other. To some extent being able to have the right context, the right understanding it, always comes down to context.
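    Shawn’s analogy tracks the textbook behavior of gradient descent: start in the wrong basin and the algorithm settles into a locally plausible answer that is ‘not even close to the solution’. A minimal sketch of that behavior (the function, starting points, and step size here are my own illustrative choices, not anything from the interview):

```python
def grad_descent(f_prime, x0, lr=0.01, steps=500):
    """Follow the negative gradient from x0 until it levels off."""
    x = x0
    for _ in range(steps):
        x -= lr * f_prime(x)
    return x

# f(x) = (x^2 - 1)^2 + 0.3x has two basins: a global minimum near
# x = -1.04 and a shallower local minimum near x = +0.96.
f_prime = lambda x: 4 * x * (x**2 - 1) + 0.3

local = grad_descent(f_prime, x0=1.5)    # starts in the "wrong" basin
better = grad_descent(f_prime, x0=-1.5)  # same algorithm, other basin
```

    Which answer the descent settles on is fixed by the starting point, not the algorithm; as Shawn puts it, the input “is very much shaped by the experience of the engineer on the other side.”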

    The error message provides some context for the unknown engineer who may find an exception when running your code. This context shapes the situation; it even partially situates the data engineer as they work to search for solutions to their problem. The error message is an example of a search seed and a space for evaluation, the subjects of the next two sections.

    Query formulation

    Jillian was a new data engineer on a small data engineering team at a fitness app. Talking about search struggles, she shared the following:

    Every so often I have such a hard time phrasing what I’m trying to look for. I continually am searching something new and I just aimlessly click through all of the search results. Whether or not I’m even being intentional about reading what the link even says. I’ll click on it, look through and it doesn’t have the answer. It’s always in this, not panicked state, but frustrated state. Where I’m like I have such a simple question.

    And I also feel that it’s the sort of question where if I could ask someone a question and phrase it, then it could probably be addressed in 15 seconds. But instead, there have probably been times where I have spent two hours trying to look through, because I don’t know how to phrase what I’m trying to do.

    When I can’t find my answer then I’m just like throwing darts. Putting random things in the search line trying to see if I can find what I’m trying to do. [ . . . ] the darts rarely stick. These are the scenarios where I’m really kinda desperate at this point. I’ll sometimes even look at what I’ve searched and I’ll be like ‘like why on earth would that give me the response I’m looking for, I didn’t even include the coding language. What have you done!?’ Your brain is just like shutting off at this point and you’re in this habit of like click, scan, click, scan… And you just put in random keywords…. that aren’t even the keywords that you necessarily need to get your answer.

    Upon realizing she’s fallen down this hole, Jillian said that she will tell herself, “Let’s just pause and think about it,” and write out by hand what she is looking for, using other knowledge of her craft to remember or reconstruct different search queries.

    Say for example I was trying to understand some transformation I might have to write down a table and kind of look at it.

    Generally, though, the data engineers are not throwing darts or attempting to search floating concepts. Their search queries are often given to them, or seeded, through their work practices.

    I will discuss two aspects of query formulation arising from production and socialization. Data engineers find queries as they are prompted by the code and immersed in linked conversations.

    Prompted by the code and “trail indicators”

    Noah, the data engineer at a media streaming company:

    I copy the part of the error message that seems most relevant. There will be a whole stack trace and a bunch of stuff and usually there is one line that’s like, ‘here is what went wrong’. So I’ll copy the generalized portion of that. So it might say error in file x.txt and then some error message. So you copy the generalized portion… and just paste that into Google and see what comes up. Often the first result will be a Stack Overflow question or occasionally a GitHub issue.

    And you can go in and see, alright. Was this person trying to do something roughly similar to what I was doing. Which is a little bit error message dependent. Some error messages apply to a million different situations so you have to further find the one that is more similar to your situation. Other error messages only come up when you’re trying to do that specific thing. But, yeah, that’s the process. Copy-paste-search.
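    Noah’s ‘copy the generalized portion’ step can be read as a small transformation over the error line: strip the details specific to one machine or one run so the query matches other people’s reports of the same failure. A rough sketch (the patterns and the example error line are hypothetical, not drawn from Noah’s systems):

```python
import re

def generalize_error(line):
    """Replace run-specific details (paths, line numbers, addresses)
    with placeholders, leaving the searchable part of the message."""
    line = re.sub(r"/[\w./-]+", "<file>", line)       # file paths
    line = re.sub(r"line \d+", "line <n>", line)      # line numbers
    line = re.sub(r"0x[0-9a-fA-F]+", "<addr>", line)  # hex addresses
    return line

query = generalize_error(
    "ERROR: could not read file /data/input/run_42.csv at line 17")
# query == "ERROR: could not read file <file> at line <n>"
```

    In practice, of course, Noah does this by eye rather than with a script; the point is that the query is read off the work material, not invented.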

    After I prompted Noah with Teevan et al.’s (2004) distinction between “orienteering” and “teleporting” searches (“directed situated navigation” versus “jump[ing] directly to their information target using keyword search”), he said

    With the error messages, I think a lot of times there is a hope that it is a teleportation search and a fear that it’s not. I can search this error message and if I’m lucky the first result is going to tell me exactly what I need to know. And if not I might have some orienteering to do. Where it is narrowing in on the problem, narrowing in on what the cause is, figuring out of five people getting the same error message which one is doing something resembling what I’m doing.

    While the error messages may be generic, the stack trace or some aspect of the message itself or the top search result might still guide the searcher.

    Raha, a senior data engineer at a media entertainment company, in response to seeing the “Googling the Error Message” book cover (a parody of the O’Reilly books), observed that “sometimes they give you a hint that this is a generic error.”

    Shawn, a former data engineer who is now a developer advocate at an enterprise software company that develops tooling for data engineers, referred to the stack trace as usually providing ‘trail indicators’ even if the final error message wasn’t itself particularly useful:

    The typical keywords that really resonate in searches are error messages. In particular, if you get some sort of particular string. There’s types of errors that come out. There will be an error message that is a—

    —we’ll go with the most, the least descriptive possible exception that you could think of in Java, in Java land, which is the NullPointerException in Java.

    If I just put [NullPointerException Java] I’m going to basically get searches that take me to how stupid NullPointerException is because it’s pretty much the thing you run into when you don’t know what is the problem. It is basically when basically you’ve misprogrammed something. It is so undescriptive in the language of Java. It is more of a Java problem, not necessarily a problem that you’re dealing with directly at the cause of your specific issue. You’ve misprogrammed something and you forgot to account for a null and unfortunately the Java language doesn’t account for nulls super well. So like, you’ve come to realize that is not a good thing. You can’t just use this NullPointerException as a way to find the answer to your question. Right? So what you have to do at your next step is, I need to find some other sentence or some other piece on this stack that might…

    So usually you’ll get a stack of different things that have called other things, right. All the way from the root process of whatever started this call on a particular thread. You’ll be able to track and see what called what up until that point. And so usually along the way there are trail indicators of, ‘OK, this is where I was at in this code and this called this piece of this code’. And you can actually look at the source code you know and start trying to understand that. Maybe you don’t even need that. Sometimes when it comes to just, you know, “cheating”…

    So that’s typically I would say–and that is not data engineering specific at this point—that’s generic to any developer that has to work with any kind of framework.

    Those frameworks will have certain error messages that are phrased in just the right way that have just the right exception and that is enough.
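    One way to render Shawn’s ‘trail indicators’ concretely is as a filter over the stack: pass over the frames that live in the framework and keep the ones that point into your own code. A sketch under invented assumptions (the traceback text and the myproject paths are made up for illustration):

```python
def trail_indicators(tb_text, project_marker="myproject"):
    """Keep the 'File ...' frames that point into our own code --
    the trail worth reading before the undescriptive final exception."""
    return [line.strip() for line in tb_text.splitlines()
            if line.strip().startswith('File "') and project_marker in line]

tb = '''Traceback (most recent call last):
  File "/srv/myproject/pipeline/load.py", line 88, in run
    rows = transform(batch)
  File "/srv/myproject/pipeline/transform.py", line 41, in transform
    return schema.apply(rows)
  File "/usr/lib/python3.11/site-packages/framework/schema.py", line 310, in apply
    raise TypeError("cannot coerce column 'ts'")
TypeError: cannot coerce column 'ts'
'''

frames = trail_indicators(tb)
# Only the two myproject frames survive; the framework frame and the
# generic final exception are left for a later, wider search.
```

    Reading up the surviving frames is the tracking Shawn describes: seeing “what called what up until that point” before deciding which string is worth searching.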

    Shreyan, a data engineer at an enterprise software company, shared this as his initial response to my asking if he could talk a little bit about how he uses search:

    Whenever you have say an error. You copy the error. You either try to copy pieces of it or the complete error. … copy it and paste it into the web

    Patrick, likewise a data engineer at an enterprise software company, said:

    if I get an error message I’ve never seen before

    just copy and paste it straight in

    Arjun, a principal data engineer at an enterprise software company said:

    So one technique I always say is copy the first line of the error, of the stack trace, and post that in the google. You’ll get a much better understanding then. So that’s an easy way for you to debug.

    Nisha, the Director of Data Services at an enterprise software company, also shared about searching an error:

    Sometimes you are stuck with small things that you need to understand… What’s going wrong? Why is something not working? And then you want to quickly look up an error that you’re seeing on Google and you can get more ideas as to what the possible issues could be.

    Charles, a data scientist on a data engineering team at a large online marketplace:

    I think the most common thing, which I know a lot of people do, is I would just copy and paste the error message straight into Google

    Finally, also Phillip, another data engineer working in enterprise software:

    when we encounter certain errors, especially as the tools that we use, a lot of errors that can come up kinda just copy and paste into Google and kinda just hope

    It isn’t only errors that data engineers search. Noah talked about searching the web to find a subsequent term to then search in internal code searches.

    I think it’s kinda a spectrum of how much do you know what you want to do. [I] used internal search to find out how other people might have done the same thing.

    Rather than searching phrases “strategically signaled” in talking points from politicians or found in propaganda (Tripodi, 2018), the data engineers are searching phrases drawn directly from their work material. While they do reflect on their query choices, they do not have to bring considerable knowledge of the mechanisms of the search engines to bear. The code socializes the potential search terms for querying and so scaffolds the search process.

    Immersed in linked conversation

    Ajit, the staff data engineer at a major retailer, shared about an informal weekly meeting where he and his team talk about their work:

    In our team, we do weekly catchups. Like stand up meetings. Those are basically around: What are you doing on a day-to-day basis? So mostly people talk about the work that they are doing

    If they have any questions on that, if they just want to explain, ‘hey, this is the approach that I am taking.’… And then it is just an open forum. So we just go around the table. A person just tells about what they are trying to do. Anyone on the team who has sorta worked in it before. Or, anyone who sorta has even no idea about it, can sorta ask questions. ‘This might be able to help you.’ Or: ‘Can you explain how this thing is going to achieve what we are trying to do here as part of this project?’

    The data engineers talked about being immersed in the languages, tools, and problems of their work—engaged in varied conversations with colleagues. “Like a lot of times we’ll just share random blog posts or articles for things that we think are interesting.” (Victor)

    Practices designed to coordinate the engineers also provide introductions or additional exposure to the language of their work. Here is Phillip:

    In a typical demo session, usually demos are before the work is complete. It is pretty much to do some knowledge sharing. Each person on my team is working on something. Well, pretty much I would not know anything about what someone on my team is working on… So these demo sessions are pretty much just to share what the team is up to as a whole.

    These conversations, distinct from direct question-and-answer interactions, are linked to their work and to the larger occupational community. Various participants in these conversations previously worked elsewhere and also learned some of what they converse about through searching. They are conversing in a shared language that has been made searchable. Regularly referred to in these conversations are the topics of their work: the code, the business logics, and the work processes and tooling.

    These conversations take different forms. Sometimes it is over lunch or an informal meeting, and other times it is through Slack, email, or through the more asynchronous sort left in the code or internal wikis or other knowledge management systems.

    The data engineers often have exposure to their peers using language with potential use as search queries. While they can look elsewhere for such material, they are not solely responsible, or alone, in identifying terms for their searches. They are regularly presented with potentially queriable terms through the work talk in their occupational field.

    Search seeds

    I use “search seeds” to refer to the keywords or terms presented to the searcher that the searcher in turn uses in a search query.65 I want to be able to refer to the suggestions or opportunities to search. These are the strings of text someone comes to think to send to the web search engine66 as a search term. Or, these are components in the larger environment (spoken words or printed strings of text) that a potential searcher, situated with appropriate resources, may perceive to afford a successful search. This text may be in a code comment, a function or method name, or an error message; overheard in workplace conversation; or found on the web.

    I’m using ‘seeds’ to draw attention to stages of web search activity that occur before a query is entered into the search box, the germ of an idea to search. While there are always stages before the activity in the search box, the sources of a query are often indefinable, much more diffuse and distant than a single seed. A search seed is not an inkling in the mind of the searcher, but a material sign—a spoken or written phrase—of a possible search.67 But much like affordances are situated and relational (Leonardi, 2011; Vertesi, 2019), so too are search seeds. In that, a search seed may suggest a query to the prepared searcher. A search seed is not the same as the fully grown formulated search query. But the seed may suggest a full query, or some of the terms. Search seeds are not guarantees of search success for the searcher; success will depend on other components of the search system being configured in a way such that the content for the seed is made, the content is indexed, the query is parsed, and the connection is found by the searcher high enough in the search rankings. Seeds are inscriptions that can be mobilized for search (Latour, 1986).

    Search seeds may or may not be explicitly referred to as things to be searched by the people sharing them or artifacts conveying them. In a piece in Wired (Tripodi, 2019a), building on her 2018 Data & Society research, Tripodi uses the phrase “strategic keyword signaling” to refer to the practice of distributing soundbites or catchy phrases for audiences to search. I first used ‘seed’ to refer to this activity broadly while describing that article (2019). I propose search seed, and the dissemination of search seeds, to encompass such strategic signaling.68

    There is other work that touches on search seeds. Mike Caulfield, a misinformation researcher at the University of Washington’s Center for an Informed Public, has referred to search suggestions or directives from others that drive searchers to conspiratorial content as the “Google This Ploy” (2019a). Ronald Robertson, a search researcher at the Stanford Internet Observatory, has referred to search suggestions on social media as “search directives” in unpublished work. Seeds may be identified in the search results themselves (as queries are reformulated with reference to found content), the SERP itself (with the “People also ask” and “Related searches” rich-content features on Google or the titles and snippets for results), or in the autocomplete search suggestions when typing a query. Seeds from autocomplete include the problematic suggestions identified by Cadwalladr in 2016 (2016a, 2016b).69 Seeds are also shared in conversations about “do[ing] more research”, such as in online groups discussing the vitamin K shot given to newborns, providing [vitamin k shot] as a seed, as documented by Renee DiResta, now a misinformation researcher at the Stanford Internet Observatory (2018). President Biden provided an explicit search seed in his tweet telling people to “Google ‘COVID test near me’ to find the nearest site where you can get a test” (2022). These are all examples of search seeds. (Note that these all discuss where search seeds are present, not where they are absent.)

    The search terms used matter. Tripodi (2018, 2019a, 2019b, 2022b), along with Gillespie (2017), Golebiewski and boyd (2018, 2019), Caulfield (2019a), and others, have identified the importance of the interactions before the selection of the search query. Tripodi discusses concerns about how the people providing the search seeds and search engines act and what role greater searcher knowledge of the mechanism may have. Gillespie (2017) presents a case study of strategic competition over the results returned by Google for the term “santorum” and the mediating role that the search engine plays in such contests. After the 2003 campaign to redefine the term, to critique homophobic remarks from former politician Rick Santorum, the search results returned for the query [santorum] on the major US search engines are still markedly different from those for [rick santorum]. Golebiewski and boyd introduce “data void” to describe search terms for which there is problematic content due to the limited amount of total relevant content. These search terms are sometimes targeted by people acting strategically, who may distribute both the search seeds and the content. Caulfield (2019a) presents an example of a search seed for a data void in a “Google This Ploy”, with the suggested searches written on beach balls at a rally, and suggests what educators can teach students to do when searching.70

    I find the knowledge of the mechanisms of search seeds embedded within the work practices of data engineers. The occupational, professional, and technical components in the work of data engineering around the production and socialization of search queries (the terms or keywords to search, or the search seeds) incorporate and embed, or hold, knowledge of the mechanisms of web search.

    Spaces for evaluation

    There are two key spaces for evaluation in the work practices of the data engineers.

    • space for running workable code
    • space for gathering feedback – including meetings, prototypes, testing, code review, CI/CD

    Running workable code

    The data engineers do not depend solely on their knowledge of the problem domain and of the mechanisms of web search to evaluate search results. They often will take a portion of code from a web search, or an idea of a possible solution to their problem, and test it quickly on their system.

    Here is Shreyan:

    Iteratively figure out if you can actually get something close to what you require. And trying things out, and once you try one thing out, you try another thing out and you try a third thing out. An iterative process of a solution based upon multiple layers of search.

    [ . . . ] If this is in the ballpark of what I want, I would try it out, of course. Or if I don’t try it out. Can I get a more clearer solution from that? Let’s say I search, OK this is the error. ‘SQL code 350-some random shit’. Then that gives you a Stack Exchange page where somebody is talking about something but not exactly the problem. I would try to see, OK, is this the problem I want? Or I would try out the solution.

    So searching is sometimes faster than trying the solution out. So it would [be] dependent upon which one it is. If I can see, oh, this makes sense, let[’s] just try. If it won’t take me too long to run the code, so I’ll just run it like this. Otherwise, my habit is to give it a couple more searches to actually figure things out.

    This is seen in how Shawn, the developer advocate, discussed searching for resolutions of bugs or errors:

    And you can try it out, run a couple quick tests on it and make sure it actually works. And then you can move on with your day.

    Running workable code, or proofs-of-concept, can also be done as a shorthand test of not just whether the found-answer supports quality code, but of the quality of the whole search experience when searching about a particular tool. That is, the findability of answers can be gauged by seeing how difficult it is to get a simple proof-of-concept running in that language or software. Raha discussed searching for technology solutions, in a more strategic exploratory search phase, and how she was attuned to the content and community around the tool. It would be concerning, in this exploratory phase, if it wasn’t quickly apparent how to run a tutorial or whether a particular bug was addressed in the tool:

    I think it’s not the search engine, it’s just the lack of community at the time.

    It is pretty much very easy to do some sort of like POC and test it out and see it’s not working.

    I will present more from Amar regarding proofs of concept and getting feedback from others in the organization in the next subsection, but an initial piece of feedback is whether the code is running: “proof of concepts up and running”. Feedback from a snippet of code running is limited. Shawn:

    What context you’re dealing with and with every search you do is only adding to your context of what you know and understand. It is kinda like an experiment every single time. [ . . . ] That actually gets you the correct output at small scale but whenever you put it into production…. that’s where the rubber hits the road and you realize that your search solution was actually the incorrect solution even though it gave you the green light at a smaller scale.

    Getting some code running can, and in some workplaces is expected to, include immediate test cases. As Vivek noted:

    If I copy something from Stack Overflow and link that, a lot of times people ask “what are the test cases that it is passing?” They don’t let it go just because it was from Stack Overflow. So you do code review that part.
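The kind of test case Vivek describes reviewers asking for might look like this minimal Python sketch. The utility function and its cases are hypothetical, not drawn from the interviews; the point is that an adapted snippet is committed alongside tests that show what it passes:

```python
def dedupe_preserve_order(items):
    """Remove duplicate items while preserving first-seen order.

    The sort of small utility often adapted from a Stack Overflow
    answer, committed here with its own test cases.
    """
    seen = set()
    # seen.add(x) returns None, so the condition is False the first
    # time an item is seen and True on every repeat.
    return [x for x in items if not (x in seen or seen.add(x))]


def test_dedupe_preserve_order():
    # The test cases a reviewer would ask for alongside the snippet.
    assert dedupe_preserve_order([3, 1, 3, 2, 1]) == [3, 1, 2]
    assert dedupe_preserve_order([]) == []
    assert dedupe_preserve_order(["a", "a", "b"]) == ["a", "b"]
```

The review question “what are the test cases that it is passing?” then has a concrete answer checked into the same change.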

    While for Ross, workable meant it wasn’t breaking something (and with some eye on the future, which is addressed further in decoupling performance from search):

    when you use something new I think, it’s expected here at least that you’re going to do enough testing that it works in the way we are going to use it.

    [ . . . ]

    You try to foresee [tech debt and complexity] as you can, but really you just want to make sure that it won’t break something. It works in what we are doing now, that is the testing that you do. With an eye on future use, and then that’s enough.

    This pattern reveals that data engineers rely on the ability to quickly and iteratively test and validate search results, or the results-of-search. Their work practices, organizational setting, and the technologies of code itself provide distinct means of evaluating search results. These situated resources, rather than data engineers’ individual knowledge of search mechanisms, bolster and inform the internal expertise of the data engineer.

    Tend to gather feedback

    Amar discussed feedback around proofs of concept, noting how even senior engineers overlook resources and benefit from feedback.

    I think earlier in the quarter my work is mostly about architecting systems, coming up with ideas, coming up with designs, breaking down a complex problem into smaller achievable pieces building proof of concepts to make sure the idea works at a small scale and then coming up with what the actual thing is going to be. And that’s like, that’s a very research heavy phase. Validating ideas and solution.

    Amar said they would “propose a solution with a proper list of pros and cons” and “get external feedback”. In his case this sometimes included subject matter experts, such as database administrators within the company with decades of experience, to check the nuances of a planned use of a particular tool.

    Later in the interview he said:

    Again, even experienced engineers can overlook resources, so that’s why you have these things. Where anytime you work on certain things you tend to gather feedback. Say I work on a proof of concept

    Devin also talked about feedback earlier in the work cycle:

    For those big projects, currently at [our company] is we do a 20% review. 20% of the work has been done. This [is] how my planning process has been going. This is the discovery I’ve done. And this is what I’m planning on the next step which is going to be development. Let’s bring the whole team together of ten-ish engineers and let’s talk about what I’ve done to get this far, let’s talk about the decisions that I’ve made to determine what I’m going to be doing next and let’s have everybody collectively weigh in on my decisions and potentially agree that things are going well and this is the path we are going to go down or iterate on things we should be doing differently. Usually by the time you get to the 20% review you’ve already done the legwork. You’ve already reached out to people specifically to say ‘this is what I’ve been doing, this is how we are going to solve it’ and so usually you’re not going to find many surprises in those meetings. But it is sorta a gut check so that we can all sit down and agree that we don’t go down this path that we are going to regret.

    And then there are also code reviews, which (earlier in the interview) Devin distinguished from feedback:

    I think feedback is more generally like this is what went well, this is what didn’t go well and this is what we can do differently down the road.

    I think code reviews are very specifically more ‘like this has passed the litmus test, let’s go ahead and accept that and move on’. Or ‘this code hasn’t passed the litmus test, let’s correct it.’ It’s very specific.

    And there are also automated tests or checks sometimes built into the code review workflow. So Ross would push code to a fork, “check off that you double checked some things” and then, prior to another engineer reviewing the code, integration, syntax, and styling tests are run. He said, “sometimes you just push it up and see if it passes”.
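A minimal sketch of the kind of automated syntax gate Ross describes, written here with only the Python standard library; the function name and the report format are hypothetical stand-ins for whatever the team’s CI actually runs, and style and integration checks would be layered on in the same pattern:

```python
import py_compile


def syntax_check(paths):
    """Compile each Python file and collect failures.

    A stand-in for the automated checks that run on pushed code
    before a human reviewer looks at it: returns an empty list when
    everything passes, or (path, message) pairs to fix first.
    """
    failures = []
    for path in paths:
        try:
            # doraise=True turns compile problems into exceptions
            # instead of printed warnings.
            py_compile.compile(path, doraise=True)
        except py_compile.PyCompileError as err:
            failures.append((path, err.msg))
    return failures
```

“Sometimes you just push it up and see if it passes” corresponds to this gate coming back empty, with a reviewer only looking at code that has already cleared it.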

    Others also talked about presenting prototypes or minimum viable products (MVPs) for feedback. Phillip talked about feedback from the users (in his case, internal customers who the engineers interact with directly) before the formal code review:

    Typically the process is we usually build a prototype, or an MVP, and try to test out the feasibility. If the data is accurate or if it fits the requirements. Then we’ll just re-iterate based on the feedback from customers and stakeholders, which we call user acceptance testing. So it is a lot of re-iteration. And so we finally get to that final result.

    The work practices around acceptance testing, feedback, and code review collectivize or distribute the evaluation of the search results.

    Current data engineering work practices incorporate elements of “decoupling” between provisional evaluations of web search results and key actions, such as deploying code into production. This decoupling is a result of the lack of automation, with the gap71 between written and deployed code ideally providing slack, alternative methods to evaluate, and “buffers and redundancies” that are “designed-in, deliberate” (Perrow, 1984, p. 96).72 This ideally provides time and space for re-evaluation and recovery, if necessary. Data engineers’ work components are configured to support this decoupling by handling exceptions, errors, and other faults by routine. This margin limits the immediate effects that potential issues introduced through web search may have on the products or pipelines built by data engineers. Their systems for mitigating risk include code and processes that envelop deployment and fallback-and-recovery systems.73 The decoupling is inclusive of the above spaces for evaluation, and extends beyond them.

    But even with seeds and running code and feedback, there are still ways that poor-quality code found or shaped by a data engineer’s web searches can be added to the organizational code base. Structures in the work practices of data engineers designed to address a variety of concerns can be used to address this as well. The work practices of data engineers are directed to anticipate (effectively if not explicitly) failed search evaluations. Ajit, the staff data engineer at a major retailer, talked about introducing version control to his team:

    These are the different changes that happened on this specific piece of code over time to understand why did we make those changes and to also track, hey something happened, say, 1st of October. That’s why version control comes into play. And, oh, we made a code change and that code change sorta messed this up.

    I knew about GitHub. I knew about the basics, like you could fork someone’s repository. But then like pull requests and submitting stuff for review. Comparing… commenting… challenging people: ‘these five lines of code make no sense.’ Now they are a day-to-day practice on my team because it makes it greatly easy for you to do a lot of things.

    Especially when you are onboarding new team members, you don’t have to worry about this person has built something or changed something which just messed everything up. Now you have version control and you have reviews so all of your core production level stuff is never touched without proper reviews.

    While his description of the processes includes the feedback and review just discussed, the ability to identify mishaps in the code is a place where knowledge of the mechanism is externalized and embedded.

    Handling exceptions by routine is also seen in how the core work aim of some of the data engineers was directed towards measuring these dimensions of quality in the code or platform. Charles discussed his work:

    The main project I was working on was doing a lot of metrics and data pipelines for this one massive experiment we were running. We were making a ton of changes to the product listing page and testing…

    This is the case for Kari’s work as well. Working with the data scientists in her company and stepping in once they have a model they want to put into production:

    My team kinda comes in to figure out: OK, how do we turn this model into an API that is ready for production? How do we integrate that into the business based on the current engineering and product structures that we have setup for that feature? How do we start to scale this and monitor this properly considering the fact that we have client facing traffic now?

    And also with Victor. He discusses “productionalizing”:

    Then we build out that MVP and my team focuses on productionalizing it, making sure it will work without failing, that it is reliable, that we can count on the results, and that we’ve set up the after-the-fact monitoring to make sure it is still up and running and not falling over and it is doing what we said it would do.

    The web search practices are also presented as ways to repair faults or limitations that do make their way into production code. Here is Shawn, again talking about writing the messages in exceptions that engineers will see if that exception is raised:

    The more expressive and descriptive you can be when you’re dealing with tests, when you are throwing exceptions, real-time exceptions, especially ones that are at runtime. You know something is going to go wrong real-time.

    You never know where your exceptions are going to be thrown. And you want to assume it is always going to be in some production state. And that there will be something that will help somebody find this information online or quickly just by reading it understand what the hell’s going on and that they need to react in some way shape or form.

    So having that context given to you so clearly, without having sometimes even having to search it. The ideal state for an exception is not having to be searched, but that is rarely– that is something that is almost—you know depending on the expertise and what the person knows, the framework, and the jargon, and all the other stuff— You can’t make, “OK, to understand this exception, let’s start with ‘what is a variable’” [chuckling] You can’t express everything at the starting point for every person that is going to be possibly be running into that error code. You’re always going to be missing some context and that is where search comes in at some point.
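Shawn’s point, that an exception message should carry enough context to be read on its own or searched, can be sketched as follows. The exception class, function, and config field are hypothetical illustrations, not code from any participant’s workplace:

```python
class PipelineConfigError(ValueError):
    """Raised for invalid pipeline configuration.

    Messages are written to be self-describing: they name the field,
    the bad value, and the expected fix, so a reader can act on them,
    or search them, without further context.
    """


def load_batch_size(config):
    """Return a validated batch size, failing with a searchable message."""
    raw = config.get("batch_size")
    if raw is None:
        # A bare KeyError would force the reader to go digging;
        # the descriptive message carries its own context.
        raise PipelineConfigError(
            "Missing 'batch_size' in pipeline config: expected a positive "
            "integer (rows per warehouse write), e.g. batch_size=500."
        )
    if not isinstance(raw, int) or raw <= 0:
        raise PipelineConfigError(
            f"Invalid batch_size={raw!r}: expected a positive integer "
            "(rows per warehouse write)."
        )
    return raw
```

As Shawn notes, no message can supply every reader’s missing background; the aim is only to push the point at which someone has to turn to search as far back as practical.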

    Amar also shared about a role for search, after architecting the system plan and getting feedback on prototypes, in the designers of the systems learning how to handle exceptions by routine:

    And towards the other half of the quarter it is mostly about doing things, following best practices and trying to figure out, doing everything by the book, perfecting all aspects of running a production system, that includes: How do you easily deploy things? How do you deploy changes? How do you automate a lot of things? How do you, I guess, plan for failures?

    It’s mostly about, once you have an idea that this will work, how do you make sure this will work 99.9999% of the time?

    So that’s where you’ll spend a lot of time building a system up, but a different side of the system, more on the operation side of things again you need a lot of research in that because there are times where you already have best practices in play but a lot of times things are new, things are something that you haven’t worked with before. Learning how to do certain things, its always something that you have to look it up.

    Overall, pretty involved, you need a lot of web search for that.

    As “things are new”, doing things “by the book”, meant, in part, doing your web searches. He highlights the role of search while also presenting the work involved in implementing a system in such a way that exceptions can be handled by routine.

    The work practices of data engineers are structured or channeled in ways able to avoid or recover from many anticipated and unexpected errors. Together, the socialization of seeds, the feedback from running workable code and gathered from various collaborative work practices, and the structures in the storage and handling of the code and data serve as risk-limiting mechanisms. At least insofar as code quality is concerned, search failures or failures in searching are handled by routine.74

    This capacity to handle such exceptions steps in where knowledge of the mechanism and associated skill in use of it are missing or found wanting.

    These risk-limiting mechanisms, including elements of friction, decouple effective performance in key actions from search automation bias.

    Skitka et al. (2000), looking at human-machine interactions in aviation, defined automation bias as “the use of automation as a heuristic replacement for vigilant information seeking and processing” [p. 86]. Decoupling limits the automation bias widely found in research on the uses of web search. Some such research focuses on the automaticity in what search results are clicked and attended to. Vaidhyanathan (2011) writes that “our habits (trust, inertia, impatience) keep us from clicking past the first page of search results” [p. 15]. Haider & Sundin (2019) discuss a range of this work, pulling together White (2016) on “position bias” (which he notes is “also referred to as ‘trust’ bias or ‘presentation bias’”) (p. 65); Pan et al. (2007); Schultheiß et al. (2018); and Höchstötter & Lewandowski (2009), writing (pp. 33–34):

    [A] line of research investigates what people choose from the search engine results page – often referred to as SERP – and why they choose as they do. This work convincingly shows that how people choose links is primarily based on where these links are located on the search engine results page.

    Narayanan & De Cremer (2022) review this and other research, writing, “as these various studies show, the average user seems to treat Google with ‘a default, prima facie trust’ [emphasis in Narayanan and De Cremer]” (p. 21).75 I use the term “search automation bias” to describe this. The harm from such bias depends not only on the credence granted higher-ranked results, but also on the subject of the results and whether the results, or simply the page titles and snippets, are “likely to mislead” (Lurie & Mulligan, 2021). The risks of harmful effects from search automation bias are likely higher in other areas of searching, particularly when the automation bias concerns searches for which the search engine returns problematic results, such as search results reproducing representational harms (Noble, 2018).

    Despite their significant domain knowledge, data engineers may at times rely on the ranking of search results “as a heuristic replacement for vigilant information seeking and processing” (Skitka et al., 2000, p. 86). The data engineering web search activity is part of the larger configuration of information seeking and processing of the organization, with configurations of organizational components creating separation, or decoupling, between the web searches and the performance of post-search activity. This decoupling is achieved through occupational, professional, and technical components that include and extend past the provisional evaluation in the spaces discussed above, and through the various steps involved in handling exceptions by routine. The components of interest are not linked to web search in a way that forces any particular action; it is not actually automated. Errors introduced in web searching, the effects of which for data engineers are often externalized into code, configuration files, or data infrastructures, can perhaps be repaired in data engineering work in a manner that is distinct from the wrong or less useful beliefs that individuals might develop from inadequate evaluation of search results in other situations, such as in a medical emergency, while filling out a ballot, or when making a major purchase.76

    Discussion: Missing search knowledge

    The preceding shows the knowledge and practices used to make search work distributed and embedded in the occupational, professional, and technical components of their work practices, as an expert field.77 In this we can see searching as extended throughout those more distant practices. It follows that participation in those shared practices may be where data engineers learn the practice of searching as a data engineer and make use of the embedded expertise.

    The finding to be shown in this section is that the occupational, professional, and technical components supply a significant amount of context, or structure, as scaffolding for query selection and search evaluation. I do not examine here the emergence or evolution of these work practices; sociomaterial practices are constantly reproduced and maintained, and there is a “constitutive entanglement” between people and the material (Orlikowski (2007), building on Giddens’s (1991) work on social reproduction, “repetitive activities”, and “regularized social practices across time and space”).

    The knowledge used to make search work is distributed in the work practices around the production and socialization of search queries (the terms or keywords to search, or the search seeds) and search evaluation (the manner of evaluating and making-use-of search results). This knowledge is importantly distinct from knowledge of the mechanisms of web search. It is knowledge of other ways of validating search results for their purposes, knowledge very much situated in the material resources and expertise of this community. The data engineers did not, in interviews, present themselves as having sophisticated knowledge about the mechanisms of web search in their heads, or other sophisticated knowledge for searching. But they do make use of such knowledge through their work and practices.

    We can look at the artifacts and the practices to find the knowledge that seems to be missing (Latour, 1986, 1990, 1992). Existing work identifies knowledge embedded in relationships (Badaracco, 1991; Blackler, 1995), distributed throughout social and technical arrangements (Hutchins, 1995), and within “supporting protocols (norms about how and where one uses it)” (Gitelman, 2006, p. 5). Knowledge, not directly of the mechanisms but effectively so through referred knowledge of the occupational use of the mechanism, is also embedded in social norms (Feldman & March, 1981) or “genre rules” (Yates & Orlikowski, 1992). Cambrosio et al. (2013) argue that knowledge and know-how “cannot function as ‘expertise’ unless they become part of a network.” Expertise isn’t possessed or held by any one searcher, but there is “a network of expertise composed of other actors, devices and instruments, concepts, and institutional and spatial arrangements, distributed in multiple loci yet assembled into a coherent collective agency” (Eyal, 2019).

    Lave & Wenger (1991) looked at early work from Hutchins on apprenticeship in shipboard navigation, before the publication of Hutchins (1995), as one of their case studies, writing about the participation available from how the components of the navigation deck were configured [p. 102]:

    Apprentice quartermasters not only have access to the physical activities going on around them and to the tools of the trade; they participate in information flows and conversations, in a context in which they can make sense of what they observe and hear. In focusing on the epistemological role of artifacts in the context of the social organization of knowledge, this notion of transparency constitutes, as it were, the cultural organization of access. As such, it does not apply to technology only, but to all forms of access to practice.

    Knowledge of how to search the web like a data engineer is provided in the extensions of search in the work of data engineers. These extensions span individuals and organizations. The embedded knowledge is in the wide work of data engineering itself. Rahman et al. (2018) open their paper evaluating developers’ use of general-purpose web search for retrieving code saying, “Search is an integral part of a software development process.” It is that. But I also turn that around, arguing that data engineering development processes are integral to the data engineers’ practice of search. Search and the knowledge for it is extended in the various tools the data engineers use, in the tools they build, and in their relations with each other. For the fully participating data engineer, the occupational, professional, and technical components of their work interact to force, incentivize, and constrain work practices to align with successful uses of web search.

    Conclusions: Search extended and knowledge embedded

    Seeing search as extended and knowledge for effective search embedded in this way of work, rather than in individuals78, may reorient calls for education, design, or regulation that are premised on transparency or explainability and on web search literacy designed to fill individual knowledge deficits. We could develop habits and practices that do not require peering into the opaque and ever-changing systems of web search. In some settings we may be able to make effective use of search tools without transparency into their data sources and algorithms. We may shift focus to understanding how to mobilize and recognize effective search seeds in different domains. Rather than encouraging every searcher to understand the mechanisms, we could focus on developing and calibrating our ability to evaluate, individually and collectively. It may help us work towards identifying the contexts in which the suggestion that someone turn to web search may lead to results-of-search of differing quality. Where and with whom is ‘just google it’ helpful or not? Finally, this framing may be used to guide the identification of aspects of the web search infrastructures, topics (or places)79, or situations of searching that may be reconfigured for more effective web search practices.


    Ananny, M., & Crawford, K. (2018). Seeing without knowing: Limitations of the transparency ideal and its application to algorithmic accountability. New Media & Society, 20(3), 973–989. https://doi.org/10.1177/1461444816676645

    Badaracco, J. (1991). The knowledge link: How firms compete through strategic alliances. Harvard Business Press. https://archive.org/details/knowledgelinkhow0000bada

    Bailey, D. E., & Leonardi, P. M. (2015). Technology choices: Why occupations differ in their embrace of new technology. MIT Press. http://www.jstor.org/stable/j.ctt17kk9d4

    Bhatt, I., & MacKenzie, A. (2019). Just google it! Digital literacy and the epistemology of ignorance. Teaching in Higher Education, 24(3), 302–317. https://doi.org/10.1080/13562517.2018.1547276

    Biden, J. (2022). I know covid testing remains frustrating, but we are making improvements. https://twitter.com/POTUS/status/1478761964327297026

    Bilić, P. (2016). Search algorithms, hidden labour and information control. Big Data & Society, 3(1), 205395171665215. https://doi.org/10.1177/2053951716652159

    Blackler, F. (1995). Knowledge, knowledge work and organizations: An overview and interpretation. Organization Studies, 16(6), 1021–1046. https://doi.org/10.1177/017084069501600605

    boyd, danah. (2018). You think you want media literacy… do you? Medium. https://points.datasociety.net/you-think-you-want-media-literacy-do-you-7cad6af18ec2

    Brandt, J., Guo, P. J., Lewenstein, J., Dontcheva, M., & Klemmer, S. R. (2009). Two studies of opportunistic programming: Interleaving web foraging, learning, and writing code. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 1589–1598. https://doi.org/10.1145/1518701.1518944

    Burrell, J. (2016). How the machine “thinks”: Understanding opacity in machine learning algorithms. Big Data & Society, 3(1), 2053951715622512. https://doi.org/10.1177/2053951715622512

    Cadwalladr, C. (2016a). Google, democracy and the truth about internet search. The Guardian, 4(12). https://www.theguardian.com/technology/2016/dec/04/google-democracy-truth-internet-search-facebook

    Cadwalladr, C. (2016b). Google is not “just” a platform. It frames, shapes and distorts how we see the world. The Guardian, 11(12). https://www.theguardian.com/commentisfree/2016/dec/11/google-frames-shapes-and-distorts-how-we-see-world

    Cambrosio, A., Limoges, C., & Hoffman, E. (2013). Expertise as a network: A case study of the controversies over the environmental release of genetically engineered organisms. In N. Stehr & R. V. Ericson (Eds.), The culture and power of knowledge (pp. 341–362). De Gruyter. https://doi.org/10.1515/9783110847765.341

    Caplan, R., & Bauer, A. J. (2022). How would one write/call the period immediately after everyone-started-caring-all-at-once-about-disinfo era? [Conversation]. https://twitter.com/robyncaplan/status/1522643551527579648

    Caulfield, M. (2019a). Data voids and the google this ploy: Kalergi plan. https://hapgood.us/2019/04/12/data-voids-and-the-google-this-ploy-kalergi-plan/

    Chen, M., Fischer, F., Meng, N., Wang, X., & Grossklags, J. (2019). How reliable is the crowdsourced knowledge of security implementation? 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE), 536–547. https://doi.org/10.1109/ICSE.2019.00065

    Chen, X., Ye, Z., Xie, X., Liu, Y., Gao, X., Su, W., Zhu, S., Sun, Y., Zhang, M., & Ma, S. (2022). Web search via an efficient and effective brain-machine interface. Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, 1569–1572. https://doi.org/10.1145/3488560.3502185

    Christin, A. (2017). Algorithms in practice: Comparing web journalism and criminal justice. Big Data & Society, 4(2), 1–12. https://doi.org/10.1177/2053951717718855

    Collins, H. M., & Evans, R. (2007). Rethinking expertise. University of Chicago Press. https://press.uchicago.edu/ucp/books/book/chicago/R/bo5485769.html

    Cotter, K. (2021). “Shadowbanning is not a thing”: Black box gaslighting and the power to independently know and credibly critique algorithms. Information, Communication & Society, 0(0), 1–18. https://doi.org/10.1080/1369118X.2021.1994624

    Cotter, K. (2022). Practical knowledge of algorithms: The case of BreadTube. New Media & Society, 1–20. https://doi.org/10.1177/14614448221081802

    Daly, A., & Scardamaglia, A. (2017). Profiling the Australian google consumer: Implications of search engine practices for consumer law and policy. Journal of Consumer Policy, 40(3), 299–320. https://doi.org/10.1007/s10603-017-9349-9

    Dijck, J. van. (2010). Search engines and the production of academic knowledge. International Journal of Cultural Studies, 13(6), 574–592. https://doi.org/10.1177/1367877910376582

    DiResta, R. (2018). The complexity of simply searching for medical advice. Wired. https://www.wired.com/story/the-complexity-of-simply-searching-for-medical-advice/

    Eyal, G. (2019). The crisis of expertise. Polity Press. https://www.wiley.com/en-us/The+Crisis+of+Expertise-p-9780745665771

    Feldman, M. S., & March, J. G. (1981). Information in organizations as signal and symbol. Administrative Science Quarterly, 26(2), 171–186. http://www.jstor.org/stable/2392467

    Firouzi, E., Sami, A., Khomh, F., & Uddin, G. (2020). On the use of C# unsafe code context: An empirical study of Stack Overflow. Proceedings of the 14th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). https://doi.org/10.1145/3382494.3422165

    Fischer, F., Böttinger, K., Xiao, H., Stransky, C., Acar, Y., Backes, M., & Fahl, S. (2017). Stack Overflow considered harmful? The impact of copy&paste on Android application security. 2017 IEEE Symposium on Security and Privacy (SP), 121–136.

    Fourcade, M. (2010). Economists and societies. Princeton University Press.

    Giddens, A. (1991). The consequences of modernity. Polity Press in association with Basil Blackwell, Oxford, UK.

    Gillespie, T. (2014). The relevance of algorithms. In T. Gillespie, P. J. Boczkowski, & K. A. Foot (Eds.), Media technologies: Essays on communication, materiality, and society (pp. 167–193). The MIT Press. https://doi.org/10.7551/mitpress%2F9780262525374.003.0009

    Gillespie, T. (2017). Algorithmically recognizable: Santorum’s Google problem, and Google’s Santorum problem. Information, Communication & Society, 20(1), 63–80. https://doi.org/10.1080/1369118X.2016.1199721

    Gitelman, L. (2006). Always already new: Media, history, and the data of culture. MIT Press. https://direct.mit.edu/books/book/4377/Always-Already-NewMedia-History-and-the-Data-of

    Golebiewski, M., & boyd, danah. (2018). Data voids: Where missing data can easily be exploited. Data & Society. https://datasociety.net/library/data-voids-where-missing-data-can-easily-be-exploited/

    Griffin, D. (2019). When searching we sometimes use keywords that direct us… https://twitter.com/danielsgriffin/status/1183785841732120576

    Griffin, D., & Lurie, E. (2022). Search quality complaints and imaginary repair: Control in articulations of Google Search. New Media & Society, 0(0), 14614448221136505. https://doi.org/10.1177/14614448221136505

    Gunn, H. K., & Lynch, M. P. (2018). Googling. In The Routledge handbook of applied epistemology (pp. 41–53). Routledge. https://doi.org/10.4324/9781315679099-4

    Haider, J., & Sundin, O. (2019). Invisible search and online search engines: The ubiquity of search in everyday life. Routledge. https://doi.org/10.4324/9780429448546

    House of Lords, Select Committee on Democracy and Digital Technologies. (2020). Corrected oral evidence: Democracy and digital technologies. https://committees.parliament.uk/oralevidence/360/html/

    Höchstötter, N., & Lewandowski, D. (2009). What users see – structures in search engine results pages. Information Sciences, 179(12), 1796–1812. https://doi.org/10.1016/j.ins.2009.01.028

    Hutchins, E. (1995). Cognition in the wild. MIT Press.

    Introna, L. D., & Nissenbaum, H. (2000). Shaping the web: Why the politics of search engines matters. The Information Society, 16(3), 169–185. https://doi.org/10.1080/01972240050133634

    Jack, C. (2017). Lexicon of lies: Terms for problematic information. Data & Society, 3, 22. https://datasociety.net/output/lexicon-of-lies/

    Karapapa, S., & Borghi, M. (2015). Search engine liability for autocomplete suggestions: Personality, privacy and the power of the algorithm. International Journal of Law and Information Technology, 23(3), 261–289. https://doi.org/10.1093/ijlit/eav009

    Kluttz, D. N., & Mulligan, D. K. (2019). Automated decision support technologies and the legal profession. https://doi.org/10.15779/Z38154DP7K

    Kutz, J. (2022). Our search liaison on 25 years of keeping up with search. Google. https://blog.google/products/search/danny-25-years-of-search/

    Latour, B. (1986). Visualization and cognition. Knowledge and Society, 6(6), 1–40.

    Latour, B. (1990). Drawing things together. In M. Lynch & S. Woolgar (Eds.), Representation in scientific practice. MIT Press.

    Latour, B. (1992). Where are the missing masses? The sociology of a few mundane artifacts. In W. E. Bijker & J. Law (Eds.), Shaping technology/building society: Studies in sociotechnical change (pp. 225–228). MIT Press.

    Lave, J., & Wenger, E. (1991). Situated learning: Legitimate peripheral participation. Cambridge University Press. https://www.cambridge.org/highereducation/books/situated-learning/6915ABD21C8E4619F750A4D4ACA616CD#overview

    Leonardi, P. M. (2011). When flexible routines meet flexible technologies: Affordance, constraint, and the imbrication of human and material agencies. MIS Quarterly, 35(1), 147–167. http://www.jstor.org/stable/23043493

    Lurie, E., & Mulligan, D. K. (2021). Searching for representation: A sociotechnical audit of googling for members of U.S. Congress (Working Paper). https://emmalurie.github.io/docs/preprint-searching.pdf

    Mart, S. N. (2017). The algorithm as a human artifact: Implications for legal [re]search. Law Library Journal, 109, 387. https://scholar.law.colorado.edu/articles/755/

    McChesney, R. W. (1997). Corporate media and the threat to democracy. Seven Stories Press.

    Meisner, C., Duffy, B. E., & Ziewitz, M. (2022). The labor of search engine evaluation: Making algorithms more human or humans more algorithmic? New Media & Society, 0(0), 14614448211063860. https://doi.org/10.1177/14614448211063860

    Metaxa-Kakavouli, D., & Torres-Echeverry. (2017). Google’s role in spreading fake news and misinformation. Stanford University Law & Policy Lab. https://www-cdn.law.stanford.edu/wp-content/uploads/2017/10/Fake-News-Misinformation-FINAL-PDF.pdf

    Meyer, J. W., & Rowan, B. (1977). Institutionalized organizations: Formal structure as myth and ceremony. American Journal of Sociology, 83(2), 340–363.

    Meyer, J. W., & Rowan, B. (1978). The structure of educational organizations. In M. W. Meyer & Associates (Eds.), Organizations and environments (pp. 78–109). Jossey-Bass.

    Miller, B., & Record, I. (2013). Justified belief in a digital age: On the epistemic implications of secret internet technologies. Episteme, 10(2), 117–134. https://doi.org/10.1017/epi.2013.11

    Mustafaraj, E., Lurie, E., & Devine, C. (2020). The case for voter-centered audits of search engines during political elections. FAT* ’20.

    Nagaraj, A. (2021). Information seeding and knowledge production in online communities: Evidence from OpenStreetMap. Management Science, 67(8), 4908–4934. https://doi.org/10.1287/mnsc.2020.3764

    Narayanan, D., & De Cremer, D. (2022). “Google told me so!” On the bent testimony of search engine algorithms. Philosophy & Technology, 35(2), E4512. https://doi.org/10.1007/s13347-022-00521-7

    Noble, S. U. (2018). Algorithms of oppression: How search engines reinforce racism. New York University Press. https://nyupress.org/9781479837243/algorithms-of-oppression/

    Ofcom. (2022). Children and parents: Media use and attitudes report 2022. https://www.ofcom.org.uk/__data/assets/pdf_file/0024/234609/childrens-media-use-and-attitudes-report-2022.pdf

    Orlikowski, W. J. (2007). Sociomaterial practices: Exploring technology at work. Organization Studies, 28(9), 1435–1448. https://doi.org/10.1177/0170840607081138

    Pan, B., Hembrooke, H., Joachims, T., Lorigo, L., Gay, G., & Granka, L. (2007). In Google we trust: Users’ decisions on rank, position, and relevance. Journal of Computer-Mediated Communication, 12(3), 801–823. https://doi.org/10.1111/j.1083-6101.2007.00351.x

    Pasquale, F. (2015). The black box society. Harvard University Press.

    Perrow, C. (1984). Normal accidents: Living with high-risk technologies. Basic Books.

    Plato. (2002). Plato: Five dialogues: Euthyphro, Apology, Crito, Meno, Phaedo (J. M. Cooper, Ed.; G. M. A. Grube, Trans.; 2nd ed.). Hackett.

    Raboy, M. (1998). Global communication policy and human rights. In A communications cornucopia: Markle foundation essays on information policy (pp. 218–242). Brookings Institution Press.

    Rahman, M. M., Barson, J., Paul, S., Kayani, J., Lois, F. A., Quezada, S. F., Parnin, C., Stolee, K. T., & Ray, B. (2018). Evaluating how developers use general-purpose web-search for code retrieval. Proceedings of the 15th International Conference on Mining Software Repositories, 465–475. https://doi.org/10.1145/3196398.3196425

    Rieder, B., & Hofmann, J. (2020). Towards platform observability. Internet Policy Review, 9(4), 1–28. https://doi.org/10.14763/2020.4.1535

    Russell, D. M. (2019). The joy of search: A Google insider’s guide to going beyond the basics. The MIT Press.

    Schultheiß, S., Sünkler, S., & Lewandowski, D. (2018). We still trust Google, but less than 10 years ago: An eye-tracking study. Information Research, 23(3). http://informationr.net/ir/23-3/paper799.html

    Seaver, N. (2019). Knowing algorithms. In digitalSTS (pp. 412–422). Princeton University Press.

    Skitka, L. J., Mosier, K. L., Burdick, M., & Rosenblatt, B. (2000). Automation bias and errors: Are crews better than individuals? The International Journal of Aviation Psychology, 10, 85–97.

    Sundin, O. (2020). Where is search in information literacy? A theoretical note on infrastructure and community of practice. In Sustainable digital communities (pp. 373–379). Springer International Publishing. https://doi.org/10.1007/978-3-030-43687-2_29

    Teevan, J., Alvarado, C., Ackerman, M. S., & Karger, D. R. (2004). The perfect search engine is not enough: A study of orienteering behavior in directed search. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 415–422.

    The MIT Press. (2020). Author talk: The Joy of Search by Daniel M. Russell. https://mitpress.mit.edu/blog/author-talk-the-joy-of-search-by-daniel-m-russell/

    Tripodi, F. (2018). Searching for alternative facts: Analyzing scriptural inference in conservative news practices. Data & Society. https://datasociety.net/output/searching-for-alternative-facts/

    Tripodi, F. (2019a). Devin Nunes and the power of keyword signaling. Wired. https://www.wired.com/story/devin-nunes-and-the-dark-power-of-keyword-signaling/

    Tripodi, F. (2019b). Senate hearing + written testimony. Senate Judiciary Committee Subcommittee on the Constitution. https://www.judiciary.senate.gov/imo/media/doc/Tripodi%20Testimony.pdf

    Tripodi, F. (2022a). Step five: Set the traps – misinformation that fails to conform to dominant search engine rules is functionally invisible. https://twitter.com/ftripodi/status/1520078674417967105

    Tripodi, F. (2022b). The propagandists’ playbook: How conservative elites manipulate search and threaten democracy. Yale University Press. https://yalebooks.yale.edu/book/9780300248944/the-propagandists-playbook/

    Vaidhyanathan, S. (2011). The googlization of everything (and why we should worry). University of California Press. https://doi.org/10.1525/9780520948693

    Vertesi, J. (2019). From affordances to accomplishments: PowerPoint and Excel at NASA. In digitalSTS (pp. 369–392). Princeton University Press. https://doi.org/10.1515/9780691190600-026

    Warshaw, J., Taft, N., & Woodruff, A. (2016). Intuitions, analytics, and killing ants: Inference literacy of high school-educated adults in the US. Twelfth Symposium on Usable Privacy and Security (SOUPS 2016), 271–285. https://www.usenix.org/conference/soups2016/technical-sessions/presentation/warshaw

    White, R. W. (2016). Interactions with search systems. Cambridge University Press. https://doi.org/10.1017/CBO9781139525305

    Widder, D. G., Nafus, D., Dabbish, L., & Herbsleb, J. D. (2022, June). Limits and possibilities for “ethical AI” in open source: A study of deepfakes. Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. https://davidwidder.me/files/widder-ossdeepfakes-facct22.pdf

    Wineburg, S. (2021). “Problem 2: Typing ‘the claim into a search engine’ will…”. https://twitter.com/samwineburg/status/1465542166764081157

    Yates, J., & Orlikowski, W. J. (1992). Genres of organizational communication: A structurational approach to studying communication and media. The Academy of Management Review, 17(2), 299–326. http://www.jstor.org/stable/258774

    1. The functions of the Google Search Liaison are examined in Griffin & Lurie (2022). ↩︎

    2. This quote is also in speculating on how junior engineers learned to search ↩︎

    3. The next section will address this more fully, but the language of ‘mechanisms’ is drawn from Introna & Nissenbaum (2000) and Tripodi (2018). By the end of this chapter I will discuss ‘mechanisms’ expansively, but at the start I’ll use it in a largely technical sense, as a placeholder for the subject of various transparency and literacy concerns. ↩︎

    4. Plato (2002) ↩︎

    5. Compare arguments about the governance benefit from transparency or observability (Ananny & Crawford, 2018, Rieder & Hofmann, 2020), and the limits of transparency (Ananny & Crawford, 2018, Burrell, 2016). ↩︎

    6. Here is the containing text for the quoted material, from Introna & Nissenbaum (2000, p. 177):

      Given the vastness of the Web, the close guarding of algorithms, and the abstruseness of the technology to most users, it should come as no surprise that seekers are unfamiliar, even unaware, of the systematic mechanisms that drive search engines. Such awareness, we believe, would make a difference.

    7. Additional policies Introna & Nissenbaum (2000) suggested considering were “public support for developing more egalitarian and inclusive search mechanisms and for research into search and meta-search technologies that would increase transparency and access”, noting that the market, even with disclosure requirements, was not sufficient on its own. They also called for search technology design and research that was directed by “an explicit commitment to values” [p. 182]. ↩︎

    8. Noble does not present transparency as a solution for the search problems she details (except insofar as algorithmic literacy informs the development of alternative search engines), saying instead (p. 179):

      What is needed is a decoupling of advertising and commercial interests from the ability to access high-quality information on the Internet[.]

    9. While Tripodi weaves those observations into her broader analysis, her contribution is centered on the activities of propagandists and how they leverage cultural frames to manipulate searchers and media. The final sentences of her book are “By exposing the schemes behind the propaganda, my hope is that coders and information seekers alike will see the light. For we all need to advocate for greater transparency and verification in the sources we use to learn about our culture, our political candidates, and our world.” (p. 215) ↩︎

    10. Such over-reliance on the ranking of results by search engines may be analyzed as a form of “automation bias”: “the use of automation as a heuristic replacement for vigilant information seeking and processing” (Skitka et al., 2000, p. 86). This is addressed in a few sections in Decoupling performance from search. ↩︎

    11. Contra the Fischer et al. (2017) findings, in their lab study Brandt et al. (2009) wrote: “Participants typically trusted code found on the Web, and indeed, it was typically correct.” The correctness of the code, though, was limited to it being workable—it would run. Brandt et al. (2009) made no mention of testing the code for security, reliability, scalability, or other factors of code quality. ↩︎

    12. “Seed” in some web search research is used to refer to the terms used by the researchers as they collect the suggested or alternative queries provided by the search engine (Mustafaraj et al., 2020). Seed in that case is used more akin to seeding the algorithm, perhaps as one provides a seed for a random number generator. ↩︎

    13. These strings of text are often sent to the web search engine through a search bar, but on many web search engines they can also be sent directly through the URL. Some of my interviewees did mention using links to go directly to the search results page for a particular query. ↩︎

    14. While search engines are designed around digitized text, there are other modalities available for search, all materially bound. Voice-based search transforms audio into text. Some people may recognize digital images as search seeds, with reverse image search. Some search engines and other search tools also support searching from a photograph. There is also some support for searching with music or even humming. Chen et al. (2022) demonstrate search queries from electroencephalogram (EEG) signals. As web search expands in these directions it may be necessary to pursue new approaches to showing which pictures, sounds, smells, or thoughts might effectively link questions and answers. ↩︎

    15. It has also been used by Sam Wineburg, an education professor at Stanford, in a 2021 tweet referencing Tripodi’s 2019 Wired article: “Typing ‘the claim into a search engine’ will often lead you to exactly where the rouges want you to go. Bad actors ‘seed’ the words they want you to search for & then populate the Web with content supporting their view” (Wineburg, 2021). Tripodi, rather, has used “seed” in a tweet to refer to the seeding of content: “Propagandists seed the internet with problematic content and manipulate Search Engine Optimization to ensure their content dominates top returns” (Tripodi, 2022a). In her book she sources “seeding” to digital marketing, writing (Tripodi, 2022b, pp. 127–128):

      prominent personalities within the right-wing information ecosystem understand the media technology du jour, and use that medium to cross-promote their ideas and serve as guests on one another’s shows. Conservative thought leaders also signal-boost specific keywords and phrases in their ideological dialect to ensure that their message dominates users’ search results. Digital marketers call this process “seeding”—distributing content across the web to increase brand awareness and turn viewers into customers.

      She uses a variation of the term (seed, seeded, or seeding) six additional times, each time in reference to content being seeded (pp. 134, 140, 180, 183, 207, and 215).

      This “content seeding” is related to the “information seeding” done to encourage the development of online communities (Nagaraj, 2021). ↩︎

    16. See Karapapa & Borghi (2015) for a discussion of search engine liability (in Europe) for search autocomplete suggestions. ↩︎

    17. Caulfield does not suggest educators teach students the mechanisms of search engines, but to do three things, which he expands on: 1. Choose your search terms well (“First, let students know that all search terms should be carefully chosen, and ideally formed using terms associated with the quality and objectivity of the information you want back.”). 2. Search for yourself (“There’s nothing more ridiculous than a person talking about thinking for themselves while searching on terms they were told to search on by random white supremacists or flat-earthers on the internet.” He also suggests that educators tell students to “avoid auto-complete in searches unless it truly is what you were just about to type”.) 3. Anticipate what sorts of sources might be in a good search — and notice if they don’t show up (“Before going down the search result rabbit hole, ask yourself what sort of sources would be the most authoritative to you on those issues and look to see if those sorts of pages show up in the results.”) ↩︎

    18. While I rely on the term “decoupling”, it is related to the “wide gaps” between different technologies discussed in Bailey & Leonardi (2015). While they found efforts to automate across gaps between technologies in hardware engineering, they did not find that in structural engineering. Instead they argue (p. 123):

      Because senior engineers in particular viewed the navigation of wide gaps as beneficial for the cultivation of testing acumen and prudent in the face of liability concerns and government regulations, there was little impetus to hasten automation by limiting the number of technologies that lined each gap in structural engineering

    19. Though I am drawing on similar imagery and so their use is somewhat related, my sense of decoupling is distinct from the “decoupling” in Meyer & Rowan (1977) (which is actually much more related to the absence of technocratization of web search, discussed in Owning searching). My use is also related to the discussion of tight and loose coupling in Perrow (1984), although not directly drawn from it; my use instead describes a situation at the point of deployment where the data engineering organization can draw on tendencies of both tight and loose coupling. Perrow does cite Meyer & Rowan (1978) (a later piece), building on their description of loose coupling. See also decoupling in Christin (2017), drawing on Meyer & Rowan (1977). ↩︎

    20. See also the discussion above regarding liability pressures in Admitting searching: Searching for opportunities. ↩︎

    21. There are other effects from search failures or failures in searching that are not handled within these structures. These relate to interpersonal relations, status, and the associated affective experiences of search and search failures (to be discussed in subsequent chapters). ↩︎

    22. The internal citation is to Gunn & Lynch (2018), who, perhaps more accurately, write of “most people” rather than “the average user” and use the word “googling” rather than “Google”: “most people treat googling with default, prima facie trust” [p. 42]. ↩︎

    23. Google’s Search Quality Evaluator Guidelines, for instance, note that some search topics have “a high risk of harm because content about these topics could significantly impact the health, financial stability, or safety of people, or the welfare or well-being of society” and that pages on such topics “require the most scrutiny for Page Quality rating” (Google, 2022). They refer to these topics as “Your Money or Your Life” topics, using also the acronym YMYL. For more on the guidelines and Google’s contractor evaluators, see Bilić (2016) and Meisner et al. (2022), respectively. ↩︎

    24. Christin (2017) uses the phrase “expert fields” to describe “configurations of actors and institutions sharing a belief in the legitimacy of specific forms of knowledge as a basis for intervention in public affairs”, building on Collins & Evans (2007) and Fourcade (2010). See also Kluttz & Mulligan (2019, pp. 860–861, fn. 29). ↩︎

    25. Though the individuals do have considerable domain expertise—knowledge of terms to search, of the qualities of the various resources on their topics of work, and of how to navigate the wide work practices of fellow data engineers so as to maintain such knowledge. Individuals may take, or translate, their knowledge of their way of work to other places of work with similar arrangements. But their knowledge is not directly about searching the web. They may not search as effectively about some topics if they do not have knowledge (or access to communities from which to learn it) of how that topic area structures its knowledge of web search (here, how search queries are seeded and search results are evaluated), or in places where such seeds and evaluation are not embedded in the way of work. ↩︎

    26. From the Greek topos, for a place. ↩︎