A small quibble with a tiny piece of Zade et al. (2022):
…Google does not officially support a search API…
Google’s official support is clearly not sufficient for the research in question (they used SerpApi), but there is a very limited API that may be useful for other purposes (see the many comments and concerns on StackOverflow).
You can create a Programmable Search Engine that searches the entire web. This is not a mirror of the Google SERP or the Google search experience, but it is a web search API.
Attribute | Value |
---|---|
Search engine name | so_does_google_have_an_official_search_api |
Description | Can the Programmable Search Engine’s Custom Search JSON API be said to be an official web search API? |
Public URL | https://cse.google.com/cse?cx=f18b82ccfc1ba4827 |
Augment results | Search the entire web |
Sites to search | You do not have sites to search. |
Sites to exclude | You do not have excluded sites |
This is a functioning interface from Google:
I can then use the Custom Search JSON API.
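As a rough sketch of what a call looks like, the snippet below builds a request against the Custom Search JSON API's `customsearch/v1` endpoint using only the Python standard library. The helper names (`build_search_url`, `search`) are my own, the API key is a placeholder you would create in the API Console, and the `cx` value is the engine ID from the Public URL in the table above. Error handling and paging are left out.

```python
import json
import urllib.parse
import urllib.request

# Endpoint of the Custom Search JSON API.
ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(api_key, cx, query, num=10, start=1):
    """Return the request URL for one page of results.

    api_key: placeholder for a key created in the API Console.
    cx:      the Programmable Search Engine ID.
    num:     results per request; the API accepts 1 to 10.
    start:   1-based index of the first result to return.
    """
    if not 1 <= num <= 10:
        raise ValueError("num must be between 1 and 10")
    params = {"key": api_key, "cx": cx, "q": query, "num": num, "start": start}
    return ENDPOINT + "?" + urllib.parse.urlencode(params)


def search(api_key, cx, query, **kwargs):
    """Send one request and return (title, link) pairs from the response."""
    url = build_search_url(api_key, cx, query, **kwargs)
    with urllib.request.urlopen(url) as response:
        payload = json.load(response)
    # "items" is absent when a query has no results.
    return [(item["title"], item["link"]) for item in payload.get("items", [])]
```

With a real key, `search("YOUR_API_KEY", "f18b82ccfc1ba4827", "voter fraud")` would return up to ten title/link pairs. Each call counts as one query against the daily quota.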
Custom Search JSON API provides 100 search queries per day for free. If you need more, you may sign up for billing in the API Console. Additional requests cost $5 per 1000 queries, up to 10k queries per day.
If you need more than 10k queries per day and your Programmable Search Engine searches 10 sites or fewer, you may be interested in the Custom Search Site Restricted JSON API, which does not have a daily query limit.
The Programmable Search Element API charges $5 per 1000 ad-free search element queries. Billing needs to be configured in the API Console. Quotas can be configured in the Cloud Platform Console to help limit maximum daily expenditures.
Note that new consumer projects will default to unlimited daily quota; we strongly recommend setting a daily quota that is sufficient for your traffic volume.
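To get a feel for what those numbers mean in practice, here is a small back-of-the-envelope calculation for the Custom Search JSON API figures quoted above. It rests on two assumptions of mine that the quoted text does not spell out: that billing is pro rata rather than in whole blocks of 1000, and that the 100 free queries count toward the 10k daily cap. The function name is hypothetical.

```python
# Figures taken from the pricing text quoted above.
FREE_QUERIES_PER_DAY = 100
PRICE_PER_1000_USD = 5.0
MAX_QUERIES_PER_DAY = 10_000


def daily_cost_usd(queries):
    """Estimate the cost in USD of a given number of queries in one day.

    Assumes pro-rata billing beyond the free tier (an assumption, not
    something the quoted pricing text confirms).
    """
    if queries < 0:
        raise ValueError("queries must be non-negative")
    if queries > MAX_QUERIES_PER_DAY:
        raise ValueError("the Custom Search JSON API is capped at 10k queries per day")
    billable = max(0, queries - FREE_QUERIES_PER_DAY)
    return billable * PRICE_PER_1000_USD / 1000


print(daily_cost_usd(100))     # stays within the free tier
print(daily_cost_usd(400))     # an illustrative mid-sized daily workload
print(daily_cost_usd(10_000))  # the daily maximum
```

Under these assumptions, the most one could spend on this API in a day is a little under $50, which is why the daily cap matters more than the price for any large-scale collection.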
Journal: Journal of Online Trust and Safety
Volume: 1 | Issue: 4
Year: 2022
DOI: 10.54501/jots.v1i4.72
The prevalence and spread of online misinformation during the 2020 US presidential election served to perpetuate a false belief in widespread election fraud. Though much research has focused on how social media platforms connected people to election-related rumors and conspiracy theories, less is known about the search engine pathways that linked users to news content with the potential to undermine trust in elections. In this paper, we present novel data related to the content of political headlines during the 2020 US election period. We scraped over 800,000 headlines from Google’s search engine results pages (SERP) in response to 20 election-related keywords—10 general (e.g., “Ballots”) and 10 conspiratorial (e.g., “Voter fraud”)—when searched from 20 cities across 16 states. We present results from qualitative coding of 5,600 headlines focused on the prevalence of delegitimizing information. Our results reveal that videos (as compared to stories, search results, and advertisements) are the most problematic in terms of exposing users to delegitimizing headlines. We also illustrate how headline content varies when searching from a swing state, adopting a conspiratorial search keyword, or reading from media domains with higher political bias. We conclude with policy recommendations on data transparency that allow researchers to continue to monitor search engines during elections.
Here is more context from Zade et al. (2022):
To answer these questions, we focused on news headlines from Google’s SERP data (see Figure 1). The headline of a news story is known to influence users’ interpretation of the story’s content (Tannenbaum 1953) and impact its popularity (Rieis et al. 2015). We collected headlines using election-related search keywords as seen on Google’s search engine across 20 locations spread throughout the US. Since Google does not officially support a search API, and other services do not support location-specific requests, we resorted to a third-party paid service called SerpApi (SerpApi 2020). This service allowed us to perform searches such that the results were associated with the locations of our 20 selected sites, rather than the results that Google would normally associate with the geographic location of our local IP address. Our collection of data began before the election in early October 2020 and ran through mid-December. We performed an extensive qualitative analysis of a random sample of 5,600 headlines from over 500,000 SERP search results, 242,000 SERP stories, 62,000 SERP videos, and 47,000 SERP advertisements to evaluate the potential of SERP data to undermine trust in the election. In addition to the analysis, we make the raw Google SERP data corresponding to election-related keywords across several disparate locations openly available to further analysis by other researchers (Zade, Wack, and Zhang 2022).
[…]
Google does not officially support any search API, and other search services do not allow easy access to location-specific SERP data. While we had access to a white-listed IP address to crawl unlimited Google SERP data, this data would have reflected SERP results as seen from that specific location. To accommodate location as a factor in SERP-related audits, prior research resorted either to using browser-based plugins (Robertson et al. 2018) (limiting the data collection to queries adopted by select users at specific times), or to making data requests from multiple locations with unique IP-addresses (Mustafaraj, Lurie, and Devine 2020) (limiting the scalability to only a few unique locations). To overcome these limitations, we used the SerpApi platform (SerpApi 2020) to search for keywords of our choice mentioned in Table 1 at regular intervals each day and fetched the corresponding Google Search results as it would be seen at the 20 unique locations listed in Table 2.
Zade, H., Wack, M., Zhang, Y., Starbird, K., Calo, R., Young, J., & West, J. D. (2022). Auditing Google’s search headlines as a potential gateway to misleading content. Journal of Online Trust and Safety, 1(4). https://doi.org/10.54501/jots.v1i4.72 [zade2022auditing]