Asking [Why hire Daniel?] in an LLM search w/ local Lunr.js

    tags: RAG
    July 12th, 2023

    An example of searching with OpenAI’s GPT models on my local Lunr.js instance.

    I modify an example of retrieval-augmented generation (RAG) from the OpenAI Cookbook to make embeddings for all pages to support ‘semantic’ searching, retrieve pages for the query [Why hire Daniel?], and return a generated text response.

    This morning I briefly explored “Question answering using embeddings-based search” in OpenAI’s cookbook.

    Their notebook demonstrates a search on the News API. I didn’t even try that. I first built a local API (via Express and Node) with Lunr.js, drawing on indices that I’ve already developed for this website. I then replaced their search_news function with my simple search_dsg_lunr function. I also slightly modified the initial prompt for generating reformulated queries (from “You have access to a search API that returns recent news articles.” to “…returns documents from danielsgriffin.com.”). I’m very much just playing around, loosely exploring, and not suggesting this is best practice.
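
    For a sense of what a drop-in replacement for search_news might look like, here is a minimal sketch. It is not the code from the notebook: the endpoint URL, the `q` query parameter, the response shape (a JSON list of objects with `title`, `url`, and `snippet` keys), and the `fetch_json` helper are all assumptions made for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical address of the local Express + Lunr.js API (an assumption).
LUNR_API_URL = "http://localhost:3000/search"


def fetch_json(url):
    """Fetch a URL and decode its JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def search_dsg_lunr(query, num_results=10, fetch=fetch_json):
    """Return up to `num_results` results for `query` as plain dicts.

    Assumes the API answers GET <LUNR_API_URL>?q=... with a JSON list of
    objects that may carry "title", "url", and "snippet" keys.
    """
    url = f"{LUNR_API_URL}?{urlencode({'q': query})}"
    results = fetch(url)
    # Normalize each hit so later steps can rely on the same three keys.
    return [
        {
            "title": r.get("title", ""),
            "url": r.get("url", ""),
            "snippet": r.get("snippet", ""),
        }
        for r in results[:num_results]
    ]
```

    The `fetch` parameter is only there so the function can be tried without the local server running, by passing in a function that returns canned results.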

    The modified notebook is at /open_scripts/Question_answering_using_a_search_API-2023-07-12-10_56_01.ipynb.

    high-level overview

    1. prompts the model to generate reformulated queries from the user’s initial query (here I used: [Why hire Daniel?])
    2. collects the search results from running each query against my search API
    3. develops a hypothetical answer to the initial query, without looking at the results or reformulated queries (they cite Gao et al. (2022))
    4. retrieves embeddings for the hypothetical answer and the search results (I added some functionality here to get hashes of these documents and log the embeddings for future use)
    5. calculates cosine similarities to rerank the collection of results
    6. sends the question and top results (just the title and snippet, from my generated snippets) to the model to retrieve an answer
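
    Steps 4 and 5 can be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook’s code: `embed` stands in for whatever call returns one embedding vector per input text, the cache is a plain dict keyed by a SHA-256 hash of the text (the notebook’s actual logging format is not described here), and `text_hash`, `cached_embeddings`, and `rerank` are hypothetical names.

```python
import hashlib
import math


def text_hash(text):
    """Stable key for a text, so its embedding can be looked up later."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def cached_embeddings(texts, embed, cache):
    """Return one embedding per text, calling `embed` only on cache misses."""
    # dict.fromkeys drops duplicate texts while keeping their order.
    missing = list(dict.fromkeys(t for t in texts if text_hash(t) not in cache))
    if missing:
        for text, vector in zip(missing, embed(missing)):
            cache[text_hash(text)] = vector
    return [cache[text_hash(t)] for t in texts]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def rerank(hypothetical_answer, results, embed, cache, top_k=5):
    """Order results by similarity to the hypothetical answer; keep top_k."""
    # Only the title and snippet of each result are embedded.
    texts = [f"{r['title']} {r['snippet']}" for r in results]
    answer_vec, *result_vecs = cached_embeddings(
        [hypothetical_answer] + texts, embed, cache
    )
    scored = sorted(
        zip(results, result_vecs),
        key=lambda pair: cosine_similarity(answer_vec, pair[1]),
        reverse=True,
    )
    return [result for result, _ in scored[:top_k]]
```

    Note that the results are compared against the hypothetical answer rather than the original question, which is the idea credited to Gao et al. (2022) in step 3.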

    the ‘answer’ (with gpt-3.5-turbo)

    Daniel S. Griffin is a Ph.D. holder in Information Science from the University of California, Berkeley, who specializes in web search tools and practices. He is actively seeking opportunities in industry to contribute to better understanding and improving search tools and practices, particularly in generative search and search-like tools. With his background and expertise, Daniel is well-equipped to understand users and effectively communicate findings. If you are interested in connecting with Daniel or have any potential opportunities, you can find his contact information on his Hire me page.: Hire me

    the ‘answer’ (with gpt-4 only for the final step)

    Daniel S. Griffin is a highly qualified professional with a Ph.D. in Information Science from the University of California, Berkeley. He has extensive experience working with various organizations on topics such as web search, algorithmic fairness, and cybersecurity. His expertise lies in understanding users and identifying and communicating findings effectively, which makes him a valuable asset for any organization (About).

    [...] is a skilled qualitative researcher who has focused on web search tools and practices. His background and expertise enable him to contribute significantly to the understanding and improvement of search tools and practices, particularly in generative search and search-like tools. He is actively seeking opportunities in the industry to leverage his skills and knowledge (Hire me.).

    [...] has also demonstrated his knowledge through his research. He has explored the solitary and secretive nature of web searching among data engineers, the importance of search repair practices, and the potential for technocratization of search (6. Owning searching, 5. Repairing searching).

    [...], he has taught a course on understanding change in web search at Michigan State University, indicating his ability to share his knowledge with others and his commitment to ongoing learning (Repairing Searching).

    [...] Daniel would mean gaining a team member with a deep understanding of search practices, a commitment to improving these practices, and the ability to effectively communicate his findings.

    References

    Gao, L., Ma, X., Lin, J., & Callan, J. (2022). Precise zero-shot dense retrieval without relevance labels. http://arxiv.org/abs/2212.10496