A brief doc on retrieval-augmented generation (RAG)
Retrieval-augmented generation (RAG) is a framework for enhancing the quality of LLM-generated responses by adding retrieved information to the prompt before generation.
A plain prompt like [Why hire Daniel?] would return a response based only on the model’s original training data. RAG lets you embed new information (retrieved from some other system component, such as a document store) into a prompt template, giving the model more context.
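The prompt-template step can be sketched as follows; the template wording and the retrieved passages here are illustrative assumptions, not part of any specific RAG library.

```python
# Minimal sketch of a RAG prompt template: retrieved passages are
# interpolated into the prompt before it is sent to the LLM.
def build_prompt(query, retrieved_passages):
    """Embed retrieved text into a prompt to give the model context."""
    context = "\n".join(f"- {p}" for p in retrieved_passages)
    return (
        "Answer the question using the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\n"
        "Answer:"
    )

# Hypothetical retrieved passages for the example query.
prompt = build_prompt(
    "Why hire Daniel?",
    ["Daniel has 10 years of Python experience.",
     "Daniel led the search-infrastructure team."],
)
print(prompt)
```

In a full pipeline, `retrieved_passages` would come from the retriever described below rather than being hard-coded.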
Lewis et al. (2020) are credited with developing the framework.
Wikipedia’s page on prompt engineering has a section on RAG:
Prompts often contain a few examples (thus “few-shot”). Examples can be automatically retrieved from a database with document retrieval, sometimes using a vector database. Given a query, a document retriever is called to retrieve the most relevant documents (relevance is usually measured by first encoding the query and the documents into vectors, then finding the documents whose vectors are closest in Euclidean norm to the query vector). The LLM then generates an output based on both the query and the retrieved documents.[48]
Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.-t., Rocktäschel, T., Riedel, S., & Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Proceedings of the 34th International Conference on Neural Information Processing Systems.