“What does ‘no knowledge cutoff’ mean?”

    @danielsgriffin via Twitter on Nov 29, 2023

    What does “no knowledge cutoff” mean?[1]
    ___
    1. Context: generative search systems

    When Perplexity AI announced their new online LLM APIs, Aravind Srinivas (Cofounder and CEO) said it was “the first-ever live LLM APIs that are grounded with web search data and have no knowledge cutoff!” [mark added] Similarly, Denis Yarats (Cofounder and CTO) said these models “have internet access” (which is very distinct from access to search results) and “perform very well on prompts that require factuality and up-to-date information”. The Perplexity AI account described it as “a first-of-its-kind live-LLM API.”
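
    For concreteness, here is a minimal sketch of what calling one of these online models looks like. This assumes the OpenAI-compatible chat-completions endpoint and the model name (pplx-70b-online) described in the launch announcement; both are assumptions from that announcement, not a verified current API reference.

    ```python
    # Hedged sketch: querying a Perplexity "online" model via the
    # OpenAI-compatible pplx-api endpoint announced in Nov 2023.
    # Endpoint, model name, and response shape are assumptions from
    # the announcement and may have changed.
    import os

    import requests

    response = requests.post(
        "https://api.perplexity.ai/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['PPLX_API_KEY']}"},
        json={
            "model": "pplx-70b-online",  # "online": grounded with web search data
            "messages": [
                {"role": "user", "content": "What happened in the news today?"}
            ],
        },
        timeout=30,
    )
    response.raise_for_status()
    print(response.json()["choices"][0]["message"]["content"])
    ```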

    What does “no knowledge cutoff” mean?

    I wrote in reply:

    The “no knowledge cutoff” suggests to me an expensive ‘live browsing’ of retrieved pages. While the blogpost says “updated on a regular cadence,” there is no mention of live access to pages. Am I reading that correctly?

    I don’t think live access to webpages is necessary for most searches, even for most time-sensitive queries. But I would like to ground our claims about generative search systems.
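
    To make the distinction concrete, here is an illustrative sketch of the two grounding strategies in question. This is entirely hypothetical (the function and parameter names are mine, and nothing here describes Perplexity’s actual implementation): one path conditions the model on snippets from an index refreshed “on a regular cadence,” the other re-fetches the retrieved pages at query time.

    ```python
    # Hypothetical contrast between index-grounded and live-browsing
    # grounding. `search_index` and `fetch` are assumed interfaces.
    from dataclasses import dataclass

    @dataclass
    class Snippet:
        url: str
        text: str
        crawled_at: str  # timestamp of the index's copy, not the live page

    def grounded_by_index(query, search_index):
        """Cheaper path: condition the LLM on snippets from a search index
        refreshed on a regular cadence. Knowledge is only as fresh as the
        last crawl."""
        snippets = search_index.search(query, k=5)
        return build_prompt(query, snippets)

    def grounded_by_live_browsing(query, search_index, fetch):
        """Costlier path: re-fetch each retrieved URL at query time, so the
        model sees each page as it exists right now -- "no knowledge cutoff"
        in the strongest sense."""
        snippets = search_index.search(query, k=5)
        live = [Snippet(s.url, fetch(s.url), crawled_at="now") for s in snippets]
        return build_prompt(query, live)

    def build_prompt(query, snippets):
        sources = "\n\n".join(f"[{s.url} @ {s.crawled_at}]\n{s.text}" for s in snippets)
        return f"Answer using only these sources:\n\n{sources}\n\nQuestion: {query}"
    ```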

    The FreshLLMs paper (Vu et al., 2023) uses similar language (“up-to-date”) and examines performance on questions that demand “fast-changing knowledge”. But it relies only on search results, not live browsing. In fact, its dataset design “excluded questions whose answers are likely to change more frequently than once per week.”
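
    For illustration, here is a rough sketch of the search-grounded prompting the paper describes: its FreshPrompt method formats dated search-result evidence into the prompt, sorted so the most recent evidence sits closest to the question. The field names and formatting below are mine, not the paper’s exact template.

    ```python
    # Rough sketch of the FreshPrompt idea from Vu et al. (2023):
    # ground the model in dated search-result snippets (no live page
    # fetches). Field names are illustrative, not the paper's format.
    from datetime import date

    def fresh_prompt(question, results):
        # Sort evidence chronologically so the most recent item appears
        # last, closest to the question, since answers may have changed.
        evidence = sorted(results, key=lambda r: r["date"])
        blocks = [
            f"source: {r['source']}  date: {r['date']}\n{r['snippet']}"
            for r in evidence
        ]
        return "\n\n".join(blocks) + f"\n\nquestion: {question}\nanswer:"

    results = [
        {"source": "example.org", "date": date(2023, 11, 20), "snippet": "..."},
        {"source": "example.com", "date": date(2023, 10, 1), "snippet": "..."},
    ]
    print(fresh_prompt("Who is the current CEO of Twitter?", results))
    ```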

    More definitional work and standards-setting may be found in the works cited in the Time-sensitive QA paragraph of Vu et al. (2023):

    Time-sensitive QA: FRESHQA aligns with a growing body of work on benchmarking LLMs’ temporal reasoning capabilities (Chen et al., 2021b; Zhang & Choi, 2021; Liska et al., 2022; Kasai et al., 2022). Chen et al. (2021b) created TIMEQA by extracting evolving facts from WIKIDATA along with aligned WIKIPEDIA passages to synthesize 20K timestamped question-answer pairs. Zhang & Choi (2021) constructed SITUATEDQA by annotating 9K realistic questions from existing open-domain QA datasets with temporal context (i.e., timestamps). STREAMINGQA (Liska et al., 2022) consists of both LLM-generated and human-written questions (146K total questions) answerable from a corpus of timestamped news articles. Also related is the dynamic REALTIMEQA benchmark (Kasai et al., 2022), which evaluates models on a set of 30 multiple-choice questions about new events extracted from news websites. In contrast, FRESHQA contains a fixed set of human-written open-ended questions whose answers by nature can change based on new developments in the world and thus offers a complementary generative evaluation of time-sensitive QA.
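
    The common thread across those benchmarks is that each attaches temporal context to a question, so an answer is only correct relative to a timestamp. A record in that style might look like the following (the field names are mine, not drawn from any one of the cited datasets):

    ```python
    # Illustrative shape of a timestamped QA pair; field names are
    # hypothetical. The answer is only correct relative to the timestamp.
    record = {
        "question": "Who is the CEO of Twitter?",
        "timestamp": "2021-06-01",  # temporal context the question is situated in
        "answer": "Jack Dorsey",    # correct as of the timestamp, stale later
    }
    ```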

    References

    Vu, T., Iyyer, M., Wang, X., Constant, N., Wei, J., Wei, J., Tar, C., Sung, Y.-H., Zhou, D., Le, Q., & Luong, T. (2023). FreshLLMs: Refreshing large language models with search engine augmentation. http://arxiv.org/abs/2310.03214 [vu2023freshllms]