2024-02-05
Finally getting around to adding some notes...
Trieve (open source)
a Trieve blogpost: Why search before generate?
- "intermediary search prompt"
- You can see queries visibly generated by various systems, including ChatGPT, Perplexity, and You.com. But as a user you cannot rewrite them in the moment or stop one of them. The display also does not show how the queries are managed internally in the retrieval and ranking systems.
- for Google's Bard, see this: "you can see the search query it’s grounded in by pressing the G button"
- "being able to control the search process yourself"
- Ex. Ask Pandi (beta) asks the user to choose whether to include results in the answer (something like "the generate off chunks route" mentioned in the Trieve post). I often wish that You.com's Research mode let me remove some of the queries and results it says it is going through while generating. (I've also mused on search systems not simply rewriting the response, but rewriting it after I remove a source or give a key 'grounding' detail, or letting me edit the intro. (You can edit prior responses in the OpenAI playground (not search-enabled), but I haven't seen that in other systems.)) Involving the user more before generation will have some time tradeoffs for the user and some risk of increased computational cost for the providers.
Seems somewhat testable! Randomly assign users (or queries, within subjects) to search-chunk-generate or just generate, and look for things like time, computation, and accuracy trade-offs. But the results would also depend on user skills and practices, and prompting/querying behavior might shift over time. They would also depend on, and can be compared to, changes in the query rewriting that the system itself does.
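A rough sketch of what the assignment and comparison could look like, just to make the idea concrete. Everything here is an assumption for illustration: the function names, the seed, and the log fields (`seconds`, `tokens`, `correct`) are hypothetical, and a real study would need proper statistics rather than simple means.

```python
import random
from statistics import mean

# Condition labels taken from the note above.
CONDITIONS = ("search-chunk-generate", "generate-only")


def assign_conditions(unit_ids, seed=0):
    """Randomly split users (or queries) into two near-equal groups."""
    rng = random.Random(seed)  # seeded so the assignment can be reproduced
    shuffled = list(unit_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        unit: CONDITIONS[0] if i < half else CONDITIONS[1]
        for i, unit in enumerate(shuffled)
    }


def summarize(logs):
    """Average each (hypothetical) metric per condition.

    Each log row is a dict with keys: condition, seconds, tokens, correct.
    """
    summary = {}
    for condition in CONDITIONS:
        rows = [row for row in logs if row["condition"] == condition]
        if not rows:
            continue  # no sessions logged for this condition yet
        summary[condition] = {
            "n": len(rows),
            "mean_seconds": mean(row["seconds"] for row in rows),
            "mean_tokens": mean(row["tokens"] for row in rows),
            "accuracy": mean(row["correct"] for row in rows),
        }
    return summary
```

For the within-subjects version, the same function could be called on query ids instead of user ids, once per user with a different seed.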
- "the context window would be far too polluted with extraneous information for it to generate a good, focused answer"
- This reminds me of recent research from
- See also
- “Why Johnny Can’t Prompt” from @zamfirescu-pereira2023johnny
- The OpenAI tutorial on 'Question answering using a search API' (I used it here) roughly follows the @gao2022precise technique of generating a hypothetical answer and looking for that in the embedding space.
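To make the hypothetical-answer idea concrete, here is a minimal sketch. It is not the tutorial's code or the paper's implementation: `embed` is a toy word-count stand-in for a real embedding model, and `draft_answer` is a hypothetical callable that in practice would be an LLM call. The only point is the shape of the technique: draft an answer first, then search with the draft instead of the question.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def hyde_search(question, documents, draft_answer, top_k=3):
    """Rank documents by similarity to a drafted answer, not to the question."""
    hypothetical = draft_answer(question)  # assumption: an LLM call in practice
    query_vec = embed(hypothetical)
    scored = [(cosine(query_vec, embed(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]
```

The drafted answer may be wrong on the facts; the bet is that it still lands closer to relevant documents in the embedding space than the bare question does.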
What tools do enterprise / site search providers use to demonstrate their strengths to customers, and how might those tools be adapted to evaluating options and encouraging improvement in the consumer web search context?
HT:
Arc Search (from The Browser Company)
a tweet from me...
@danielsgriffin via Twitter on Jan 28, 2024
Quick. Pretty.
(Interesting re the big recent \@perplexity_ai splash; now available as a default on \@browsercompany’s Arc.)
I want more feedback options. Looking forward to Share being enabled on these. Curious what content creators think. Non-‘Browse for Me’ search is Google?!
Perplexity AI
- Connected with some of my thoughts over at SearchRights.org and #llm_system_screenshots_and_social_share_cards:
- See: Kevin Roose's 2/1 piece on Perplexity AI in the New York Times.
- My notes pending...
Hugging Face & web search
Hugging Face is thinking of adding "RAG (and web search)" to their new Hugging Chat Assistants: huggingface.co/chat/assistants
a tweet from me...
@danielsgriffin via Twitter on Feb 5, 2024
“Add RAG (and web search) to Assistant”
Looking forward to following this.
\@huggingface could provide users choice over a range of web search sources, tools to evaluate both fit-for-purpose & effective performance, and open analytics for researchers, devs, & content creators.
Currently they support a web search option in their HuggingChat: huggingface.co/chat/
maybe-useful-hints and distractors
via a complaint from Neal Parikh
Query: [NYC government used to have a different structure than used today, possibly in the 1940s, but I can’t remember. Please explain.]
Search intent: New York City Board of Estimate
This includes raw links to AI-generated content on LLM and generative web search platforms:
Mention of the 1940s seems to serve as a bit of a distractor for Perplexity and 7 other search tools I tried. ChatGPT 4 got it, as did Bard. Exa had it in the third result.
Perplexity AI w/ distractor / maybe-useful-hint.
Perplexity AI w/o distractor / maybe-useful-hint.
See the thread of replies for a WIDE range of responses...
Multiple attempts may suggest a different pattern.
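A small sketch of how repeated attempts could be tallied, with and without the hint. `run_search` is a hypothetical callable standing in for whatever system is being tried (no real API is assumed), and checking for the target phrase as a substring is a crude proxy for "got it".

```python
def hit_rates(variants, run_search, target, attempts=5):
    """Share of attempts, per query variant, whose response mentions the target.

    variants:   dict mapping a label to a query string
    run_search: hypothetical callable, query string -> response text
    target:     phrase that counts as a hit (case-insensitive substring match)
    """
    rates = {}
    for label, query in variants.items():
        hits = 0
        for _ in range(attempts):
            response = run_search(query)  # responses can vary between attempts
            if target.lower() in response.lower():
                hits += 1
        rates[label] = hits / attempts
    return rates
```

Here `variants` would hold the query as written above plus a copy with the 1940s mention removed, and `target` would be "Board of Estimate".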
new-to-me generative web search systems
I'm always looking to explore new approaches to the search experience.
Findera
- HT: Twitter
Peruser AI
- HT: Twitter
