Is anyone working on a Google vs. Perplexity vs. Bard vs. GPT search benchmark? I'm thinking of something like a sample of 100 queries, run every quarter, with real humans evaluating the results against a consistent set of metrics.
Related: