Posts with the “evaluating-results-meta” tag:
“Product human evals are what matter to us”
The Need for ChainForge-like Tools in Evaluating Generative Web Search Platforms
How do people perceive and perform-with tool outputs?
[What is this type of encoding? ']
[I want to buy a new SUV which brand is best?]