Measurement to Meaning: A Validity-Centered Framework for AI Evaluation
The paper provides a structured approach for reasoning about the types of evaluative claims that can be made given the available evidence.
Proposes a validity-centered framework for AI evaluation that makes explicit which evaluative claims the evidence actually supports, with detailed vision and language case studies. The operational companion to the Wallach/Jacobs position paper.
Classification
- Role: measurement-piece
- Domain: research
- Source type: paper
- Harness types: validation-harness, ratification-harness
- Validation position: immediately-after-generation, post-deployment
- Validation mode: empirical, interpretive, social
- Prescription stance: strongly-procedural
- Relation to argument: validation-is-constitutive, observability-matters, institutions-shape-capability
- Tags: measurement, construct-validity, evaluation, validity-framework, claims-vs-evidence
Extended capability commentary
- Input legibility
- Task structure
- Reward richness
- Repairability
- Observability: in the measurement-theoretic sense, what does the evaluation actually let you see?
- Offline evaluability
- Institutional ratification
Why it matters
The applied-framework companion to the Wallach/Jacobs position paper. Where the position paper diagnoses the field, this paper hands you a tool: map the evidence you have to the claim you want to make, and refuse to make claims the evidence can't support.
Annotation
Where the Wallach/Jacobs position paper argues that generative-AI evaluation is a social-science measurement challenge, this paper supplies the operational framework. Two case studies (vision and language model evaluations) demonstrate how explicitly reasoning about validity strengthens or weakens the claims an evaluation can support.
The central move is refusing the shortcut from benchmark score to capability claim. A model that does well on a math benchmark may be good at that benchmark, not good at math. A model that does well on graduate-exam-style questions may be good at graduate-exam-style questions, not good at reasoning.
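To make the refusal concrete, here is a minimal sketch of a claims-vs-evidence check. The claim ladder and evidence categories below are illustrative assumptions of mine, not the paper's actual taxonomy; the point is only that a score licenses the weakest rung unless further validity evidence is supplied.

```python
# Hypothetical claim ladder: each rung presupposes the evidence of the
# rungs below it. These categories are illustrative, not from the paper.
CLAIM_LADDER = [
    "scores X on benchmark B",                       # licensed by a score alone
    "performs well on B-style items",                # needs held-out items from B's distribution
    "exhibits the construct B targets (e.g. math)",  # needs construct-validity evidence
    "will exhibit the construct in deployment",      # needs ecological-validity evidence
]

EVIDENCE_FOR_RUNG = {
    0: "benchmark-score",
    1: "held-out-items",
    2: "construct-validity",
    3: "ecological-validity",
}

def strongest_claim(evidence: set[str]) -> str:
    """Walk up the ladder only as far as the evidence reaches."""
    level = -1
    for rung in range(len(CLAIM_LADDER)):
        if EVIDENCE_FOR_RUNG[rung] in evidence:
            level = rung
        else:
            break  # a missing lower rung blocks all claims above it
    return CLAIM_LADDER[level] if level >= 0 else "no claim supported"

# A score alone licenses only the weakest claim:
print(strongest_claim({"benchmark-score"}))
# -> "scores X on benchmark B"

# Construct-validity evidence without held-out items still stops at rung 0,
# because each rung presupposes the ones below it:
print(strongest_claim({"benchmark-score", "construct-validity"}))
# -> "scores X on benchmark B"
```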
Useful against
- "Reward richness is the lever" framings — this paper asks which construct the reward even measures.
- "Thin harness, fat skills" — reminds that the skills you think you are pushing into the model are defined by the evaluations you check them with.
Useful for
- Anyone who wants to score a library entry on institutional_ratification or observability with conceptual grounding rather than intuition.
Related entries
- Position: Evaluating Generative AI Systems Is a Social Science Measurement Challenge · Hanna Wallach, Meera Desai, A. Feder Cooper, Angelina Wang, Chad Atalla, Solon Barocas, Su Lin Blodgett, Alexandra Chouldechova, Emily Corvi, P. Alex Dow, Jean Garcia-Gathright, Alexandra Olteanu, Nicholas Pangakis, Stefanie Reed, Emily Sheng, Dan Vann, Jennifer Wortman Vaughan, Matthew Vogel, Hannah Washington, Abigail Z. Jacobs · 2025-02-01 · #measurement #construct-validity #evaluation · validation-is-constitutive, institutions-shape-capability, observability-matters, validation-harness, ratification-harness
- An open-source spec for Codex orchestration: Symphony · Alex Kotliarskyi, Victor Zhu, and Zach Brock · 2026-04-26 · validation-is-constitutive, observability-matters, institutions-shape-capability, validation-harness
- Deep Research Query: Work Registration and Collision Prevention · Daniel S. Griffin · 2026-05-05 · validation-is-constitutive, observability-matters, institutions-shape-capability, validation-harness
- Standard Signal: AI-native hedge fund announcement · Michael Royzen · 2026-02-28 · institutions-shape-capability, validation-is-constitutive, validation-harness, ratification-harness
Overlap is computed on tags, relation-to-argument, and harness types — not on role or domain, because contrasts are often the most useful neighbours.