The Extended Frontier: The Economics

Daniel Griffin · Hypandra · 7 min read

Working Draft

This post is a working draft, developed collaboratively with Claude Opus 4.6 (1M context) in Claude Code (Anthropic) across multiple sessions, using a variety of extensions: persona-based reviews, citation verification, AI detection analysis (Pangram Labs), deep research reports, and dissertation search (qmd). The argument, evidence curation, and editorial direction are Daniel’s; much of the prose was initially generated by Claude and is being iteratively rewritten. A changelog tracks the development. We will continue to write about how we’re exploring this idea—the process is part of the argument.

The task-exposure literature counts what AI can do and infers displacement. Garicano's bundle theory asks whether the task can be separated from the job. The extensions framework adds a missing dimension: economic models treat task completion as binary and quality-invariant, but when AI 'completes' a task without the extensions that made it reliable, productivity goes up on paper while quality degrades underneath.


Changelog
  • 2026-03-24 — First draft published.

*This is part of [The Extended Frontier](/2026/03/24/extended-frontier) series.*

The earlier posts in this series are about what the frontier is and what shapes it. But the policy conversation about AI and work is driven by labor economics, and the labor economics literature has its own frameworks for thinking about tasks, automation, and displacement. This post asks how the extensions framework connects to that literature, and what it adds. The short answer: the economic models have no variable for whether the practice's feedback loops are engaged, and that missing variable is doing a lot of work.

Task-exposure models

The dominant framework for thinking about AI and jobs right now is task-exposure. Eloundou et al. (2024) measure AI impact by counting how many tasks in an occupation AI can perform. The logic runs in one direction: more exposure means more displacement. If a model can do 80% of a radiologist's tasks, radiologists are 80% exposed.

The problem is that this treats task completion as binary and quality-invariant. A task is either automatable or it isn't, and if it is, the automation is assumed to produce equivalent output. There's no room in the model for the possibility that the same task, performed by the same model, produces different quality depending on the context in which it's performed.
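
To make that assumption visible, here's a minimal sketch of the exposure logic (my illustration, not Eloundou et al.'s actual methodology): score each task as automatable or not, and take the automatable share. Quality appears nowhere.

```python
# Minimal illustration of a task-exposure score (not the published methodology).
# Each task is marked automatable (1) or not (0); exposure is the automatable share.

radiologist_tasks = {
    "interpret_scans": 1,          # AI "can do" this, per the exposure logic
    "write_reports": 1,
    "patient_communication": 0,
    "train_residents": 0,
    "clinical_coordination": 0,
}

exposure = sum(radiologist_tasks.values()) / len(radiologist_tasks)
print(f"Exposure: {exposure:.0%}")   # 40% -- no term anywhere for output quality
```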

The bundle framework

Garicano, Li, and Wu ("Weak Bundle, Strong Bundle," 2026) argue that task-exposure overpredicts displacement because it ignores something structural: bundle strength. The question isn't just whether AI can do a task. It's whether that task can be contractually separated from the rest of the job.

Take radiology. Radiologists spend about a third of their time interpreting scans. The rest is patient communication, training residents, clinical coordination with other specialists. The scan-reading task is theoretically automatable. But you can't peel it off and hand it to a machine without losing the rest of the bundle. The tasks are bound together by institutional, contractual, and practical ties. The bundle holds.

This is a real improvement over raw task-exposure. It explains why some highly exposed occupations haven't seen the displacement that the models predict. But it's still answering a contractual question: can this task be separated? It's not asking a practice question: what happens to the work when you try?

What both miss

Neither framework asks whether the practice's feedback loops engage the AI. The economic models assume that if AI "can do" a task, the task is done to the same standard. The bundle framework asks whether the task can be separated from the job. Neither asks what happens to quality when you separate the task from the practices that made it reliable. The extensions framework adds the missing variable: "can do" is context-dependent, and the economic models have no way to represent that.
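
One way to see the missing variable is to write it in by hand. The toy model below is my sketch, not anything from the literature: completed tasks get a quality multiplier that depends on whether the practice's feedback loops were engaged, and the 0.6 discount is an arbitrary placeholder. A task-exposure model implicitly pins that multiplier at 1.

```python
def effective_output(tasks_completed: int, extensions_engaged: bool,
                     quality_if_engaged: float = 1.0,
                     quality_if_bypassed: float = 0.6) -> float:
    """Toy model: throughput discounted by a context-dependent quality term.

    The 0.6 discount is an illustrative placeholder, not an estimate.
    Task-exposure models implicitly fix this term at 1.0 in every context.
    """
    quality = quality_if_engaged if extensions_engaged else quality_if_bypassed
    return tasks_completed * quality

print(effective_output(100, extensions_engaged=True))   # 100.0
print(effective_output(100, extensions_engaged=False))  # 60.0, same "completed" count
```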

The ATM

The ATM is the canonical example of task automation, but it's worth revisiting through an extensions lens rather than just a bundle lens.

Cash dispensing has the strongest possible extension structure: binary feedback, instant verification, no ambiguity. You asked for $200, you got $200 or you didn't. The feedback loop is tight, fast, and complete. There's no interpretive gap between what the machine did and whether it worked. That's why it was automatable in 1969 and why it stayed automated. The extension structure made the automation reliable from day one.

What the ATM left behind were the tasks with the weakest extensions: relationship banking, fraud detection requiring judgment about a specific customer, complex financial advising. These aren't just hard to automate in some abstract sense. Their feedback is social, slow, and distributed. You find out whether you gave good financial advice months or years later, mediated through the client's life circumstances. The extensions that would make automation reliable don't exist in a form that machines can engage.

The ATM didn't just automate a task from the bundle. It automated the task whose extension structure made automation reliable and left the tasks whose extensions resist mechanization. The bundle framework explains why tellers weren't fully displaced (their other tasks couldn't be separated). The extensions framework explains which task got automated and why it worked.

The MASAI trial as economics

The MASAI mammography screening trial (Lång et al., 2023) appears in the first post as evidence that extensions smooth the frontier. Here I want to read it as an economics case.

The task-exposure framing would say: AI can read mammograms, radiologists are exposed, expect displacement. The bundle framing would say: radiologists do more than read mammograms, the bundle may hold. Both framings treat it as a question about whether the radiologist keeps their job.

But what actually happened in the trial is more interesting than either prediction. AI-supported screening detected more cancers than standard double reading by radiologists, with 44.3% less screen-reading workload. That's not displacement and it's not preservation. It's augmentation through extensions. The triage protocols, the double-reading conventions, the arbitration processes, the follow-up infrastructure—these already existed in the screening practice. They were constitutive of how mammography screening works. The AI was wired into those extensions, and the combination outperformed the standard workflow.

The economic models can describe the productivity gain (fewer reading hours per cancer detected). What they can't describe is why this worked when other AI deployments in medicine haven't. The extensions framework can: the practice had strong, fast, formalized feedback loops, and the deployment engaged them rather than bypassing them.
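
A rough back-of-envelope shows the part the models can price. Only the 44.3% figure below comes from the trial; the screen volume, reads per screen, and minutes per read are assumptions I'm plugging in for illustration.

```python
# Back-of-envelope on the MASAI workload result. Only the 44.3% reduction is
# from the trial; the other inputs are illustrative assumptions.

screens = 10_000
reads_per_screen = 2      # assumed: standard double reading
minutes_per_read = 1.0    # assumed

baseline_hours = screens * reads_per_screen * minutes_per_read / 60
ai_supported_hours = baseline_hours * (1 - 0.443)   # 44.3% less screen-reading workload

print(f"Standard double reading: {baseline_hours:.0f} reading hours")
print(f"AI-supported screening:  {ai_supported_hours:.0f} reading hours")
# The workload saving is easy to price. Why the deployment worked -- the
# engaged extensions -- never enters the calculation.
```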

Illusory smoothness in economic measurement

Here's what I think is the novel contribution of looking at AI economics through an extensions lens: the illusory smoothness problem.

When AI "completes" a task without the extensions that made the task reliable, the quantity metric says productivity went up. More tasks per hour. More documents processed. More code generated. The throughput number looks great. But the quality metric—if anyone is measuring it—says the frontier just got jagged. The work is being done faster and worse, and the economic measurement framework registers only the faster part.
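
To see how the measurement gap opens up, here's a toy series with made-up numbers: throughput is observed the moment the work ships, but the cost of bypassed extensions surfaces later, so the contemporaneous productivity figure stays smooth while the defects pile up downstream.

```python
# Toy illustration of illusory smoothness in measurement. All numbers invented.
# Throughput is measured when the work ships; defects surface two quarters later.

quarters      = ["Q1", "Q2", "Q3", "Q4"]
items_shipped = [100, 150, 220, 300]        # rises as AI drafts more output
defect_rate   = [0.02, 0.05, 0.09, 0.14]    # rises as extensions get bypassed
defect_lag    = 2                           # quarters until problems surface

for i, q in enumerate(quarters):
    surfacing = (items_shipped[i - defect_lag] * defect_rate[i - defect_lag]
                 if i >= defect_lag else 0)
    print(f"{q}: throughput={items_shipped[i]:>3}  defects surfacing now={surfacing:.0f}")
# The throughput column climbs every quarter. The quality cost arrives later,
# attributed to a different period -- or never measured at all.
```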

Anil Dash's piece on AI coding ("AI Isn't a Replacement for Developers, It's a Replacement for Development," 2025) frames the shift as labor economics: companies want cheaper code, AI lets them deskill the workforce, craft loses to cost pressure. He's right that the incentive structure pushes toward treating code as commodity output. But he stops at the labor frame and doesn't ask the mechanism question: why does AI-generated code sometimes work and sometimes fail?

As the earlier posts in this series argue, the answer is whether the practice's extensions are engaged. When they are, AI coding is genuinely productive. When they're bypassed—generate code, ship it, skip the tests—the output degrades in ways the throughput metric can't see. Dash describes the economic pressure to deskill but doesn't engage with the fact that code's extension structure is what makes the quality question answerable in the first place. You can measure that more code is being written. You can't measure, in a task-exposure model, whether the extensions that make code good code were engaged. The productivity number is smooth. The frontier underneath it may be jagged.

This isn't hypothetical. The 700+ documented cases of AI hallucinations in legal filings are exactly this pattern. The productivity metric says the work got done. The extensions that would have caught the problem were bypassed, and the metric couldn't see it.

Early empirical evidence

The early labor economics evidence is compatible with the extensions framework, but it doesn't test it. These studies weren't designed with extension structure in mind, and the findings don't distinguish between competing explanations. That's worth being explicit about.

Humlum and Vestergaard (2025) find task restructuring but no significant earnings or hours effects from chatbot adoption in Denmark. Gathmann et al. (2024) find that AI shifts task content with small displacement effects in Germany. The bundle framework reads this as: the bundle held. Tasks got reshuffled within occupations but couldn't be separated from the jobs that contain them. That's a plausible reading. The data supports it.

An extensions reading is also available: the practice reorganized around its feedback loops rather than being displaced by them. Tasks with stronger extensions got augmented; tasks with weaker extensions got shifted elsewhere. But honestly, the data can't tell us which explanation is doing the work. "The bundle held" and "the practice reorganized around feedback loops" both predict the same observable outcome—task reshuffling without displacement. The extensions framework would need evidence about which tasks got augmented and whether extension structure (feedback speed, formalization, verification tightness) predicted the pattern. That evidence doesn't exist yet in these datasets.
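
For what it's worth, the test that would separate the two accounts is easy to state even if the data don't exist yet. Here's a sketch with entirely hypothetical task-level data and variable names: code each task's extension structure and check whether it predicts which tasks were augmented rather than shifted. (A real test would fit something like a logistic regression and compare it against a bundle-only specification; this just shows the shape of the comparison.)

```python
import pandas as pd

# Hypothetical task-level data -- invented for illustration; no such dataset
# exists yet. "augmented" = 1 if the task was augmented, 0 if shifted elsewhere.
tasks = pd.DataFrame({
    "augmented":              [1, 1, 0, 0, 1, 0, 1, 0],
    "feedback_speed":         [0.9, 0.8, 0.2, 0.5, 0.7, 0.1, 0.6, 0.4],
    "formalization":          [0.8, 0.9, 0.3, 0.2, 0.6, 0.5, 0.9, 0.4],
    "verification_tightness": [0.95, 0.7, 0.1, 0.2, 0.8, 0.15, 0.6, 0.3],
})

# The extensions account predicts augmented tasks score higher on all three
# dimensions; the bundle account makes no prediction about this split.
print(tasks.groupby("augmented").mean())
```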

So the early empirical picture is this: the findings are not inconsistent with an extensions account, but they don't specifically support it over the bundle account. Both frameworks predict task restructuring over displacement, just for different reasons. The economics question is incomplete without the practice question, and the practice question benefits from the economic framing that asks about displacement, restructuring, and measurement. But the empirical case for extensions as a distinct explanatory variable in labor economics is still ahead of us, not behind us.

The missing variable in the economic models is the one I keep returning to: the quality dimension that task-exposure can't encode. Until the models have a way to represent whether the practice's feedback loops were engaged—not just whether the task was completed—the productivity numbers will keep looking smoother than the frontier actually is.