Experimental Evidence on the Productivity Effects of Generative Artificial Intelligence

Summary

A preregistered online experiment in which 444 college-educated professionals completed occupation-specific, incentivised mid-level writing tasks, with half randomly given access to ChatGPT. In the treated group, time to completion fell by roughly 0.8 SDs and output quality rose by roughly 0.4 SDs. ChatGPT compressed the productivity distribution by helping lower-ability workers most. The tool largely substituted for worker effort rather than complementing worker skill, shifting task time away from rough-drafting and toward idea-generation and editing. Exposure to ChatGPT also raised job satisfaction and self-efficacy and heightened both concern and excitement about automation.

Contribution

Among the first randomised experimental measurements of the productivity effects of a frontier GenAI tool on realistic professional writing tasks, with clean causal estimates of its effects on speed, quality, between-worker inequality, and worker attitudes.

Method

Preregistered online RCT (AEARCTR-0010882) with N=444 college-educated professionals; occupation-specific incentivised writing tasks; random assignment of half the sample to ChatGPT access; time taken and graded output quality reported in standard-deviation units.
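The SD-unit outcomes can be read as standardised mean differences. The sketch below is a minimal version. It assumes the effect is the treated-minus-control difference in means divided by the control group's sample standard deviation (Glass's delta). The function name `standardized_effect` and the completion times are hypothetical illustrations, not data from the paper. The paper's reported estimates may use a different standardisation or come from regressions with controls.

```python
from statistics import mean, stdev


def standardized_effect(treated: list[float], control: list[float]) -> float:
    """Difference in group means expressed in control-group standard deviations.

    Assumption: the denominator is the sample SD of the control group
    (Glass's delta); the paper's own standardisation may differ.
    """
    if len(control) < 2:
        raise ValueError("control group needs at least two observations")
    return (mean(treated) - mean(control)) / stdev(control)


# Hypothetical completion times in minutes, NOT data from the paper.
control_minutes = [30.0, 25.0, 35.0, 20.0, 40.0]
treated_minutes = [22.0, 18.0, 26.0, 20.0, 24.0]

effect = standardized_effect(treated_minutes, control_minutes)
print(f"time effect: {effect:.2f} SD")  # negative value = faster with access
```

Under this reading, a time effect of 0.8 SDs means the treated group's average completion time sat about 0.8 control-group SDs below the control mean. A quality effect of 0.4 SDs is the same calculation applied to grades, with a positive sign indicating improvement.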

Relevance to RISE

Although it precedes the GenAI scientific-research wave, this is a foundational piece of empirical evidence on the productivity-augmentation question that the RISE research-productivity thread builds on. It anchors the productivity cluster alongside filimonovic2025genai, kwon2025inequality, and brynjolfsson2025genaiwork. Two of its findings reappear in those later studies: the inequality-compressing pattern (low-ability workers gain most) and the task-restructuring pattern (away from drafting, toward ideation and editing). Both bear directly on how human-ai-research-collaboration is designed in RISE catalog projects.

Critique / open questions

The tasks are short mid-level writing exercises rather than scientific research, and outcomes are graded by humans whose own LLM exposure may bias the quality measure. Whether the gains persist in longer, more knowledge-intensive workflows is left open.

Key quotes

"We examine the productivity effects of a generative artificial intelligence technology—the assistive chatbot ChatGPT—in the context of mid-level professional writing tasks. In a preregistered online experiment, we assign occupation-specific, incentivized writing tasks to 444 college-educated professionals, and randomly expose half of them to ChatGPT."

"Our results show that ChatGPT substantially raises average productivity: time taken decreases by 0.8 SDs and output quality rises by 0.4 SDs. Inequality between workers decreases, as ChatGPT compresses the productivity distribution by benefiting low-ability workers more."

"ChatGPT mostly substitutes for worker effort rather than complementing worker skills, and restructures tasks towards idea-generation and editing and away from rough-drafting."

Files

Main paper: papers/pdfs/noy2023experimental.pdf
Supplementary materials: papers/pdfs/noy2023experimental_supp.pdf (preregistration, full task list, additional robustness analyses; part of the same Science publication, no separate bib entry)