The Debate over Understanding in AI's Large Language Models

Summary

Mitchell and Krakauer survey the active debate over whether large language models understand the language they generate, contrasting two camps: skeptics who emphasize statistical pattern-matching and brittleness, and partisans who point to emergent abilities and human-like performance on cognitive tasks. They argue both camps operate with under-specified, often anthropocentric notions of "understanding," and call for empirical operationalization rather than further dueling intuitions.

Contribution

A clean statement of what is actually at issue in arguments about LLM cognition, together with a survey of the empirical work each camp draws on. Reframes the dispute as a methodological one (we lack agreed operationalizations of understanding) rather than a purely philosophical one.

Method

Position paper / review. Synthesizes recent empirical and conceptual work; identifies open empirical questions.

Relevance to RISE

RISE systems treat LLMs as collaborators in knowledge production, a stance that implicitly imports claims about what those models can do with knowledge. Mitchell and Krakauer's framing is a useful prophylactic: it makes explicit that scoring an autonomy-3 RISE system on its outputs does not settle whether the underlying model understands the science, and that this distinction matters when the outputs are scholarly artifacts that other agents (or humans) will build on.

Critique / open questions

The article predates the wider deployment of agentic systems; the debate has since shifted toward whether tool-augmented LLMs blur the understanding/use distinction further.
The paper calls for an operationalization of understanding but does not supply one; it is best read as setting the agenda.