STORM / Co-STORM

external · status: active · focus: literature · discipline: general · started: 2024

Project page: https://github.com/stanford-oval/storm

Source: projects/landscape/storm.yml

Positioning

An LLM-powered knowledge-curation system that writes Wikipedia-style long-form articles from web search. STORM uses perspective-guided question asking and simulated conversations between a writer and a topic expert; Co-STORM (EMNLP 2024) adds a collaborative discourse protocol that keeps a human in the loop, plus a dynamic mind map.

Distinctive contribution

Treats the pre-writing problem (deciding what questions to ask) as the central bottleneck of automated long-form writing, and operationalizes it via perspective discovery and simulated expert dialogue. Co-STORM further makes the human–LLM curation loop a first-class architectural element.
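The pre-writing loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the knowledge-storm API: every name here (Turn, discover_perspectives, simulate_conversation, prewrite, and the stub functions) is hypothetical, and the LLM and search calls are replaced by stubs so the control flow is visible.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Turn:
    """One question/answer exchange, tagged with the perspective that asked."""
    perspective: str
    question: str
    answer: str


# Hypothetical signatures: a real system would back these with an LLM
# (question asking) and a retrieval back-end (grounded answering).
AskFn = Callable[[str, str, List[Turn]], str]  # (topic, perspective, history) -> question
AnswerFn = Callable[[str], str]                # question -> answer


def discover_perspectives(topic: str) -> List[str]:
    """Placeholder for perspective discovery; returns fixed viewpoints."""
    return [f"{topic} historian", f"{topic} practitioner", f"{topic} critic"]


def simulate_conversation(topic: str, perspective: str, ask: AskFn,
                          answer: AnswerFn, max_turns: int = 3) -> List[Turn]:
    """Writer asks from one perspective; the 'expert' answers each question."""
    history: List[Turn] = []
    for _ in range(max_turns):
        question = ask(topic, perspective, history)
        history.append(Turn(perspective, question, answer(question)))
    return history


def prewrite(topic: str, ask: AskFn, answer: AnswerFn,
             max_turns: int = 3) -> List[Turn]:
    """Collect notes from one simulated conversation per perspective."""
    notes: List[Turn] = []
    for perspective in discover_perspectives(topic):
        notes.extend(simulate_conversation(topic, perspective, ask, answer, max_turns))
    return notes


# Stubs standing in for the LLM and the search back-end (assumptions).
def stub_ask(topic: str, perspective: str, history: List[Turn]) -> str:
    return f"[{perspective}] question {len(history) + 1} about {topic}"


def stub_answer(question: str) -> str:
    return f"answer to: {question}"


notes = prewrite("solar sails", stub_ask, stub_answer, max_turns=2)
for turn in notes:
    print(turn.question, "->", turn.answer)
```

In this sketch the collected notes would then feed outline generation and drafting; Co-STORM's human steering could be modeled by letting a person supply some of the questions in place of stub_ask.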

Evaluation scores

Dimension | Score (0–3) | Note
Lifecycle coverage | 1 | Three stages clustered around the pre-writing + drafting block; no analysis, modeling, or review.
Autonomy level | 2 | Supervised in STORM (topic → article); Co-STORM adds collaborative human steering.
Architectural transparency | 3 | Open under MIT; modular interfaces; two arXiv papers (NAACL 2024 + EMNLP 2024) document the design.
Inputs supported | 2 | Single input form (topic) but multiple retrieval back-ends: Bing, You.com, custom vector store.
Outputs / reproducibility | 2 | Pip-installable knowledge-storm package; outputs are deterministic given the retrieval back-end and model.
Internal evaluation | 2 | Both papers report systematic evaluations against baselines and Wikipedia editors.
Openness | 3 | MIT-licensed, pip-installable, demo site, public papers.
Maturity / traction | 3 | 28k+ stars, live research preview with 70k+ users, integrated into multiple downstream projects.
Cross-family policy | 0 | Single LLM provider per run.
Runtime assurance | 1 | Perspective-guided question asking + simulated conversation provide light internal review.
Cross-platform portability | 2 | Multiple retrieval back-ends (Bing, You.com, VectorRM); knowledge-storm pip package usable across providers.

Scored on 2026-05-18. See the evaluation rubric.

Tags

Pipeline stages: literature-discovery literature-synthesis paper-drafting

Architectural features: multi-agent human-in-loop tool-use rag-knowledge-base iterative-loop

Inputs: topic

Outputs: long-form-article citations mind-map

Data sources: web-search user-provided-documents

Knowledge sources: bing-search you-search vector-rm

Limitations

  • Stated explicitly by authors: output is not publication-ready and requires significant editing.
  • Focused on Wikipedia-style synthesis; not designed to generate novel research.
  • Quality dependent on search back-end coverage.

Papers describing this project

  • Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models — Shao, Y., Jiang, Y., Kanell, T. A., Xu, P., Khattab, O., Lam, M. S. (2024). NAACL 2024. arXiv:2402.14207
  • Into the Unknown Unknowns: Engaged Human Learning through Participation in Language Model Agent Conversations — Jiang, Y., Shao, Y., Ma, D., Semnani, S. J., Lam, M. S. (2024). EMNLP 2024 (Co-STORM). arXiv:2408.15232

Also compared in

  • Agentic AI for Scientific Discovery: A Survey (gridach2025agenticsurvey) — Covered as a flagship literature-synthesis agent.