DeepResearcher (GAIR-NLP)

external · status: active · focus: literature · discipline: general · started: 2025

Project page: https://github.com/GAIR-NLP/DeepResearcher

Source: projects/landscape/deepresearcher.yml

Positioning

An end-to-end RL-trained deep-research agent (arXiv:2504.03160) that learns to plan, retrieve, cross-validate, and self-reflect via reinforcement learning in real-world web environments rather than in simulated retrieval. Ships a 7B HuggingFace checkpoint (DeepResearcher-7b) trained via this pipeline.

Distinctive contribution

Argues that end-to-end RL on real web environments, rather than prompt engineering or RL on retrieval simulators, is what unlocks emergent cognitive behaviors in research agents: planning, multi-source cross-validation, self-reflection, and honest non-answer when evidence is missing. Reports +28.9 points over prompt baselines and +7.2 over RAG-RL baselines.
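The behaviors named above can be illustrated with a minimal sketch of an iterative research loop. This is not the DeepResearcher implementation, and in the paper these behaviors are learned through RL rather than hand-coded. Every name here (`Evidence`, `ResearchState`, `cross_validated`, `research`, the `search` callable, the two-source threshold) is a hypothetical chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


# Hypothetical types for illustration; not taken from the DeepResearcher codebase.
@dataclass
class Evidence:
    source: str  # where the claim was found
    claim: str   # candidate answer extracted from that source


@dataclass
class ResearchState:
    question: str
    evidence: List[Evidence] = field(default_factory=list)
    notes: List[str] = field(default_factory=list)


SearchFn = Callable[[str], List[Evidence]]


def cross_validated(evidence: List[Evidence], min_sources: int = 2) -> Optional[str]:
    """Return a claim backed by at least `min_sources` distinct sources, else None."""
    sources_by_claim: Dict[str, set] = {}
    for item in evidence:
        sources_by_claim.setdefault(item.claim, set()).add(item.source)
    for claim, sources in sources_by_claim.items():
        if len(sources) >= min_sources:
            return claim
    return None


def research(question: str, search: SearchFn, max_turns: int = 3) -> str:
    """Plan, retrieve, cross-validate, and reflect for up to `max_turns` turns."""
    state = ResearchState(question)
    query = question  # plan: start from the raw question
    for turn in range(max_turns):
        state.evidence.extend(search(query))      # retrieve
        answer = cross_validated(state.evidence)  # cross-validate
        if answer is not None:
            return answer
        # self-reflect: record the gap, then refine the next query
        state.notes.append(f"turn {turn}: no claim confirmed by two sources")
        query = f"{question} (independent source)"
    return "insufficient evidence"  # honest non-answer
```

With a stubbed `search` that returns one source per call, the loop answers only after a second, independent source agrees, and it returns "insufficient evidence" if no claim is ever corroborated.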

Evaluation scores

Dimension | Score (0–3) | Note
Lifecycle coverage | 1 | Covers three pipeline stages: research-question formulation, literature discovery, and literature synthesis.
Autonomy level | 3 | Designed for autonomous multi-turn research without per-step approval.
Architectural transparency | 3 | Open under Apache-2.0; arXiv:2504.03160 documents training; checkpoint released; code public.
Inputs supported | 2 | Research-question inputs; trained for real-world web environments rather than simulated retrieval.
Outputs / reproducibility | 2 | Released checkpoint enables reproducible inference; full RL training pipeline released.
Internal evaluation | 3 | Quantitative gains vs. prompt-engineering and RAG-RL baselines reported in the arXiv paper.
Openness | 3 | Apache-2.0; open weights on HuggingFace; permissive license.
Maturity / traction | 2 | 751 GitHub stars; active; recent academic release (2025-04).
Cross-family policy | 0 | Self-trained 7B model; single-family by design.
Runtime assurance | 2 | RL-induced cognitive behaviors (cross-validation, self-reflection, honest non-answer) are emergent runtime checks, not external gates.
Cross-platform portability | 1 | HuggingFace checkpoint + inference scripts; single model family.

Scored on 2026-05-18. See the evaluation rubric.
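For a quick tally of the scores listed above, the sketch below sums them against the 0–3 scale. The unweighted total and mean are assumptions made for illustration; nothing on this page says the evaluation rubric defines an aggregate or weights the dimensions equally.

```python
# Scores copied from the evaluation table; the aggregation itself is an assumption.
scores = {
    "Lifecycle coverage": 1,
    "Autonomy level": 3,
    "Architectural transparency": 3,
    "Inputs supported": 2,
    "Outputs / reproducibility": 2,
    "Internal evaluation": 3,
    "Openness": 3,
    "Maturity / traction": 2,
    "Cross-family policy": 0,
    "Runtime assurance": 2,
    "Cross-platform portability": 1,
}

MAX_PER_DIMENSION = 3  # upper bound of the 0-3 scale

total = sum(scores.values())
maximum = MAX_PER_DIMENSION * len(scores)
mean = total / len(scores)

print(f"{total}/{maximum} (mean {mean:.2f})")  # prints "22/33 (mean 2.00)"
```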

Tags

Pipeline stages: rq-formulation literature-discovery literature-synthesis

Architectural features: tool-use iterative-loop rag-knowledge-base

Inputs: research-question

Outputs: research-report citations

Data sources: web-search

Knowledge sources: web-search

Limitations

  • The 7B parameter scale limits the performance ceiling on the hardest benchmarks.
  • RL training is compute-intensive and not trivially reproducible end-to-end.
  • Live-web inference is non-deterministic by design.

Papers describing this project

  • DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments — Zheng, Y., Fu, D., Hu, X., Cai, X., Ye, L., Lu, P., et al. (2025). arXiv. arXiv:2504.03160

Also compared in

  • Agentic AI for Scientific Discovery: A Survey (gridach2025agenticsurvey) — Covered as an RL-trained deep-research agent.