Tongyi DeepResearch

external · status: active · focus: literature · discipline: general · started: 2025

Project page: https://github.com/Alibaba-NLP/DeepResearch

Source: projects/landscape/tongyi-deepresearch.yml

Positioning

An agentic large language model purpose-built for long-horizon deep-information-seeking tasks (arXiv:2510.24701), released both as open weights (30.5B total / 3.3B active parameters) and as inference code with ReAct and 'Heavy' (IterResearch) modes. Sits in the literature/synthesis block of the RISE diagram and reports state-of-the-art results on agentic-search benchmarks (BrowseComp, FRAMES, Humanity's Last Exam).
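For readers unfamiliar with the ReAct inference mode mentioned above, the following is a minimal sketch of a generic ReAct loop (alternating thought, action, and observation until a final answer). It is not the project's inference code: the names react_loop, stub_model, and stub_search, the "Action: tool[arg]" text format, and the step budget are all assumptions made for illustration, and the model and search tool are stubs so the block runs offline.

```python
import re
from typing import Callable, Dict


def react_loop(
    model: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 8,
) -> str:
    """Generic ReAct loop: the model emits a thought plus either an action or a final answer."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model(transcript)
        transcript += step + "\n"

        # Stop as soon as the model commits to an answer.
        final = re.search(r"Final Answer:\s*(.*)", step, re.S)
        if final:
            return final.group(1).strip()

        # Hypothetical action syntax: "Action: tool_name[argument]".
        action = re.search(r"Action:\s*(\w+)\[(.*?)\]", step, re.S)
        if action is None:
            transcript += "Observation: no valid action found\n"
            continue

        name, arg = action.group(1), action.group(2)
        tool = tools.get(name)
        observation = tool(arg) if tool else f"unknown tool {name!r}"
        transcript += f"Observation: {observation}\n"
    return "no answer within step budget"


def stub_model(transcript: str) -> str:
    """Stand-in for a real LLM call: searches once, then answers."""
    if "Observation:" not in transcript:
        return "Thought: I should look this up.\nAction: search[example query]"
    return "Thought: The observation is enough.\nFinal Answer: answer based on stub result"


def stub_search(query: str) -> str:
    """Stand-in for a live web-search tool."""
    return f"stub result for {query}"


answer = react_loop(stub_model, {"search": stub_search}, "example question")
print(answer)
```

In a real deployment the stub model would be replaced by a call to the served checkpoint and the stub tool by live search, which is also why outputs depend on web state at query time.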

Distinctive contribution

Treats agentic-research capability as a model-training problem, not just an orchestration problem: continual agentic pre-training, fully automated synthetic data generation, and end-to-end on-policy RL with GRPO. Distinguishes itself from prompt-engineering pipelines by shipping a model purpose-trained for research-style tool use.
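As background on the GRPO step named above, the sketch below shows the group-relative advantage computation in its commonly described form: several rollouts are sampled for the same prompt, and each rollout's reward is normalized against the group's mean and standard deviation. This is an illustration under assumptions, not the project's training code; the function name, the use of the population standard deviation, and the eps constant are choices made here, and the technical report's variant may differ in detail.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each rollout's reward against its own group (same prompt).

    Assumption: population standard deviation plus a small eps to avoid
    division by zero when every rollout in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical rollouts for one question, rewarded 1.0 if the answer was judged correct.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Rollouts that beat their group's average receive a positive advantage and the rest a negative one, so no separate learned value model is needed to form the baseline.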

Evaluation scores

Dimension                   Score (0–3)  Note
Lifecycle coverage          1            Three deep-research stages; not an end-to-end paper-writing pipeline.
Autonomy level              3            Runs end-to-end without per-step approval; long-horizon trajectories are the design target.
Architectural transparency  3            Open under Apache-2.0; arXiv:2510.24701 + technical blog document training and inference.
Inputs supported            2            Research-query inputs; ReAct and Heavy inference paradigms; HuggingFace + ModelScope deployment.
Outputs / reproducibility   2            Open weights and inference scripts; full training pipeline partially released.
Internal evaluation         3            Extensive benchmark results reported on BrowseComp, FRAMES, HLE, SimpleQA; state-of-the-art on several.
Openness                    3            Apache-2.0; HuggingFace + ModelScope checkpoints; commercial deployment via Bailian.
Maturity / traction         3            18k+ stars; production deployment via Aliyun Bailian; widely-discussed model release.
Cross-family policy         0            Self-trained model; no architectural cross-family policy.
Runtime assurance           1            IterResearch 'Heavy' mode adds test-time deliberation; no published claim-audit harness.
Cross-platform portability  2            HuggingFace + ModelScope + Bailian deployment; ReAct + IterResearch inference modes.

Scored on 2026-05-18. See the evaluation rubric.

Tags

Pipeline stages: rq-formulation literature-discovery literature-synthesis

Architectural features: tool-use rag-knowledge-base iterative-loop

Inputs: research-question

Outputs: cited-response research-report

Data sources: web-search

Knowledge sources: web-search

Limitations

  • Specialized for deep-information-seeking; does not produce papers, code, or replication packages directly.
  • Heavy mode is compute-expensive, and the quality–cost trade-off is non-trivial.
  • Real-world agentic search depends on live web state, so outputs are not bitwise reproducible.

Papers describing this project

  • Tongyi DeepResearch Technical Report — Tongyi DeepResearch Team (2025). arXiv (Alibaba). arXiv:2510.24701

Also compared in

  • Agentic AI for Scientific Discovery: A Survey (gridach2025agenticsurvey) — Covered as a deep-research agent trained via RL.