Tongyi DeepResearch
external · status: active · focus: literature · discipline: general · started: 2025
Project page: https://github.com/Alibaba-NLP/DeepResearch
Source: projects/landscape/tongyi-deepresearch.yml
Positioning
An agentic large language model purpose-built for long-horizon, deep information-seeking tasks (arXiv:2510.24701). It ships both as open weights (30.5B total / 3.3B active parameters) and as inference code with ReAct and 'Heavy' (IterResearch) modes. It sits in the literature/synthesis block of the RISE diagram and reports state-of-the-art results on agentic-search benchmarks (BrowseComp, FRAMES, Humanity's Last Exam).
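For readers unfamiliar with the ReAct paradigm named above, the following is a minimal sketch of a generic reason-act-observe loop. It is not the project's inference code: `react_loop`, `stub_policy`, and `stub_search` are hypothetical names, and the policy and search tool are hard-coded stand-ins for a real model and a live web-search backend.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A policy step is (kind, payload): kind is "action" or "answer".
Step = Tuple[str, str]


def react_loop(
    question: str,
    policy: Callable[[List[str]], Step],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 8,
) -> Tuple[Optional[str], List[str]]:
    """Alternate between model decisions and tool observations."""
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        kind, payload = policy(history)
        if kind == "answer":
            return payload, history
        # Actions are encoded as "tool_name:tool_input" in this sketch.
        tool_name, _, tool_input = payload.partition(":")
        tool = tools.get(tool_name)
        observation = tool(tool_input) if tool else f"unknown tool {tool_name!r}"
        history.append(f"Action: {payload}")
        history.append(f"Observation: {observation}")
    return None, history  # step budget exhausted without an answer


def stub_policy(history: List[str]) -> Step:
    """Hypothetical stand-in for the model: search once, then answer."""
    prefix = "Observation: "
    if history[-1].startswith(prefix):
        return "answer", history[-1][len(prefix):]
    return "action", "search:license of Tongyi DeepResearch"


def stub_search(query: str) -> str:
    """Hypothetical stand-in for a web-search tool (canned result)."""
    return "Apache-2.0"


answer, trace = react_loop("Which license?", stub_policy, {"search": stub_search})
print(answer)
print(len(trace))
```

In a real deployment the policy would be the model itself and the trace would grow with every tool call, which is why long-horizon runs put pressure on the context window.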
Distinctive contribution
Treats agentic-research capability as a model-training problem, not just an orchestration problem: continual agentic pre-training, fully automated synthetic data generation, and end-to-end on-policy RL with GRPO. Distinguishes itself from prompt-engineering pipelines by shipping a model purpose-trained for research-style tool use.
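As background on GRPO, the sketch below shows the group-relative advantage at the core of the generic algorithm: several rollouts are sampled for the same prompt, and each rollout's reward is normalized against the mean and standard deviation of its own group, so no separate value network is needed. This illustrates the textbook formulation only; the project's actual RL objective may differ in its details, and the function name and reward values are made up for illustration.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each rollout's reward against its own group.

    All rewards are assumed to come from rollouts of the same prompt.
    eps guards against division by zero when every reward is identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative rewards for four rollouts of one query (1.0 = correct answer).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)

# A group where every rollout gets the same reward carries no learning signal.
flat = group_relative_advantages([1.0, 1.0, 1.0])
print(flat)
```

Rollouts that beat their group average receive positive advantages and the rest receive negative ones, so a group in which every trajectory succeeds or every trajectory fails contributes nothing to the update.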
Evaluation scores
| Dimension | Score (0–3) | Note |
|---|---|---|
| Lifecycle coverage | 1 | Three deep-research stages; not an end-to-end paper-writing pipeline. |
| Autonomy level | 3 | Runs end-to-end without per-step approval; long-horizon trajectories are the design target. |
| Architectural transparency | 3 | Open under Apache-2.0; arXiv:2510.24701 + technical blog document training and inference. |
| Inputs supported | 2 | Research-query inputs; ReAct and Heavy inference paradigms; HuggingFace + ModelScope deployment. |
| Outputs / reproducibility | 2 | Open weights and inference scripts; the training pipeline is only partially released. |
| Internal evaluation | 3 | Extensive benchmark results reported on BrowseComp, FRAMES, HLE, SimpleQA; state-of-the-art on several. |
| Openness | 3 | Apache-2.0; HuggingFace + ModelScope checkpoints; commercial deployment via Bailian. |
| Maturity / traction | 3 | 18k+ stars; production deployment via Aliyun Bailian; widely-discussed model release. |
| Cross-family policy | 0 | Self-trained model; no architectural cross-family policy. |
| Runtime assurance | 1 | IterResearch 'Heavy' mode adds test-time deliberation; no published claim-audit harness. |
| Cross-platform portability | 2 | HuggingFace + ModelScope + Bailian deployment; ReAct + IterResearch inference modes. |
Scored on 2026-05-18. See the evaluation rubric.
Tags
Pipeline stages: rq-formulation literature-discovery literature-synthesis
Architectural features: tool-use rag-knowledge-base iterative-loop
Inputs: research-question
Outputs: cited-response research-report
Data sources: web-search
Knowledge sources: web-search
Limitations
- Specialized for deep-information-seeking; does not produce papers, code, or replication packages directly.
- Heavy mode is compute-expensive, and the quality-cost trade-off is non-trivial.
- Real-world agentic search depends on live web state, so outputs are not bitwise reproducible.
Related projects in this catalog
Papers describing this project
- Tongyi DeepResearch Technical Report — Tongyi DeepResearch Team (2025). arXiv (Alibaba). arXiv:2510.24701
Also compared in
- Agentic AI for Scientific Discovery: A Survey (gridach2025agenticsurvey). Covered as a deep-research agent trained via RL.
Related references (literature catalog)
- Wu, J. et al. (2025). Agentic Reasoning: A Streamlined Framework for Enhancing LLM Reasoning with Agentic Tools (wu2025agenticreasoning)