Sakana AI Scientist v2

external · status: active · focus: end-to-end · discipline: computer-science · started: 2024

Project page: https://github.com/SakanaAI/AI-Scientist-v2

Source: projects/landscape/sakana-ai-scientist.yml

Positioning

An autonomous "AI scientist" pipeline that generates ideas, runs experiments (primarily in machine learning), drafts a paper, and reviews its own output. It targets the full RISE arc end to end with minimal human oversight per task.

Distinctive contribution

Among the earliest publicly released systems to demonstrate a single agentic pipeline producing complete machine-learning papers (idea → experiments → write-up → self-review) without per-step human intervention.
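
The four-stage flow named above (idea → experiments → write-up → self-review) can be pictured as a plain chain of functions with no approval step in between. This is a minimal sketch for illustration only: the `Artifact` record, the stage function names, and the stub bodies are all hypothetical and are not taken from the AI-Scientist-v2 codebase, which uses agentic tree search rather than a fixed linear chain.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Artifact:
    """Hypothetical record handed from stage to stage."""
    research_area: str
    log: List[str] = field(default_factory=list)


def ideate(artifact: Artifact) -> Artifact:
    # Stub: a real system would call a model to propose an idea here.
    artifact.log.append(f"idea proposed for {artifact.research_area}")
    return artifact


def run_experiments(artifact: Artifact) -> Artifact:
    # Stub: stands in for generating and executing experiment code.
    artifact.log.append("experiments run")
    return artifact


def write_up(artifact: Artifact) -> Artifact:
    # Stub: stands in for drafting the paper.
    artifact.log.append("paper drafted")
    return artifact


def self_review(artifact: Artifact) -> Artifact:
    # Stub: stands in for the automated review pass.
    artifact.log.append("self-review completed")
    return artifact


# Stage order follows the arrow chain in the text above.
STAGES: List[Callable[[Artifact], Artifact]] = [
    ideate,
    run_experiments,
    write_up,
    self_review,
]


def run_pipeline(research_area: str) -> Artifact:
    """Run every stage in order, with no human checkpoint between stages."""
    artifact = Artifact(research_area)
    for stage in STAGES:
        artifact = stage(artifact)
    return artifact
```

Calling `run_pipeline("machine learning")` returns an `Artifact` whose `log` holds one entry per stage, in order. The only point of the sketch is that the loop contains no per-step human gate.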

Evaluation scores

Dimension	Score (0–3)	Note
Lifecycle coverage	2	Covers ~7 stages; weak on literature discovery/synthesis and external dissemination.
Autonomy level	3	Runs end-to-end without per-task human approval.
Architectural transparency	3	Code, prompts, and configurations publicly released.
Inputs supported	1	Narrow input form (research area + template); limited external data integration.
Outputs / reproducibility	2	Persists papers + code + experiment logs; reproducibility depends on the seeded template.
Internal evaluation	2	Self-review pass and reported metrics on generated papers; external validation contested.
Openness	3	Open source under permissive license.
Maturity / traction	2	Released, widely discussed, multiple iterations (v1 → v2).
Cross-family policy	0	Same-model self-refinement is the canonical failure mode ARIS Table 4 identifies in this lineage.
Runtime assurance	1	Agentic tree search + self-review pass provide light runtime gating.
Cross-platform portability	0	Locked to Linux with NVIDIA GPUs; single-template runtime.

Scored on 2026-05-18. See the evaluation rubric.
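
As a quick way to read the table, the scores can be tallied in a few lines. This is a minimal sketch: the unweighted sum is an assumption made here for illustration, since the table itself gives no aggregate or weighting, and the names `SCORES` and `summarize` are hypothetical.

```python
# Scores copied from the table above (each on a 0-3 scale).
SCORES = {
    "Lifecycle coverage": 2,
    "Autonomy level": 3,
    "Architectural transparency": 3,
    "Inputs supported": 1,
    "Outputs / reproducibility": 2,
    "Internal evaluation": 2,
    "Openness": 3,
    "Maturity / traction": 2,
    "Cross-family policy": 0,
    "Runtime assurance": 1,
    "Cross-platform portability": 0,
}

MAX_PER_DIMENSION = 3


def summarize(scores, max_per_dimension=MAX_PER_DIMENSION):
    """Return (unweighted total, maximum possible, lowest-scoring dimensions)."""
    total = sum(scores.values())
    maximum = max_per_dimension * len(scores)
    lowest = min(scores.values())
    weakest = sorted(name for name, score in scores.items() if score == lowest)
    return total, maximum, weakest


total, maximum, weakest = summarize(SCORES)
print(f"{total}/{maximum}; weakest: {', '.join(weakest)}")
```

With the values in the table, the unweighted total is 19 of a possible 33, and the two dimensions scored 0 are Cross-family policy and Cross-platform portability.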

Tags

Pipeline stages: hypothesis-generation research-design data-analysis code-generation paper-drafting revision-editing referee-simulation

Architectural features: multi-agent tool-use iterative-loop artifact-versioning

Inputs: research-area seed-template

Outputs: paper-draft code experiment-logs self-review

Data sources: benchmark-datasets

Knowledge sources: semantic-scholar openreview
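
Each tag line above is a key followed by space-separated tokens, so it maps directly onto a list per key. This is a minimal sketch, assuming the tags are available as plain strings; the `TAG_LINES` dictionary and the `parse_tags` helper are hypothetical and do not reflect the actual schema of the YAML source file.

```python
# Tag lines copied from the section above, keyed by their labels.
TAG_LINES = {
    "Pipeline stages": (
        "hypothesis-generation research-design data-analysis "
        "code-generation paper-drafting revision-editing referee-simulation"
    ),
    "Architectural features": "multi-agent tool-use iterative-loop artifact-versioning",
    "Inputs": "research-area seed-template",
    "Outputs": "paper-draft code experiment-logs self-review",
    "Data sources": "benchmark-datasets",
    "Knowledge sources": "semantic-scholar openreview",
}


def parse_tags(tag_lines):
    """Split each space-separated tag string into a list of tags."""
    return {label: value.split() for label, value in tag_lines.items()}


tags = parse_tags(TAG_LINES)
print(len(tags["Pipeline stages"]))
```

Parsed this way, the Pipeline stages line yields seven tags, which matches the "~7 stages" figure in the Lifecycle coverage note.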

Limitations

ML-paper focus; portability to non-CS domains unclear.
Quality of produced papers strongly dependent on seed templates.
Self-review is not a substitute for external peer review.

Papers describing this project

The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search — Yamada, Y., Lange, R. T., Lu, C., Hu, S., Lu, C., Foerster, J., et al. (2025). arXiv. arXiv:2504.08066

Also compared in

ARIS Table 4 (yang2026aris): Same scoring profile as v1 in Table 4; noted as producing a workshop-level paper via agentic tree search.
A Survey of AI Scientists (tie2025aiscientistsurvey): Covered as the v2 release of the foundational AI-scientist line.
