Sakana AI Scientist v2

external · status: active · focus: end-to-end · discipline: computer-science · started: 2024

Project page: https://github.com/SakanaAI/AI-Scientist-v2

Source: projects/landscape/sakana-ai-scientist.yml

Positioning

An autonomous "AI scientist" pipeline that ideates, runs experiments (primarily ML), drafts a paper, and self-reviews. Targets the full RISE arc end-to-end with minimal human oversight per task.

Distinctive contribution

Among the earliest publicly released systems to demonstrate a single agentic pipeline producing complete machine-learning papers (idea → experiments → write-up → self-review) without per-step human intervention.
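The idea → experiments → write-up → self-review chain can be pictured as a sequence of stages that each read the artifacts produced so far and add one of their own. The following is a minimal sketch under that assumption only: `RunState`, `run_pipeline`, and the four stage functions are hypothetical placeholders invented for illustration, not the project's actual code or API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class RunState:
    """Artifacts accumulated as the (hypothetical) pipeline advances."""
    research_area: str
    artifacts: Dict[str, str] = field(default_factory=dict)
    completed: List[str] = field(default_factory=list)


Stage = Callable[[RunState], str]


def run_pipeline(state: RunState, stages: List[Tuple[str, Stage]]) -> RunState:
    """Run every stage in order with no human approval step in between."""
    for name, stage in stages:
        state.artifacts[name] = stage(state)
        state.completed.append(name)
    return state


# Placeholder stages: each returns a string standing in for a real artifact.
def ideate(state: RunState) -> str:
    return f"hypothesis about {state.research_area}"


def experiment(state: RunState) -> str:
    return f"results for: {state.artifacts['idea']}"


def write_up(state: RunState) -> str:
    return f"draft reporting {state.artifacts['experiments']}"


def self_review(state: RunState) -> str:
    return f"review of {state.artifacts['write-up']}"


STAGES = [
    ("idea", ideate),
    ("experiments", experiment),
    ("write-up", write_up),
    ("self-review", self_review),
]

final = run_pipeline(RunState("optimizer regularization"), STAGES)
print(final.completed)
```

Each stage depends only on earlier artifacts, which is what lets the chain run without per-step intervention in this toy version; a real system would put model calls, code execution, and retries behind each stage.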

Evaluation scores

Dimension | Score (0–3) | Note
Lifecycle coverage | 2 | Covers ~7 stages; weak on literature discovery/synthesis and external dissemination.
Autonomy level | 3 | Runs end-to-end without per-task human approval.
Architectural transparency | 3 | Code, prompts, and configurations publicly released.
Inputs supported | 1 | Narrow input form (research area + template); limited external data integration.
Outputs / reproducibility | 2 | Persists papers + code + experiment logs; reproducibility depends on the seeded template.
Internal evaluation | 2 | Self-review pass and reported metrics on generated papers; external validation contested.
Openness | 3 | Open source under permissive license.
Maturity / traction | 2 | Released, widely discussed, multiple iterations (v1 → v2).
Cross-family policy | 0 | Same-model self-refinement is the canonical failure mode ARIS Table 4 identifies in this lineage.
Runtime assurance | 1 | Agentic tree search + self-review pass provide light runtime gating.
Cross-platform portability | 0 | Linux + NVIDIA GPU locked; single-template runtime.

Scored on 2026-05-18. See the evaluation rubric.
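For a quick read of the profile, the eleven scores above can be tallied. The sketch below assumes an unweighted sum, which is an illustrative choice and not something the evaluation rubric is known to define; the dictionary simply transcribes the table.

```python
# Scores transcribed from the table above (each on a 0-3 scale).
SCORES = {
    "Lifecycle coverage": 2,
    "Autonomy level": 3,
    "Architectural transparency": 3,
    "Inputs supported": 1,
    "Outputs / reproducibility": 2,
    "Internal evaluation": 2,
    "Openness": 3,
    "Maturity / traction": 2,
    "Cross-family policy": 0,
    "Runtime assurance": 1,
    "Cross-platform portability": 0,
}
MAX_PER_DIMENSION = 3

# Assumption: unweighted sum; the rubric may aggregate differently or not at all.
total = sum(SCORES.values())
maximum = MAX_PER_DIMENSION * len(SCORES)
zero_scored = [name for name, score in SCORES.items() if score == 0]

print(f"{total}/{maximum}")  # 19/33
print(zero_scored)
```

Listing the zero-scored dimensions separately keeps the two weakest areas (cross-family policy and cross-platform portability) visible rather than letting them disappear into the total.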

Tags

Pipeline stages: hypothesis-generation research-design data-analysis code-generation paper-drafting revision-editing referee-simulation

Architectural features: multi-agent tool-use iterative-loop artifact-versioning

Inputs: research-area seed-template

Outputs: paper-draft code experiment-logs self-review

Data sources: benchmark-datasets

Knowledge sources: semantic-scholar openreview

Limitations

  • ML-paper focus; portability to non-CS domains unclear.
  • Quality of produced papers strongly dependent on seed templates.
  • Self-review is not a substitute for external peer review.

Papers describing this project

  • The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search — Yamada, Y., Lange, R. T., Lu, C., Hu, S., Lu, C., Foerster, J., et al. (2025). arXiv. arXiv:2504.08066

Also compared in

  • ARIS Table 4 (yang2026aris) — Same scoring profile as v1 in Table 4; noted as producing a workshop-level paper via agentic tree search.
  • A Survey of AI Scientists (tie2025aiscientistsurvey) — Covered as the v2 release of the foundational AI-scientist line.