Sakana AI Scientist (v1)
external · status: active · focus: end-to-end · discipline: computer-science · started: 2024
Project page: https://github.com/SakanaAI/AI-Scientist
Source: projects/landscape/sakana-ai-scientist-v1.yml
Positioning
The original AI Scientist release (arXiv:2408.06292): an end-to-end agentic pipeline that generates ideas, runs experiments, writes a paper, and reviews its own output, operating on a fixed set of CS templates (NanoGPT, 2D Diffusion, Grokking). It is the foundational reference for the AI-scientist line and the direct predecessor to v2.
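The stage order described above can be sketched as a small driver loop. This is an illustrative sketch only, not the project's code: `Run`, `run_pipeline`, `stub_llm`, and `stub_experiment` are hypothetical names, and the prompts are placeholders. The LLM and the experiment runner are passed in as plain callables so the control flow can be read without any API key or GPU.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical type: any function that maps a prompt string to a completion string.
LLM = Callable[[str], str]


@dataclass
class Run:
    """Artifacts accumulated by one pass through the pipeline (hypothetical)."""
    template: str
    idea: str = ""
    results: str = ""
    paper: str = ""
    review: str = ""
    log: list[str] = field(default_factory=list)


def run_pipeline(template: str, llm: LLM, run_experiment: Callable[[str], str]) -> Run:
    """Ideate, experiment, write up, then self-review, with no human approval step."""
    run = Run(template=template)

    run.idea = llm(f"Propose one research idea for the {template} template.")
    run.log.append("ideation")

    run.results = run_experiment(run.idea)
    run.log.append("experiment")

    run.paper = llm(f"Write a paper.\nIdea: {run.idea}\nResults: {run.results}")
    run.log.append("write-up")

    # The same LLM callable reviews its own paper (in-pipeline reviewer simulation).
    run.review = llm(f"Review this paper as a referee:\n{run.paper}")
    run.log.append("self-review")
    return run


def stub_llm(prompt: str) -> str:
    # Stand-in for a real model call; echoes the first line of the prompt.
    return f"[stub reply to: {prompt.splitlines()[0]}]"


def stub_experiment(idea: str) -> str:
    # Stand-in for running template code; returns no real measurements.
    return "placeholder result"


demo = run_pipeline("NanoGPT", stub_llm, stub_experiment)
print(demo.log)
```

Passing the model in as a callable also makes the single-model-family design visible: the writer and the reviewer are the same `llm` argument.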
Distinctive contribution
First widely noticed system to demonstrate a complete machine-learning paper produced autonomously by an LLM pipeline, including in-pipeline reviewer simulation. It was released with example papers and per-template scaffolds, making the line of work concrete and inspectable.
Evaluation scores
| Dimension | Score (0–3) | Note |
|---|---|---|
| Lifecycle coverage | 2 | Seven stages from ideation through self-review; no literature-synthesis or external dissemination stage. |
| Autonomy level | 3 | Runs end-to-end without per-step human approval; user supplies a template and topic. |
| Architectural transparency | 3 | Code, prompts, templates, and example outputs publicly released; arXiv paper documents the design. |
| Inputs supported | 1 | Narrow input form (area + curated template); limited external data integration. |
| Outputs / reproducibility | 2 | Papers + code + logs persisted; reproducibility tied to the seeded template. |
| Internal evaluation | 2 | Self-review pass and qualitative evaluation in the arXiv paper; external evaluations of output quality are mixed. |
| Openness | 2 | Open source under a non-OSI 'Other' license (verify terms before reuse); requires Linux + NVIDIA GPU. |
| Maturity / traction | 3 | 13k+ GitHub stars, peer attention, basis for follow-up systems and the v2 release. |
| Cross-family policy | 0 | Same as v2 — self-refinement within one model family. |
| Runtime assurance | 1 | Self-review pass + per-template scaffold checks. |
| Cross-platform portability | 0 | Locked to Linux, an NVIDIA GPU, and texlive. |
Scored on 2026-05-18. See the evaluation rubric.
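For readers who want to work with the scores programmatically, the table can be held as a plain mapping. The sketch below copies the eleven scores from the table; the unweighted total and the list of zero-scored dimensions are assumptions made for illustration, since the rubric as quoted here does not define an aggregate.

```python
# Scores copied from the table above (each on a 0-3 scale).
SCORES = {
    "Lifecycle coverage": 2,
    "Autonomy level": 3,
    "Architectural transparency": 3,
    "Inputs supported": 1,
    "Outputs / reproducibility": 2,
    "Internal evaluation": 2,
    "Openness": 2,
    "Maturity / traction": 3,
    "Cross-family policy": 0,
    "Runtime assurance": 1,
    "Cross-platform portability": 0,
}
MAX_PER_DIMENSION = 3


def summarize(scores: dict[str, int]) -> tuple[int, int, list[str]]:
    """Return (unweighted total, maximum possible, dimensions scored 0).

    The unweighted sum is an assumption for illustration, not part of the rubric.
    """
    total = sum(scores.values())
    maximum = MAX_PER_DIMENSION * len(scores)
    zeros = [name for name, score in scores.items() if score == 0]
    return total, maximum, zeros


total, maximum, zeros = summarize(SCORES)
print(f"{total}/{maximum}", zeros)
```

The zero-scored dimensions are usually more informative than the total, because they mark properties the system does not attempt at all.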
Tags
Pipeline stages: hypothesis-generation research-design data-analysis code-generation paper-drafting revision-editing referee-simulation
Architectural features: multi-agent tool-use iterative-loop artifact-versioning
Inputs: research-area template
Outputs: paper-pdf code experiment-logs llm-generated-review
Data sources: template-datasets
Knowledge sources: semantic-scholar
Limitations
- Locked to three CS templates; support for other domains depends on community-maintained templates.
- Self-review correlates weakly with external peer-review judgments.
- Non-permissive license; requires Linux + NVIDIA GPU + texlive-full.
- Executes LLM-written code; security/containment is the user's responsibility.
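Because the pipeline executes LLM-written code, some containment layer is needed around each run. The following is a minimal sketch of one such layer, assuming the generated code is a Python script: it runs the script in a child process, inside a throwaway working directory, with a wall-clock limit. `run_untrusted` is a hypothetical helper, not part of the project, and a subprocess with a timeout is not a sandbox; it does not restrict file or network access, so a container or VM is still advisable.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_untrusted(code: str, timeout_s: float = 5.0) -> dict:
    """Run generated Python code in a child process with a time limit.

    NOT a security boundary: the child can still read files and use the
    network. It only guards against hangs and keeps scratch files in a
    temporary directory that is deleted afterwards.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "generated.py"
        script.write_text(code, encoding="utf-8")
        try:
            proc = subprocess.run(
                # -I runs Python in isolated mode (ignores env vars and user site-packages).
                [sys.executable, "-I", str(script)],
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising on timeout.
            return {"timed_out": True, "returncode": None, "stdout": "", "stderr": ""}
        return {
            "timed_out": False,
            "returncode": proc.returncode,
            "stdout": proc.stdout,
            "stderr": proc.stderr,
        }


result = run_untrusted("print('hello from generated code')")
print(result["stdout"].strip())
```

A runaway script such as `run_untrusted("while True: pass", timeout_s=1.0)` comes back with `timed_out` set to `True` instead of blocking the pipeline.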
Related projects in this catalog
Papers describing this project
- The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery — Lu, C., Lu, C., Lange, R. T., Foerster, J., Clune, J., Ha, D. (2024). arXiv. arXiv:2408.06292
Also compared in
- ARIS Table 4 (yang2026aris): Cited as exhibiting 'recurring limitations' of same-model self-refinement. Scored: no cross-family, partial adversarial review, no composable skills, ✓ E2E, partial assurance, no portability.
- A Survey of AI Scientists (tie2025aiscientistsurvey): Covered as a foundational AI-scientist system.
Related references (literature catalog)
- Wu, J. et al. (2025). Agentic Reasoning: A Streamlined Framework for Enhancing LLM Reasoning with Agentic Tools (wu2025agenticreasoning)
- Schick, T. et al. (2023). Toolformer: Language Models Can Teach Themselves to Use Tools (schick2023toolformer)