Kosmos (jimmc414 implementation)

external · status: research-prototype · focus: end-to-end · discipline: general · started: 2025

Project page: https://github.com/jimmc414/Kosmos

Source: projects/landscape/kosmos.yml

Positioning

An open-source implementation of the Kosmos AI scientist architecture (Mitchener et al., arXiv:2511.02824), adapted to run via Claude Code or the Anthropic or OpenAI APIs. It runs autonomous research cycles: hypothesis generation, experiment design, code execution in a Docker sandbox, validation against an 8-dimension quality framework, and knowledge-graph construction.
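The cycle described above can be sketched as a simple loop. This is a minimal illustration, not code from the repository: the names `ResearchLoop` and `CycleRecord` and the four stage callables are hypothetical, and the lambdas at the bottom are stubs standing in for model calls and sandboxed execution. Knowledge-graph construction is left out to keep the sketch short.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CycleRecord:
    """Everything one research cycle produced (hypothetical structure)."""
    hypothesis: str
    design: str
    result: str
    accepted: bool


@dataclass
class ResearchLoop:
    # Each stage is injected as a plain callable so the sketch stays
    # independent of any particular LLM back-end or sandbox.
    generate: Callable[[str, List[CycleRecord]], str]
    design: Callable[[str], str]
    execute: Callable[[str], str]
    validate: Callable[[str, str], bool]
    history: List[CycleRecord] = field(default_factory=list)

    def run(self, research_area: str, cycles: int) -> List[CycleRecord]:
        for _ in range(cycles):
            # Earlier cycles are passed in so later hypotheses can build on them.
            hypothesis = self.generate(research_area, self.history)
            plan = self.design(hypothesis)
            result = self.execute(plan)  # a real system would sandbox this step
            accepted = self.validate(hypothesis, result)
            self.history.append(CycleRecord(hypothesis, plan, result, accepted))
        return self.history


# Stub stages (assumptions for illustration only).
loop = ResearchLoop(
    generate=lambda area, hist: f"H{len(hist) + 1} about {area}",
    design=lambda h: f"test({h})",
    execute=lambda plan: f"ran {plan}",
    validate=lambda h, r: h in r,
)
records = loop.run("protein folding", cycles=2)
```

Passing the accumulated history into the hypothesis step is what makes the loop iterative rather than a batch of independent runs.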

Distinctive contribution

Operationalizes the Kosmos architecture as a runnable system on commodity infrastructure, including a built-in 8-dimension validation harness and an explicit knowledge-graph layer that tracks concept relationships across cycles. The validation framework is independently reusable as a methodological scaffold.
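To make "tracks concept relationships across cycles" concrete, the following toy structure stores each relationship together with the cycles that asserted it. It is a sketch under assumptions: `ConceptGraph`, its methods, and the example concepts are hypothetical and do not reflect the project's actual schema or storage layer.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source concept, relation, target concept)


class ConceptGraph:
    """Toy knowledge graph that remembers which cycles asserted each edge."""

    def __init__(self) -> None:
        self._cycles: Dict[Edge, Set[int]] = defaultdict(set)

    def add(self, source: str, relation: str, target: str, cycle: int) -> None:
        """Record that `cycle` asserted the edge source -relation-> target."""
        self._cycles[(source, relation, target)].add(cycle)

    def neighbors(self, concept: str) -> List[Tuple[str, str]]:
        """Return sorted (relation, target) pairs leaving `concept`."""
        return sorted((r, t) for (s, r, t) in self._cycles if s == concept)

    def support(self, source: str, relation: str, target: str) -> int:
        """Number of distinct cycles that asserted this edge."""
        return len(self._cycles.get((source, relation, target), set()))


graph = ConceptGraph()
graph.add("gene A", "regulates", "pathway B", cycle=1)
graph.add("gene A", "regulates", "pathway B", cycle=3)
graph.add("gene A", "correlates_with", "phenotype C", cycle=2)
```

Counting the distinct cycles behind an edge gives a rough signal of how often a relationship has been rediscovered, which is one plausible input to a consistency check.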

Evaluation scores

Dimension	Score (0–3)	Note
Lifecycle coverage	2	Four stages from ideation through analysis; no paper drafting or review.
Autonomy level	3	Autonomous research cycles by design.
Architectural transparency	3	Open implementation; references the source paper (arXiv:2511.02824); 3700+ tests reported in repo.
Inputs supported	2	Research-area + data inputs; Anthropic or OpenAI back-ends; Docker-sandboxed execution.
Outputs / reproducibility	2	Knowledge graph + validated-discovery artifacts persisted; cycle outputs deterministic given fixed inputs and model.
Internal evaluation	2	Built-in 8-dimension quality framework; broader external evaluation pending.
Openness	1	Source public but no declared license in repo metadata — reuse rights uncertain.
Maturity / traction	2	511 stars; alpha-stage release; active community uptake post-Kosmos paper.
Cross-family policy	1	Anthropic or OpenAI API back-ends — cross-family possible by config.
Runtime assurance	2	Built-in 8-dimension quality framework + knowledge-graph consistency checks + sandboxed Docker execution.
Cross-platform portability	1	Anthropic or OpenAI; single agent framework.

Scored on 2026-05-18. See the evaluation rubric.

Tags

Pipeline stages: hypothesis-generation research-design data-analysis code-generation

Architectural features: multi-agent tool-use rag-knowledge-base persistent-memory iterative-loop artifact-versioning

Inputs: research-area data

Outputs: hypotheses experiment-results knowledge-graph validated-discoveries

Data sources: user-provided

Knowledge sources: literature knowledge-graph

Limitations

No declared open-source license.
Alpha-stage; expect breaking changes.
Without Docker, code execution falls back to Python's exec() with static validation; the README flags this as a security caveat.
Implements an architecture from a third-party paper; not a primary research artifact.
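The exec() fallback listed above can be illustrated with a generic static check that walks the AST before running anything. This is not the repository's validator: the ban lists, `static_violations`, and `run_checked` are hypothetical names chosen for this sketch.

```python
import ast
from typing import List

# Hypothetical policy: names and modules this sketch refuses to run.
BANNED_CALLS = {"eval", "exec", "open", "__import__", "compile"}
BANNED_MODULES = {"os", "sys", "subprocess", "socket", "shutil"}


def static_violations(source: str) -> List[str]:
    """Return the policy violations found by walking the parsed AST."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in BANNED_MODULES:
                    problems.append(f"import {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] in BANNED_MODULES:
                problems.append(f"from {node.module} import ...")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BANNED_CALLS:
                problems.append(f"call {node.func.id}()")
        elif isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            # Dunder attribute access is a common route around name-based bans.
            problems.append(f"dunder attribute {node.attr}")
    return problems


def run_checked(source: str) -> dict:
    """exec() the code only if the static pass finds nothing."""
    problems = static_violations(source)
    if problems:
        raise PermissionError("; ".join(problems))
    namespace: dict = {}
    exec(source, namespace)
    return namespace
```

Checks of this kind only reject patterns they know about, and they are easy to bypass with indirection, which is why this fallback carries a security caveat and why an isolated container is the safer default.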

Papers describing this project

Kosmos: An AI Scientist for Autonomous Discovery — Mitchener, L., Yiu, A., Chang, B., Bourdenx, M., Nadolski, T., Sulovari, A., et al. (2025). arXiv:2511.02824

Also compared in

A Survey of AI Scientists (tie2025aiscientistsurvey) — Covered as an autonomous-discovery system.

Wu, J. et al. (2025). Agentic Reasoning: A Streamlined Framework for Enhancing LLM Reasoning with Agentic Tools (wu2025agenticreasoning)
Park, J. S. et al. (2023). Generative Agents: Interactive Simulacra of Human Behavior (park2023generative)