
CORAL

external · status: active · focus: end-to-end · discipline: general · started: 2026

Project page: https://github.com/Human-Agent-Society/CORAL

Source: projects/landscape/coral.yml

Positioning

Infrastructure (arXiv:2604.01658) for multi-agent autonomous self-evolution: organizations of AI agents that run experiments, share knowledge through persistent stores, and continuously improve solutions against a user-supplied grading script. It sits in the autoresearch infrastructure layer alongside Aviary and MLGym, but emphasizes evolution and self-improvement rather than benchmarking.
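The loop described above can be sketched as follows. This is a minimal, hypothetical illustration, not CORAL's implementation. `grade` stands in for the user-supplied grading script, `shared_knowledge` is an in-memory list standing in for the persistent store, and the search strategy (every agent perturbs the best solution any agent has recorded, i.e. a shared hill climb) and the toy one-number task are assumptions chosen so the example runs on its own.

```python
import random
from dataclasses import dataclass


@dataclass
class Attempt:
    agent: str
    solution: float
    score: float


def grade(solution: float) -> float:
    """Hypothetical stand-in for a user-supplied grading script (higher is better)."""
    return -(solution - 3.0) ** 2


def propose(rng: random.Random, knowledge: list[Attempt]) -> float:
    """Start randomly, then perturb the best solution recorded by any agent."""
    if not knowledge:
        return rng.uniform(-10.0, 10.0)
    best = max(knowledge, key=lambda a: a.score)
    return best.solution + rng.gauss(0.0, 1.0)


def evolve(agents: list[str], rounds: int, seed: int = 0) -> list[Attempt]:
    """Each round, every agent reads the shared store, proposes, is graded, and writes back."""
    rng = random.Random(seed)  # seeded so reruns of the sketch are reproducible
    shared_knowledge: list[Attempt] = []
    for _ in range(rounds):
        for agent in agents:
            candidate = propose(rng, shared_knowledge)
            shared_knowledge.append(Attempt(agent, candidate, grade(candidate)))
    return shared_knowledge


history = evolve(["agent-a", "agent-b", "agent-c"], rounds=50)
best = max(history, key=lambda a: a.score)
print(f"best by {best.agent}: x={best.solution:.3f}, score={best.score:.4f}")
```

Because every agent reads from the same store, an improvement found by one agent immediately becomes the starting point for the others; with real coding agents the seeded random generator would be replaced by nondeterministic LLM calls.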

Distinctive contribution

Treats the organization of agents (workspaces, shared knowledge, judges) as a first-class engineering surface, with rubric-based judge packages (race_japan_grader, apex_judge) that themselves spawn Claude Code for evaluation. Natively integrated with Claude Code, OpenCode, Codex, and Cursor.
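A rubric-based judge can be sketched as a list of weighted criteria, each scored by a judge callable and aggregated into one number. The criteria, weights, the 0 to 3 point scale, and the weighted-average aggregation below are all hypothetical; `toy_judge` is a deterministic stand-in for the step where CORAL's judge packages (race_japan_grader, apex_judge) spawn Claude Code, and none of these names reflect those packages' actual interfaces.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: float
    max_points: int


# Hypothetical rubric for illustration only.
RUBRIC = [
    Criterion("correctness", weight=0.5, max_points=3),
    Criterion("clarity", weight=0.3, max_points=3),
    Criterion("efficiency", weight=0.2, max_points=3),
]

# A judge maps (criterion name, artifact) to awarded points. CORAL's judge
# packages spawn Claude Code for this step; here it is a plain function.
Judge = Callable[[str, str], int]


def score_with_rubric(artifact: str, rubric: list[Criterion], judge: Judge) -> dict[str, float]:
    """Return per-criterion points plus a weighted total normalized to [0, 1]."""
    report: dict[str, float] = {}
    weighted = 0.0
    for c in rubric:
        # Clamp so a misbehaving judge cannot push a criterion out of range.
        points = max(0, min(judge(c.name, artifact), c.max_points))
        report[c.name] = float(points)
        weighted += c.weight * points / c.max_points
    report["weighted_total"] = weighted / sum(c.weight for c in rubric)
    return report


def toy_judge(criterion: str, artifact: str) -> int:
    """Deterministic stand-in for an LLM judge call (hypothetical checks)."""
    if criterion == "correctness":
        return 3 if "return" in artifact else 0
    if criterion == "clarity":
        return 2 if '"""' in artifact else 1
    return 3


artifact = 'def answer():\n    """Return the answer."""\n    return 42\n'
print(score_with_rubric(artifact, RUBRIC, toy_judge))
```

Keeping per-criterion points in the report, rather than only the total, is what makes the judge output usable as a structured review that agents can act on.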

Evaluation scores

Dimension | Score (0–3) | Note
Lifecycle coverage | 2 | Four stages spanning design through internal review; oriented to optimization rather than publication.
Autonomy level | 3 | Autonomous evolution loop; user supplies task + grader.
Architectural transparency | 3 | Open source under MIT; arXiv:2604.01658; integrates Claude Code, OpenCode, Codex, Cursor with documented patterns.
Inputs supported | 2 | Codebase + grading script inputs; multiple coding-agent back-ends.
Outputs / reproducibility | 2 | Isolated workspaces + persistent knowledge stores; LLM nondeterminism limits exact reruns.
Internal evaluation | 2 | Rubric-judge packages provide structured internal evaluation; arXiv paper presents systematic results.
Openness | 3 | MIT-licensed; uv-installable; broad agent-back-end support.
Maturity / traction | 2 | 655 stars; active 2026 development; integrated with major coding agents.
Cross-family policy | 1 | Multi-agent coding-agent integration (Claude Code, Codex, OpenCode, Cursor); cross-family configurable.
Runtime assurance | 2 | Rubric judges (race_japan_grader, apex_judge), isolated workspaces, and persistent shared knowledge provide moderate gating.
Cross-platform portability | 2 | Multiple coding-agent back-ends (Claude Code, OpenCode, Codex, Cursor, Kiro); broad portability.
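The pairing of isolated workspaces with a persistent shared knowledge store, noted under reproducibility and runtime assurance, can be sketched with the standard library alone. In this hypothetical sketch each agent gets a private copy of the codebase via `shutil.copytree`, while findings go to a JSON-lines file that lives outside every workspace; the directory layout, the file format, and the names `make_workspace`, `record_note`, and `read_notes` are assumptions, not CORAL's storage design.

```python
import json
import shutil
import tempfile
from pathlib import Path


def make_workspace(codebase: Path, root: Path, agent: str) -> Path:
    """Copy the codebase into a private directory for one agent."""
    workspace = root / "workspaces" / agent
    shutil.copytree(codebase, workspace)  # creates missing parent directories
    return workspace


def record_note(store: Path, agent: str, note: str, score: float) -> None:
    """Append one finding to a shared JSON-lines file that outlives any workspace."""
    with store.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"agent": agent, "note": note, "score": score}) + "\n")


def read_notes(store: Path) -> list[dict]:
    """Load every finding recorded so far, by any agent."""
    if not store.exists():
        return []
    with store.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    codebase = root / "codebase"
    codebase.mkdir()
    (codebase / "solution.py").write_text("VALUE = 1\n")

    ws_a = make_workspace(codebase, root, "agent-a")
    ws_b = make_workspace(codebase, root, "agent-b")
    (ws_a / "solution.py").write_text("VALUE = 2\n")  # agent-a edits only its own copy

    store = root / "knowledge.jsonl"
    record_note(store, "agent-a", "VALUE = 2 scored higher", 0.8)
    print((ws_b / "solution.py").read_text().strip())  # agent-b's copy is untouched
    print(read_notes(store))
```

Isolation keeps one agent's broken edit from corrupting another's run, while the append-only store lets a finding survive after the workspace that produced it is discarded.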

Scored on 2026-05-18. See the evaluation rubric.

Tags

Pipeline stages: research-design data-analysis code-generation referee-simulation

Architectural features: multi-agent persistent-memory artifact-versioning iterative-loop debate-consensus

Inputs: codebase grading-script

Outputs: evolved-solutions shared-knowledge-store judge-reports

Data sources: user-provided

Knowledge sources: shared-knowledge-store
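The inputs tags list a codebase and a grading script. One way to invoke such a script is sketched below, under a hypothetical contract: the grader is a Python file run with the current interpreter, it receives the workspace path as its only argument, and it prints the score on its final non-empty stdout line. CORAL's actual grader contract may differ; `run_grader` and the example grader are illustrative only.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_grader(grader: Path, workspace: Path, timeout: float = 60.0) -> float:
    """Run a grading script on a workspace and parse its last stdout line as a score."""
    result = subprocess.run(
        [sys.executable, str(grader), str(workspace)],
        capture_output=True,
        text=True,
        timeout=timeout,  # a hung grader raises subprocess.TimeoutExpired
        check=True,  # a crashing grader raises subprocess.CalledProcessError
    )
    lines = [line for line in result.stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError("grader produced no output")
    return float(lines[-1])


# Hypothetical grader: the score is the number of lines in solution.py.
GRADER_SOURCE = """
import sys
from pathlib import Path
workspace = Path(sys.argv[1])
print("grading", workspace.name)
print(len((workspace / "solution.py").read_text().splitlines()))
"""

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    grader = root / "grade.py"
    grader.write_text(GRADER_SOURCE)
    workspace = root / "ws"
    workspace.mkdir()
    (workspace / "solution.py").write_text("a = 1\nb = 2\n")
    print(run_grader(grader, workspace))
```

Parsing only the final line lets the grader emit diagnostics before its score, and failing loudly on crashes or timeouts keeps a broken grade from being recorded as a real result.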

Limitations

  • Optimization-focused: best fit for grade-against-script tasks, not open-ended scholarly authoring.
  • Heavy on coding-agent integration; lighter on literature-layer integration.

Papers describing this project

  • Qu, A., Zheng, H., Zhou, Z., Yan, Y., Tang, Y., Ong, S. Y., et al. (2026). CORAL: Towards Autonomous Multi-Agent Evolution for Open-Ended Discovery. arXiv:2604.01658.