CORAL
external · status: active · focus: end-to-end · discipline: general · started: 2026
Project page: https://github.com/Human-Agent-Society/CORAL
Source: projects/landscape/coral.yml
Positioning
Infrastructure (arXiv:2604.01658) for multi-agent autonomous self-evolution — organizations of AI agents that run experiments, share knowledge through persistent stores, and continuously improve solutions against a user-supplied grading script. Sits in the autoresearch infrastructure layer alongside Aviary and MLGym, but emphasizes evolution and self-improvement rather than benchmarking.
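The improve-against-a-grader loop described above can be pictured with a toy sketch. This is not CORAL's API: `evolve`, `make_proposer`, and `toy_grade` are hypothetical names, a random string mutation stands in for a coding agent, and an in-memory list stands in for the persistent knowledge store.

```python
import random
from typing import Callable, List, Tuple

History = List[Tuple[str, float]]


def evolve(
    seed: str,
    propose: Callable[[str, History], str],
    grade: Callable[[str], float],
    rounds: int = 20,
) -> Tuple[str, float, History]:
    """Greedy loop: keep the best-graded candidate seen so far."""
    best, best_score = seed, grade(seed)
    knowledge: History = [(seed, best_score)]  # stand-in for a shared store
    for _ in range(rounds):
        candidate = propose(best, knowledge)   # an agent would act here
        score = grade(candidate)               # user-supplied grading script
        knowledge.append((candidate, score))   # record every attempt
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score, knowledge


# Hypothetical toy task: match a target string character by character.
TARGET = "coral"


def toy_grade(solution: str) -> float:
    """Fraction of positions that match TARGET (assumed grader)."""
    return sum(a == b for a, b in zip(solution, TARGET)) / len(TARGET)


def make_proposer(seed: int) -> Callable[[str, History], str]:
    """Build a proposer that mutates one random character (assumed agent)."""
    rng = random.Random(seed)

    def propose(current: str, knowledge: History) -> str:
        i = rng.randrange(len(current))
        letter = rng.choice("abcdefghijklmnopqrstuvwxyz")
        return current[:i] + letter + current[i + 1:]

    return propose


if __name__ == "__main__":
    best, score, knowledge = evolve("aaaaa", make_proposer(0), toy_grade, rounds=200)
    print(best, score, len(knowledge))
```

The only contract the loop relies on is that the grader returns a comparable score, which is why a user-supplied script is enough to drive it.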
Distinctive contribution
Treats the organization of agents (workspaces, shared knowledge, judges) as a first-class engineering surface, with rubric-based judge packages (race_japan_grader, apex_judge) that themselves spawn Claude Code for evaluation. Natively integrated with Claude Code, OpenCode, Codex, and Cursor.
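As a rough illustration of what a rubric-based judge produces, the sketch below scores a solution against weighted criteria and returns a per-criterion report. Everything here is an assumption for illustration: `Criterion` and `judge` are hypothetical names, and the plain Python checks stand in for the evaluation that the real judge packages delegate to Claude Code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: float
    check: Callable[[str], float]  # expected to return a score in [0, 1]


def judge(solution: str, rubric: List[Criterion]) -> Dict[str, float]:
    """Return per-criterion scores plus a weight-normalized overall score."""
    total_weight = sum(c.weight for c in rubric)
    # Clamp each check into [0, 1] so one criterion cannot dominate.
    report = {c.name: max(0.0, min(1.0, c.check(solution))) for c in rubric}
    report["overall"] = (
        sum(report[c.name] * c.weight for c in rubric) / total_weight
    )
    return report


# Hypothetical rubric with two criteria.
rubric = [
    Criterion("nonempty", 1.0, lambda s: 1.0 if s else 0.0),
    Criterion("short", 3.0, lambda s: 1.0 if len(s) <= 10 else 0.0),
]

if __name__ == "__main__":
    print(judge("hello", rubric))
```

Returning the full report rather than a single number keeps the judge's reasoning inspectable, which matters when its output feeds back into an improvement loop.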
Evaluation scores
| Dimension | Score (0–3) | Note |
|---|---|---|
| Lifecycle coverage | 2 | Four stages spanning design through internal review; oriented to optimization rather than publication. |
| Autonomy level | 3 | Autonomous evolution loop; user supplies task + grader. |
| Architectural transparency | 3 | Open under MIT; arXiv:2604.01658; integrates Claude Code, OpenCode, Codex, Cursor with documented patterns. |
| Inputs supported | 2 | Codebase + grading script inputs; multiple coding-agent back-ends. |
| Outputs / reproducibility | 2 | Isolated workspaces + persistent knowledge stores; LLM nondeterminism limits exact reruns. |
| Internal evaluation | 2 | Rubric-judge packages provide structured internal evaluation; arXiv paper presents systematic results. |
| Openness | 3 | MIT-licensed; uv-installable; broad agent-back-end support. |
| Maturity / traction | 2 | 655 GitHub stars at scoring time; active 2026 development; integrated with major coding agents. |
| Cross-family policy | 1 | Integrates several coding agents (Claude Code, Codex, OpenCode, Cursor); cross-family use is configurable. |
| Runtime assurance | 2 | Rubric judges (race_japan_grader, apex_judge), isolated workspaces, and persistent shared knowledge together provide moderate gating. |
| Cross-platform portability | 2 | Multiple coding-agent back-ends (Claude Code, OpenCode, Codex, Cursor, Kiro) — broad portability. |
Scored on 2026-05-18. See the evaluation rubric.
Tags
Pipeline stages: research-design data-analysis code-generation referee-simulation
Architectural features: multi-agent persistent-memory artifact-versioning iterative-loop debate-consensus
Inputs: codebase grading-script
Outputs: evolved-solutions shared-knowledge-store judge-reports
Data sources: user-provided
Knowledge sources: shared-knowledge-store
Limitations
- Optimization-focused: best fit for grade-against-script tasks, not open-ended scholarly authoring.
- Heavy on coding-agent integration; lighter on literature-layer integration.
Related projects in this catalog
Papers describing this project
- CORAL: Towards Autonomous Multi-Agent Evolution for Open-Ended Discovery — Qu, A., Zheng, H., Zhou, Z., Yan, Y., Tang, Y., Ong, S. Y., et al. (2026). arXiv. arXiv:2604.01658
Related references (literature catalog)
- Wu, J. et al. (2025). Agentic Reasoning: A Streamlined Framework for Enhancing LLM Reasoning with Agentic Tools (catalog key: wu2025agenticreasoning)