AutoResearchClaw
external · status: active · focus: end-to-end · discipline: general · started: 2026
Project page: https://github.com/aiming-lab/AutoResearchClaw
Source: projects/landscape/autoresearchclaw.yml
Positioning
An autonomous research pipeline that takes a chat-level idea to a full paper via ACP-compatible agent back-ends (Claude Code, Codex CLI, Copilot CLI, Gemini CLI, Kimi CLI). Since v0.4.0 it ships a Human-in-the-Loop (HITL) Co-Pilot system offering 6 intervention modes (full-auto / gate-only / checkpoint / step-by-step / co-pilot / custom), 19 pre-loaded skills, an anti-fabrication VerifiedRegistry, and CLI commands for attaching to, approving, and redirecting in-flight pipelines.
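To make the mode list concrete, here is a minimal Python sketch of how a per-stage pause decision could depend on the selected intervention mode. Everything in it is an assumption inferred from the mode names: the `Mode` enum, the `needs_human` helper, and the `GATES` and `CHECKPOINTS` stage sets are hypothetical and are not AutoResearchClaw's configuration or API. The stage names reuse this entry's pipeline-stage tags.

```python
from enum import Enum


# Hypothetical model of the six HITL intervention modes named above.
# The pause rules are assumptions inferred from the mode names,
# not AutoResearchClaw's documented behavior or API.
class Mode(Enum):
    FULL_AUTO = "full-auto"
    GATE_ONLY = "gate-only"
    CHECKPOINT = "checkpoint"
    STEP_BY_STEP = "step-by-step"
    CO_PILOT = "co-pilot"
    CUSTOM = "custom"


# Stage names mirror this catalog entry's pipeline-stage tags.
STAGES = [
    "hypothesis-generation", "literature-discovery", "literature-synthesis",
    "research-design", "data-analysis", "code-generation",
    "paper-drafting", "revision-editing", "referee-simulation",
]
GATES = {"research-design", "paper-drafting"}  # hypothetical approval gates
CHECKPOINTS = {"literature-synthesis", "data-analysis"}  # hypothetical


def needs_human(mode: Mode, stage: str, custom: frozenset = frozenset()) -> bool:
    """Return True if the pipeline should pause for a human before `stage`."""
    if mode is Mode.FULL_AUTO:
        return False
    if mode is Mode.GATE_ONLY:
        return stage in GATES
    if mode is Mode.CHECKPOINT:
        return stage in GATES | CHECKPOINTS
    if mode in (Mode.STEP_BY_STEP, Mode.CO_PILOT):
        # Co-pilot is modeled like step-by-step here (pause at every stage).
        return True
    return stage in custom  # Mode.CUSTOM: user-chosen stages only


paused = [s for s in STAGES if needs_human(Mode.GATE_ONLY, s)]
print(paused)  # ['research-design', 'paper-drafting']
```

Keeping the pause rule a pure function of (mode, stage) makes each mode easy to test in isolation; a real co-pilot mode presumably differs from step-by-step in how the human interacts, not only in when the pipeline stops.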
Distinctive contribution
Among the largest user communities in the catalog (12.3k stars, documentation localized into 9 languages, and a Discord community). Its HITL co-pilot system, with 6 explicit intervention modes, is the most differentiated human-oversight surface among the end-to-end pipelines, and its anti-fabrication system (VerifiedRegistry plus an experiment diagnosis & repair loop) is a named runtime-assurance component rather than an evaluation afterthought.
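The idea behind an anti-fabrication registry can be illustrated with a small sketch, assuming the simplest possible contract: results are recorded only when an experiment run produces them, and the drafting stage may cite a number only if it matches a recorded value. The class, its `record`/`cite` methods, and the tolerance check are hypothetical illustrations, not the actual VerifiedRegistry interface.

```python
class VerifiedRegistry:
    """Hypothetical anti-fabrication registry (illustrative, not the real API).

    A number may appear in the draft only if it was recorded from an
    actual experiment run and the claimed value matches the recorded one.
    """

    def __init__(self) -> None:
        self._values = {}  # result key -> (value, run_id)

    def record(self, key: str, value: float, run_id: str) -> None:
        """Store a result produced by a completed experiment run."""
        self._values[key] = (value, run_id)

    def cite(self, key: str, claimed: float, tol: float = 1e-9) -> str:
        """Return a citable string, or raise if the claim is unverified."""
        if key not in self._values:
            raise LookupError(f"unverified result: {key!r}")
        value, run_id = self._values[key]
        if abs(value - claimed) > tol:
            raise ValueError(f"{key!r}: claimed {claimed}, recorded {value}")
        return f"{value} (run {run_id})"


reg = VerifiedRegistry()
reg.record("test_accuracy", 0.874, run_id="run-03")
print(reg.cite("test_accuracy", 0.874))  # 0.874 (run run-03)
```

Raising on both unknown keys and mismatched values means a fabricated or misremembered number fails loudly at drafting time instead of surviving into the paper.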
Evaluation scores
| Dimension | Score (0–3) | Note |
|---|---|---|
| Lifecycle coverage | 3 | Nine stages including referee simulation; explicit dissemination support via OpenClaw integration. |
| Autonomy level | 2 | User-selectable, from full-auto through co-pilot. Since the v0.4.0 HITL launch, the design centers on partial autonomy rather than a fully autonomous default. |
| Architectural transparency | 2 | MIT-licensed and open; the v0.4.0 HITL guide documents the 6 intervention modes; some agent internals require a source dive. |
| Inputs supported | 3 | Multiple inputs (idea, prior paper, existing results); 5+ ACP-compatible back-ends; messaging-platform integration via OpenClaw bridge. |
| Outputs / reproducibility | 2 | Per-project workspaces; experiment diagnosis & repair loop; 2699 tests passing, but runs are not bitwise-deterministic by design. |
| Internal evaluation | 2 | Anti-fabrication VerifiedRegistry, a paper showcase across 8 domains, and an active testing program; no published peer-reviewed evaluation paper as of the scoring date. |
| Openness | 3 | MIT-licensed; multi-language documentation; active Discord community; community-contributed skill ecosystem. |
| Maturity / traction | 3 | 12.3k+ stars (largest in the catalog); regular releases (v0.4.x in 2026-04); broad community adoption signals. |
| Cross-family policy | 1 | Multiple back-ends (Claude/Codex/Copilot/Gemini/Kimi CLI) make cross-family configurations easy, but no required cross-family review policy in the architecture. |
| Runtime assurance | 3 | Anti-fabrication system (VerifiedRegistry + experiment diagnosis & repair loop), SmartPause (confidence-driven dynamic intervention), ALHF intervention learning, cost budget guardrails. |
| Cross-platform portability | 3 | 5+ CLI back-ends (Claude Code, Codex CLI, Copilot CLI, Gemini CLI, Kimi CLI) + 4 messaging platforms (Discord, Telegram, Lark, WeChat) via OpenClaw bridge. |
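As a quick summary of the table above, an unweighted sum of the eleven dimension scores can be computed as follows. Treating all dimensions as equally weighted is an assumption made here for illustration; the evaluation rubric may aggregate scores differently or not at all.

```python
# Scores transcribed from the table above (0-3 scale, 11 dimensions).
# Equal weighting is an illustrative assumption, not the rubric's rule.
scores = {
    "Lifecycle coverage": 3,
    "Autonomy level": 2,
    "Architectural transparency": 2,
    "Inputs supported": 3,
    "Outputs / reproducibility": 2,
    "Internal evaluation": 2,
    "Openness": 3,
    "Maturity / traction": 3,
    "Cross-family policy": 1,
    "Runtime assurance": 3,
    "Cross-platform portability": 3,
}
total = sum(scores.values())
maximum = 3 * len(scores)
print(f"{total}/{maximum} ({total / maximum:.0%})")  # 27/33 (82%)
```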
Scored on 2026-05-18. See the evaluation rubric.
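The Runtime assurance row credits SmartPause (confidence-driven dynamic intervention) and cost budget guardrails. A minimal sketch of how those two checks could combine after each stage follows; the `Guardrail` class, the 0.7 confidence threshold, and the 20 USD budget are hypothetical values chosen for illustration, not AutoResearchClaw defaults or its SmartPause implementation.

```python
from dataclasses import dataclass


# Hypothetical runtime guardrail combining a confidence-driven pause
# (in the spirit of SmartPause) with a hard cost budget. The threshold,
# budget, and field names are illustrative assumptions.
@dataclass
class Guardrail:
    confidence_threshold: float = 0.7  # assumed: pause below this
    budget_usd: float = 20.0           # assumed per-project budget
    spent_usd: float = 0.0

    def check(self, confidence: float, cost_usd: float) -> str:
        """Return 'continue', 'pause', or 'stop' after a finished stage."""
        self.spent_usd += cost_usd
        if self.spent_usd > self.budget_usd:
            return "stop"   # hard budget guardrail takes precedence
        if confidence < self.confidence_threshold:
            return "pause"  # low confidence: hand control to the human
        return "continue"


g = Guardrail()
print(g.check(confidence=0.92, cost_usd=3.0))   # continue
print(g.check(confidence=0.55, cost_usd=4.0))   # pause
print(g.check(confidence=0.90, cost_usd=15.0))  # stop
```

Checking the budget before confidence means an over-budget run stops even when the model is confident, which is the conservative ordering for a hard cost limit.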
Tags
Pipeline stages: hypothesis-generation literature-discovery literature-synthesis research-design data-analysis code-generation paper-drafting revision-editing referee-simulation
Architectural features: multi-agent human-in-loop tool-use rag-knowledge-base persistent-memory iterative-loop artifact-versioning
Inputs: research-idea prior-paper existing-results
Outputs: paper code figures experiment-logs
Data sources: user-provided
Knowledge sources: arxiv literature mcp-servers
Limitations
- Ethics and responsible-use guidelines were added in 2026-04; releases from the earlier autonomous-by-default era leave a legacy of misuse risk.
- No peer-reviewed publication describing the system as of the scoring date.
- Heavy aspirational marketing ('Chat an Idea. Get a Paper.') sets a higher bar than the showcase examples meet.
- Single-organization maintenance (Aiming Lab); long-term governance unclear.
Related projects in this catalog
Also compared in
- ARIS Table 4, footnote 1 (yang2026aris): noted as 'very recently' built; no detailed comparison.
Related references (literature catalog)
- Wu, J. et al. (2025). Agentic Reasoning: A Streamlined Framework for Enhancing LLM Reasoning with Agentic Tools. [wu2025agenticreasoning]
- Schick, T. et al. (2023). Toolformer: Language Models Can Teach Themselves to Use Tools. [schick2023toolformer]
- Ji, Z. et al. (2023). Survey of Hallucination in Natural Language Generation. [ji2023hallucination]