AutoResearchClaw

external · status: active · focus: end-to-end · discipline: general · started: 2026

Project page: https://github.com/aiming-lab/AutoResearchClaw

Source: projects/landscape/autoresearchclaw.yml

Positioning

An autonomous research pipeline that takes a chat-level idea to a full paper via ACP-compatible agent back-ends (Claude Code, Codex CLI, Copilot CLI, Gemini CLI, Kimi CLI). From v0.4.0 onward it ships with a Human-in-the-Loop (HITL) Co-Pilot system offering 6 intervention modes (full-auto / gate-only / checkpoint / step-by-step / co-pilot / custom), 19 pre-loaded skills, an anti-fabrication VerifiedRegistry, and CLI commands for attaching to, approving, or redirecting in-flight pipelines.
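To make the six intervention modes concrete, the following is a minimal sketch of how a mode could gate a pipeline stage. Only the six mode names come from this entry. The `InterventionMode` enum, the `should_pause` function, and the `pause_stages` parameter are hypothetical names introduced for illustration and are not AutoResearchClaw's API. The sketch assumes that full-auto never pauses and step-by-step always pauses; the entry does not describe how the remaining four modes differ, so here they simply defer to a caller-supplied set of stages.

```python
from enum import Enum


class InterventionMode(Enum):
    """Mode names as listed in this entry; the class itself is hypothetical."""
    FULL_AUTO = "full-auto"
    GATE_ONLY = "gate-only"
    CHECKPOINT = "checkpoint"
    STEP_BY_STEP = "step-by-step"
    CO_PILOT = "co-pilot"
    CUSTOM = "custom"


def should_pause(mode, stage, pause_stages=frozenset()):
    """Decide whether to hand control to a human before running `stage`.

    Assumed semantics (not taken from the project):
      - full-auto never pauses
      - step-by-step pauses before every stage
      - every other mode pauses only at stages named in `pause_stages`
    """
    if mode is InterventionMode.FULL_AUTO:
        return False
    if mode is InterventionMode.STEP_BY_STEP:
        return True
    return stage in pause_stages


if __name__ == "__main__":
    # Stage names reuse this entry's pipeline-stage tags.
    gates = {"research-design", "paper-drafting"}
    for stage in ("literature-discovery", "research-design", "paper-drafting"):
        print(stage, should_pause(InterventionMode.GATE_ONLY, stage, gates))
```

Keeping the pause decision in one pure function means the same pipeline loop can serve every mode; only the mode value and the stage set change.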

Distinctive contribution

AutoResearchClaw has one of the largest user communities in the catalog (12.3k stars, localizations in 9 languages, and a Discord community). The HITL co-pilot system, with 6 explicit intervention modes, is the most differentiated human-oversight surface among the end-to-end pipelines. The anti-fabrication system (VerifiedRegistry plus an experiment diagnosis & repair loop) is a named runtime-assurance component, not an evaluation afterthought.
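As an illustration of what an anti-fabrication check can look like in general, the sketch below records values produced by experiment runs and flags any number in a draft that matches no recorded value. This is an assumption about the general technique, not a description of how VerifiedRegistry is implemented. The class `VerifiedRegistrySketch`, the function `unverified_numbers`, and the tolerance parameter are hypothetical.

```python
import re


class VerifiedRegistrySketch:
    """Hypothetical stand-in: holds numbers that came from recorded runs."""

    def __init__(self):
        self._values = {}  # metric name -> measured value

    def record(self, name: str, value: float) -> None:
        """Store a value that an experiment actually produced."""
        self._values[name] = value

    def is_verified(self, value: float, tol: float = 1e-6) -> bool:
        """True if `value` matches any recorded measurement within `tol`."""
        return any(abs(value - v) <= tol for v in self._values.values())


# Unsigned integers or decimals that are not part of a word or a longer number.
NUMBER = re.compile(r"(?<![\w.])\d+(?:\.\d+)?(?!\w)")


def unverified_numbers(draft: str, registry: VerifiedRegistrySketch) -> list:
    """Return numeric literals in `draft` that match no recorded value."""
    found = (float(tok) for tok in NUMBER.findall(draft))
    return [x for x in found if not registry.is_verified(x)]


if __name__ == "__main__":
    registry = VerifiedRegistrySketch()
    registry.record("accuracy", 0.912)
    draft = "Accuracy reaches 0.912 and recall reaches 0.950."
    print(unverified_numbers(draft, registry))
```

In this toy run the recall figure is flagged because no experiment recorded it, which is the kind of signal a diagnosis and repair loop could act on.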

Evaluation scores

Dimension	Score (0–3)	Note
Lifecycle coverage	3	Nine stages including referee simulation; explicit dissemination support via OpenClaw integration.
Autonomy level	2	User-selectable, from full-auto through co-pilot. Since the v0.4.0 HITL launch, the design centers on partial autonomy rather than a fully autonomous default.
Architectural transparency	2	Open source under the MIT license; the v0.4.0 HITL guide documents the 6 intervention modes; some agent internals can only be understood by reading the source.
Inputs supported	3	Multiple inputs (idea, prior paper, existing results); 5+ ACP-compatible back-ends; messaging-platform integration via OpenClaw bridge.
Outputs / reproducibility	2	Per-project workspaces; experiment diagnosis & repair loop; 2699 tests passing, but runs are not bitwise-deterministic by design.
Internal evaluation	2	Anti-fabrication VerifiedRegistry, paper showcase across 8 domains, active testing program; no published peer-reviewed evaluation paper at this scoring date.
Openness	3	MIT-licensed; multi-language documentation; active Discord community; community-contributed skill ecosystem.
Maturity / traction	3	12.3k+ stars (largest in the catalog); regular releases (v0.4.x in 2026-04); broad community adoption signals.
Cross-family policy	1	Multiple back-ends (Claude/Codex/Copilot/Gemini/Kimi CLI) make cross-family configurations easy, but the architecture does not require cross-family review.
Runtime assurance	3	Anti-fabrication system (VerifiedRegistry + experiment diagnosis & repair loop), SmartPause (confidence-driven dynamic intervention), ALHF intervention learning, cost budget guardrails.
Cross-platform portability	3	5+ CLI back-ends (Claude Code, Codex CLI, Copilot CLI, Gemini CLI, Kimi CLI) + 4 messaging platforms (Discord, Telegram, Lark, WeChat) via OpenClaw bridge.

Scored on 2026-05-18. See the evaluation rubric.
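For a quick read of the table, the scores can be tallied as below. The values are copied from the table above; the unweighted sum and mean are only a reading aid and assume equal weighting, which the rubric may not define or endorse.

```python
# Scores copied from the evaluation table (each on the 0-3 scale).
scores = {
    "Lifecycle coverage": 3,
    "Autonomy level": 2,
    "Architectural transparency": 2,
    "Inputs supported": 3,
    "Outputs / reproducibility": 2,
    "Internal evaluation": 2,
    "Openness": 3,
    "Maturity / traction": 3,
    "Cross-family policy": 1,
    "Runtime assurance": 3,
    "Cross-platform portability": 3,
}

MAX_PER_DIMENSION = 3  # top of the 0-3 scale

total = sum(scores.values())
maximum = MAX_PER_DIMENSION * len(scores)
mean = total / len(scores)
lowest = min(scores, key=scores.get)  # dimension with the weakest score

print(f"{total}/{maximum} (mean {mean:.2f}); lowest: {lowest}")
# prints "27/33 (mean 2.45); lowest: Cross-family policy"
```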

Tags

Pipeline stages: hypothesis-generation literature-discovery literature-synthesis research-design data-analysis code-generation paper-drafting revision-editing referee-simulation

Architectural features: multi-agent human-in-loop tool-use rag-knowledge-base persistent-memory iterative-loop artifact-versioning

Inputs: research-idea prior-paper existing-results

Outputs: paper code figures experiment-logs

Data sources: user-provided

Knowledge sources: arxiv literature mcp-servers

Limitations

Ethics and responsible-use guidelines were added only in 2026-04; the earlier autonomous-by-default period leaves a legacy of misuse risk.
No peer-reviewed publication describing the system at this scoring date.
Heavy aspirational marketing ('Chat an Idea. Get a Paper.') sets a higher bar than the showcase examples meet.
Single-organization maintenance (Aiming Lab); long-term governance unclear.

Also compared in

ARIS Table 4 (footnote 1) (yang2026aris): noted as 'very recently' built; no detailed comparison.
