`auto-review-loop-llm`¶

Pack: ARIS skills

Category: review

Field: —

License: MIT

Updated: 2026-05-18

Stages: referee-simulation

↗ view SKILL.md on source · GitHub stars

Auto Review Loop (Generic LLM): Autonomous Research Improvement¶

Autonomously iterate: review → implement fixes → re-review, until the external reviewer gives a positive assessment or MAX_ROUNDS is reached.

Context: $ARGUMENTS¶

Constants¶

MAX_ROUNDS = 4
POSITIVE_THRESHOLD: score >= 6/10, or verdict contains "accept", "sufficient", "ready for submission"
REVIEW_DOC: review-stage/AUTO_REVIEW.md (cumulative log) (fall back to ./AUTO_REVIEW.md for legacy projects)

LLM Configuration¶

This skill uses any OpenAI-compatible API for external review via the llm-chat MCP server.

Configuration via MCP Server (Recommended)¶

Add to ~/.claude/settings.json:

JSON

{
  "mcpServers": {
    "llm-chat": {
      "command": "/usr/bin/python3",
      "args": ["/Users/yourname/.claude/mcp-servers/llm-chat/server.py"],
      "env": {
        "LLM_API_KEY": "your-api-key",
        "LLM_BASE_URL": "https://api.deepseek.com/v1",
        "LLM_MODEL": "deepseek-chat"
      }
    }
  }
}

Supported Providers¶

Provider	LLM_BASE_URL	LLM_MODEL
OpenAI	`https://api.openai.com/v1`	`gpt-4o`, `o3`
DeepSeek	`https://api.deepseek.com/v1`	`deepseek-chat`, `deepseek-reasoner`
MiniMax	`https://api.minimax.io/v1`	`MiniMax-M2.7`
Kimi (Moonshot)	`https://api.moonshot.cn/v1`	`moonshot-v1-8k`, `moonshot-v1-32k`
ZhiPu (GLM)	`https://open.bigmodel.cn/api/paas/v4`	`glm-4`, `glm-4-plus`
SiliconFlow	`https://api.siliconflow.cn/v1`	`Qwen/Qwen2.5-72B-Instruct`
阿里云百炼	`https://dashscope.aliyuncs.com/compatible-mode/v1`	`qwen-max`
零一万物	`https://api.lingyiwanwu.com/v1`	`yi-large`

API Call Method¶

Primary: MCP Tool

Text Only

mcp__llm-chat__chat:
  prompt: |
    [Review prompt content]
  model: "deepseek-chat"
  system: "You are a senior ML reviewer..."

Fallback: curl

Bash

curl -s "${LLM_BASE_URL}/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${LLM_API_KEY}" \
  -d '{
    "model": "${LLM_MODEL}",
    "messages": [
      {"role": "system", "content": "You are a senior ML reviewer..."},
      {"role": "user", "content": "[review prompt]"}
    ],
    "max_tokens": 4096
  }'

State Persistence (Compact Recovery)¶

Persist state to review-stage/REVIEW_STATE.json after each round:

JSON

{
  "round": 2,
  "status": "in_progress",
  "last_score": 5.0,
  "last_verdict": "not ready",
  "pending_experiments": [],
  "timestamp": "2026-03-15T10:00:00"
}

Write this file at the end of every Phase E (after documenting the round).

On completion, set "status": "completed".

Workflow¶

Initialization¶

Check review-stage/REVIEW_STATE.json for recovery (fall back to ./REVIEW_STATE.json if not found — legacy path)
Read project context and prior reviews
Initialize round counter

Loop (up to MAX_ROUNDS)¶

Phase A: Review¶

If MCP available:

Text Only

mcp__llm-chat__chat:
  system: "You are a senior ML reviewer (NeurIPS/ICML level)."
  prompt: |
    [Round N/MAX_ROUNDS of autonomous review loop]

    [Full research context: claims, methods, results, known weaknesses]
    [Changes since last round, if any]

    1. Score this work 1-10 for a top venue
    2. List remaining critical weaknesses (ranked by severity)
    3. For each weakness, specify the MINIMUM fix
    4. State clearly: is this READY for submission? Yes/No/Almost

    Be brutally honest. If the work is ready, say so clearly.

If MCP NOT available:

Bash

curl -s "${LLM_BASE_URL}/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${LLM_API_KEY}" \
  -d '{
    "model": "${LLM_MODEL}",
    "messages": [
      {"role": "system", "content": "You are a senior ML reviewer (NeurIPS/ICML level)."},
      {"role": "user", "content": "[Full review prompt]"}
    ],
    "max_tokens": 4096
  }'

Phase B: Parse Assessment¶

CRITICAL: Save the FULL raw response verbatim. Then extract: - Score (numeric 1-10) - Verdict ("ready" / "almost" / "not ready") - Action items (ranked list of fixes)

STOP: If score >= 6 AND verdict contains "ready/almost"

Phase C: Implement Fixes¶

Priority: metric additions > reframing > new experiments

Phase D: Wait for Results¶

Monitor remote experiments

Phase E: Document Round¶

Append to review-stage/AUTO_REVIEW.md:

Markdown

### Round N (timestamp)

#### Assessment (Summary)
- Score: X/10
- Verdict: [ready/almost/not ready]
- Key criticisms: [bullet list]

#### Reviewer Raw Response

<details>
<summary>Click to expand full reviewer response</summary>

[Paste the COMPLETE raw response here — verbatim, unedited.]

</details>

#### Actions Taken
- [what was implemented/changed]

#### Results
- [experiment outcomes, if any]

#### Status
- [continuing to round N+1 / stopping]

Write review-stage/REVIEW_STATE.json with current state.

Termination¶

Set review-stage/REVIEW_STATE.json status to "completed"
Write final summary

Key Rules¶

Large file handling: If the Write tool fails due to file size, immediately retry using Bash (cat << 'EOF' > file) to write in chunks. Do NOT ask the user for permission — just do it silently.
Anti-hallucination citations: When adding references, NEVER fabricate BibTeX. Use DBLP → CrossRef → [VERIFY] chain. Do NOT generate BibTeX from memory.
Be honest about weaknesses
Implement fixes BEFORE re-reviewing
Document everything
Include previous context in round 2+ prompts
Prefer MCP tool over curl when available

Prompt Template for Round 2+¶

Text Only

mcp__llm-chat__chat:
  system: "You are a senior ML reviewer (NeurIPS/ICML level)."
  prompt: |
    [Round N/MAX_ROUNDS of autonomous review loop]

    ## Previous Review Summary (Round N-1)
    - Previous Score: X/10
    - Previous Verdict: [ready/almost/not ready]
    - Previous Key Weaknesses: [list]

    ## Changes Since Last Review
    1. [Action 1]: [result]
    2. [Action 2]: [result]

    ## Updated Results
    [paste updated metrics/tables]

    Please re-score and re-assess:
    1. Score this work 1-10 for a top venue
    2. List remaining critical weaknesses (ranked by severity)
    3. For each weakness, specify the MINIMUM fix
    4. State clearly: is this READY for submission? Yes/No/Almost

    Be brutally honest. If the work is ready, say so clearly.

Output Protocols¶

Follow these shared protocols for all output files: - Output Versioning Protocol — write timestamped file first, then copy to fixed name - Output Manifest Protocol — log every output to MANIFEST.md - Output Language Protocol — respect the project's language setting

auto-review-loop-llm¶

Auto Review Loop (Generic LLM): Autonomous Research Improvement¶

Context: $ARGUMENTS¶

Constants¶

LLM Configuration¶

Configuration via MCP Server (Recommended)¶

Supported Providers¶

API Call Method¶

State Persistence (Compact Recovery)¶

Workflow¶

Initialization¶

Loop (up to MAX_ROUNDS)¶

Phase A: Review¶

Phase B: Parse Assessment¶

Phase C: Implement Fixes¶

Phase D: Wait for Results¶

Phase E: Document Round¶

Termination¶

Key Rules¶

Prompt Template for Round 2+¶

Output Protocols¶

`auto-review-loop-llm`¶