`research-review`¶

Pack: ARIS skills

Category: review

Field: —

License: MIT

Updated: 2026-05-18

Stages: referee-simulation

↗ view SKILL.md on source · GitHub stars

Research Review via Codex MCP (xhigh reasoning)¶

Get a multi-round critical review of research work from an external LLM with maximum reasoning depth.

Constants¶

REVIEWER_MODEL = gpt-5.5 — Model used via Codex MCP. Must be an OpenAI model (e.g., gpt-5.5, o3, gpt-4o)
REVIEWER_BACKEND = codex — Default: Codex MCP (xhigh). Override with — reviewer: oracle-pro for GPT-5.4 Pro via Oracle MCP. See shared-references/reviewer-routing.md.

Context: $ARGUMENTS¶

Prerequisites¶

Codex MCP Server configured in Claude Code:

Bash

claude mcp add codex -s user -- codex mcp-server

This gives Claude Code access to mcp__codex__codex and mcp__codex__codex-reply tools

Workflow¶

Step 1: Gather Research Context¶

Before calling the external reviewer, compile a comprehensive briefing: 1. Read project narrative documents (e.g., STORY.md, README.md, paper drafts) 2. Read any memory/notes files for key findings and experiment history 3. Identify: core claims, methodology, key results, known weaknesses

Step 2: Initial Review (Round 1)¶

Send a detailed prompt with xhigh reasoning:

Text Only

mcp__codex__codex:
  config: {"model_reasoning_effort": "xhigh"}
  prompt: |
    [Full research context + specific questions]
    Please act as a senior ML reviewer (NeurIPS/ICML level). Identify:
    1. Logical gaps or unjustified claims
    2. Missing experiments that would strengthen the story
    3. Narrative weaknesses
    4. Whether the contribution is sufficient for a top venue
    Please be brutally honest.

Step 3: Iterative Dialogue (Rounds 2-N)¶

Use mcp__codex__codex-reply with the returned threadId to continue the conversation:

For each round: 1. Respond to criticisms with evidence/counterarguments 2. Ask targeted follow-ups on the most actionable points 3. Request specific deliverables: experiment designs, paper outlines, claims matrices

Key follow-up patterns: - "If we reframe X as Y, does that change your assessment?" - "What's the minimum experiment to satisfy concern Z?" - "Please design the minimal additional experiment package (highest acceptance lift per GPU week)" - "Please write a mock NeurIPS/ICML review with scores" - "Give me a results-to-claims matrix for possible experimental outcomes"

Step 4: Convergence¶

Stop iterating when: - Both sides agree on the core claims and their evidence requirements - A concrete experiment plan is established - The narrative structure is settled

Step 5: Document Everything¶

Save the full interaction and conclusions to a review document in the project root: - Round-by-round summary of criticisms and responses - Final consensus on claims, narrative, and experiments - Claims matrix (what claims are allowed under each possible outcome) - Prioritized TODO list with estimated compute costs - Paper outline if discussed

Update project memory/notes with key review conclusions.

Key Rules¶

ALWAYS use config: {"model_reasoning_effort": "xhigh"} for reviews
Send comprehensive context in Round 1 — the external model cannot read your files
Be honest about weaknesses — hiding them leads to worse feedback
Push back on criticisms you disagree with, but accept valid ones
Focus on ACTIONABLE feedback — "what experiment would fix this?"
Document the threadId for potential future resumption
The review document should be self-contained (readable without the conversation)

Prompt Templates¶

For initial review:¶

"I'm going to present a complete ML research project for your critical review. Please act as a senior ML reviewer (NeurIPS/ICML level)..."

For experiment design:¶

"Please design the minimal additional experiment package that gives the highest acceptance lift per GPU week. Our compute: [describe]. Be very specific about configurations."

For paper structure:¶

"Please turn this into a concrete paper outline with section-by-section claims and figure plan."

For claims matrix:¶

"Please give me a results-to-claims matrix: what claim is allowed under each possible outcome of experiments X and Y?"

For mock review:¶

"Please write a mock NeurIPS review with: Summary, Strengths, Weaknesses, Questions for Authors, Score, Confidence, and What Would Move Toward Accept."

Review Tracing¶

After each mcp__codex__codex or mcp__codex__codex-reply reviewer call, save the trace following shared-references/review-tracing.md (Policy C — forensic; never silently skip). Use save_trace.sh (resolved per the chain in shared-references/integration-contract.md §2) or write files directly to .aris/traces/<skill>/<date>_run<NN>/. Respect the --- trace: parameter (default: full).

research-review¶