E2ER — End-to-End Research

owned · status: active · focus: end-to-end · discipline: economics · started: 2025

Project page: https://github.com/bhanneke/E2ER-project

Source: projects/e2er.yml

Positioning

E2ER is a strategist-driven agentic research pipeline that takes a research idea (human- or agent-supplied) and carries it through literature synthesis, identification, data acquisition, analysis, and paper drafting. It targets the full inputs → knowledge production → outputs arc of the RISE diagram, with explicit data and knowledge side-inputs.

Distinctive contribution

E2ER uses a strategist-orchestrated multi-agent design with persona-rich review loops. It emphasizes durable artifact production over chat output and treats methodological skills (econometrics, replication, referee simulation) as first-class, reusable modules.
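One way to picture a persona-based review loop that keeps every draft version as a durable artifact is the minimal sketch below. It is not taken from the E2ER repository: `Artifact`, `Review`, `review_loop`, the two toy personas, and the `max_rounds` cap are all hypothetical names chosen for illustration, and the real reviewers would presumably be LLM-backed skills rather than string checks.

```python
from dataclasses import dataclass
from typing import Callable, List


# Hypothetical types for illustration; not E2ER's actual data model.
@dataclass
class Artifact:
    name: str
    version: int
    body: str


@dataclass
class Review:
    persona: str
    accept: bool
    comment: str


Reviewer = Callable[[Artifact], Review]


def review_loop(
    draft: Artifact,
    reviewers: List[Reviewer],
    revise: Callable[[Artifact, List[Review]], Artifact],
    max_rounds: int = 3,
) -> List[Artifact]:
    """Return every version produced, oldest first, so the history is kept."""
    history = [draft]
    for _ in range(max_rounds):
        reviews = [reviewer(history[-1]) for reviewer in reviewers]
        if all(review.accept for review in reviews):
            break  # every persona accepts the latest version
        history.append(revise(history[-1], reviews))
    return history


# Toy personas: simple string checks stand in for model-backed critics.
def econometrician(a: Artifact) -> Review:
    ok = "robust standard errors" in a.body
    return Review("econometrician", ok, "" if ok else "report robust standard errors")


def referee(a: Artifact) -> Review:
    ok = len(a.body) > 20
    return Review("referee", ok, "" if ok else "expand the draft")


def revise(a: Artifact, reviews: List[Review]) -> Artifact:
    # Fold the rejecting reviewers' comments into a new version.
    notes = "; ".join(r.comment for r in reviews if not r.accept)
    return Artifact(a.name, a.version + 1, a.body + " [addressed: " + notes + "]")


history = review_loop(
    Artifact("paper-draft", 1, "OLS results."),
    [econometrician, referee],
    revise,
)
print([a.version for a in history])
```

Returning the full history rather than only the final draft mirrors the emphasis on versioned artifacts: each intermediate version remains available for inspection.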

Evaluation scores

Dimension	Score (0–3)	Note
Lifecycle coverage	3	Covers ideation through referee simulation — 12 of 14 canonical stages.
Autonomy level	2	Supervised agent — human sets up the task and reviews final artifacts; intermediate steps are autonomous.
Architectural transparency	2	Architecture, prompts, and skills published on GitHub; some orchestration internals still evolving.
Inputs supported	3	Accepts ideas, RQs, prior papers; integrates literature corpora and live data sources.
Outputs / reproducibility	2	Versioned artifacts persisted; full end-to-end reproducibility from inputs still in progress.
Internal evaluation	1	Reviewer-simulation loops provide internal evaluation; no external benchmark or peer-reviewed publication yet.
Openness	2	Public repository under permissive license; some examples reproducible without proprietary credentials.
Maturity / traction	1	Active research prototype; single-team use as of 2026-05.
Cross-family policy	0	Single-LLM-family design (Claude Code); skill-based critics within the same family — no cross-family policy.
Runtime assurance	2	Reviewer-simulation skills + skills-loader + persistent memory + scheme-aggregated weighted consensus provide moderate runtime gating; no published claim-audit harness.
Cross-platform portability	1	Docker stack; Claude Code as primary back-end; some skills reusable elsewhere but no native multi-IDE adapters.

Scored on 2026-05-18. See the evaluation rubric.
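The Runtime assurance row mentions a weighted consensus across reviewer skills without describing the aggregation rule. As one possible reading only, the sketch below accepts an artifact when the weight-share of accepting reviewers reaches a threshold. The function name `weighted_consensus`, the reviewer names, the weights, and the threshold are assumptions made for illustration and may differ from what E2ER actually does.

```python
from typing import Dict, Tuple


def weighted_consensus(
    votes: Dict[str, Tuple[bool, float]],
    threshold: float = 0.5,
) -> Tuple[float, bool]:
    """votes maps reviewer name -> (accept, weight).

    Returns (share of total weight that accepts, whether share >= threshold).
    """
    total = sum(weight for _, weight in votes.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    share = sum(weight for accept, weight in votes.values() if accept) / total
    return share, share >= threshold


# Hypothetical reviewers and weights.
votes = {
    "econometrics": (True, 2.0),
    "replication": (False, 1.0),
    "referee": (True, 1.0),
}
share, passed = weighted_consensus(votes, threshold=0.7)
print(share, passed)
```

With these illustrative weights, three of four weight units accept, so the artifact passes a 0.7 threshold but would fail a stricter 0.8 one.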

Tags

Pipeline stages: rq-formulation hypothesis-generation literature-discovery literature-synthesis research-design data-acquisition data-analysis formal-modeling code-generation paper-drafting revision-editing referee-simulation

Architectural features: multi-agent human-in-loop tool-use rag-knowledge-base persistent-memory dag-orchestration iterative-loop artifact-versioning

Inputs: human-idea agentic-idea research-question prior-paper

Outputs: paper-draft figures tables code referee-reports replication-package

Data sources: fred yfinance ssrn openalex

Knowledge sources: arxiv ssrn openalex semantic-scholar
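The pipeline-stage tags and the dag-orchestration feature together suggest that stages run in dependency order. The sketch below shows how such an ordering could be derived with Python's standard-library `graphlib`. The stage names are the tags listed above, but the dependency edges are assumptions made for illustration and are not taken from the E2ER repository.

```python
from graphlib import TopologicalSorter

# Keys are stages; values are the stages they depend on.
# The edges below are illustrative guesses, not E2ER's actual graph.
deps = {
    "rq-formulation": set(),
    "hypothesis-generation": {"rq-formulation"},
    "literature-discovery": {"rq-formulation"},
    "literature-synthesis": {"literature-discovery"},
    "research-design": {"hypothesis-generation", "literature-synthesis"},
    "data-acquisition": {"research-design"},
    "formal-modeling": {"research-design"},
    "code-generation": {"research-design"},
    "data-analysis": {"data-acquisition", "code-generation"},
    "paper-drafting": {"data-analysis", "formal-modeling"},
    "referee-simulation": {"paper-drafting"},
    "revision-editing": {"referee-simulation"},
}

# static_order() yields each stage only after all of its dependencies.
order = list(TopologicalSorter(deps).static_order())
for step, stage in enumerate(order, start=1):
    print(step, stage)
```

Stages with no path between them (for example formal-modeling and data-acquisition in this sketch) may appear in either relative order, which is also where a scheduler could run them in parallel.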

Limitations

End-to-end reproducibility from inputs not yet demonstrated on a public test case.
Domain coverage currently focused on economics/finance; portability to other fields untested.
External evaluation (peer review, third-party replication) pending.
