Anatomy of a RISE pipeline
The landing-page diagram is a coarse model. This page expands each block into its operational components and points to the projects in the catalog that realize them concretely.
Inputs
A RISE system is specified in part by what it accepts as a starting
point. Catalog entries declare an inputs: field; common values:
- Human idea — an underspecified prompt from a researcher ("something about momentum in crypto markets").
- Agentic idea — an idea produced by an upstream ideation loop; characteristic of systems that chain ideation and execution (e.g., the Sakana AI Scientist lineage).
- Research question — a fully specified, identification-aware RQ with a stated population, treatment, outcome, and design.
- Replication target — a published paper whose results the pipeline must reproduce (social-science-replicability).
- Submitted paper — for review-focused systems (ape, reviewer, marg).
The breadth of supported inputs is one of the eleven evaluation dimensions; an input-narrow system can still be excellent at what it does.
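As a rough illustration, the input types above can be treated as a closed vocabulary, and a catalog entry's inputs: field checked against it. The Python sketch below assumes hypothetical slug spellings (human-idea and so on), a plain dict as the entry representation, and an illustrative helper named unknown_inputs; none of these are taken from the catalog's actual schema.

```python
# Hypothetical slugs for the five input types listed above; the catalog's
# real spellings may differ.
KNOWN_INPUTS = {
    "human-idea",
    "agentic-idea",
    "research-question",
    "replication-target",
    "submitted-paper",
}


def unknown_inputs(entry: dict) -> list[str]:
    """Return declared input values that are not in the known vocabulary."""
    declared = entry.get("inputs", [])
    return sorted(set(declared) - KNOWN_INPUTS)


# Illustrative entry, not copied from the catalog.
example_entry = {"name": "example-reviewer", "inputs": ["submitted-paper", "pdf"]}
print(unknown_inputs(example_entry))
```

A check like this only flags unrecognized values; it says nothing about how broad or narrow a system's input support is.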
Knowledge Production (the core)
This block is the agentic pipeline itself. Its stages are defined in projects/VOCABULARY.md:
| Stage | What happens | Reference systems |
|---|---|---|
| rq-formulation | Sharpen an underspecified prompt into a research question. | e2er, agent-laboratory |
| hypothesis-generation | Generate testable hypotheses. | sakana-ai-scientist, robin, agent-laboratory |
| literature-discovery | Find relevant papers. | storm, paper-qa, open-scholar, gpt-researcher |
| literature-synthesis | Summarize, integrate, identify gaps. | storm, paper-qa, open-scholar |
| research-design | Choose method, identification, data plan. | e2er, agent-laboratory, robin |
| data-acquisition | Fetch, scrape, request data. | e2er, social-science-replicability |
| data-analysis | Clean, explore, estimate. | e2er, agent-laboratory, social-science-replicability |
| formal-modeling | Theoretical / mathematical models. | e2er |
| code-generation | Produce analysis or replication code. | agent-laboratory, sakana-ai-scientist |
| paper-drafting | First-draft paper sections. | zeropaper, agent-laboratory, sakana-ai-scientist |
| revision-editing | Refine prose, structure, argumentation. | refine-ink, coarse-ink |
| referee-simulation | Pre-submission peer review. | ape, reviewer, marg |
| replication | Reproduce a published paper's results. | social-science-replicability |
| dissemination | Format for journals/preprints. | (none yet in catalog) |
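The stage table can also be read as a mapping from stage to reference systems, which makes it easy to invert (which stages does a given system appear under?) and to spot stages with no reference system. The following Python sketch transcribes the table's last column; the function names are illustrative, and the results reflect only the reference systems named here, not each project's full pipeline_stages declaration.

```python
# Stage -> reference systems, transcribed from the table above.
STAGE_SYSTEMS = {
    "rq-formulation": ["e2er", "agent-laboratory"],
    "hypothesis-generation": ["sakana-ai-scientist", "robin", "agent-laboratory"],
    "literature-discovery": ["storm", "paper-qa", "open-scholar", "gpt-researcher"],
    "literature-synthesis": ["storm", "paper-qa", "open-scholar"],
    "research-design": ["e2er", "agent-laboratory", "robin"],
    "data-acquisition": ["e2er", "social-science-replicability"],
    "data-analysis": ["e2er", "agent-laboratory", "social-science-replicability"],
    "formal-modeling": ["e2er"],
    "code-generation": ["agent-laboratory", "sakana-ai-scientist"],
    "paper-drafting": ["zeropaper", "agent-laboratory", "sakana-ai-scientist"],
    "revision-editing": ["refine-ink", "coarse-ink"],
    "referee-simulation": ["ape", "reviewer", "marg"],
    "replication": ["social-science-replicability"],
    "dissemination": [],  # none yet in catalog
}


def stages_by_system(stage_systems: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert the table: system -> stages it is listed under, in table order."""
    inverted: dict[str, list[str]] = {}
    for stage, systems in stage_systems.items():
        for system in systems:
            inverted.setdefault(system, []).append(stage)
    return inverted


def uncovered_stages(stage_systems: dict[str, list[str]]) -> list[str]:
    """Stages for which the table names no reference system."""
    return [stage for stage, systems in stage_systems.items() if not systems]


by_system = stages_by_system(STAGE_SYSTEMS)
print(by_system["e2er"])
print(uncovered_stages(STAGE_SYSTEMS))
```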
Implementations differ on three orthogonal axes:
- Autonomy — copilot, supervised agent, autonomous, or society-of-agents. The evaluation rubric scores this 0–3.
- Topology — directed acyclic graph, iterative loop, debate / consensus, or hybrid. Tagged in architectural_features.
- Stage coverage — single-stage tool ↔ end-to-end pipeline. Tagged via pipeline_stages and headline focus.
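Because the three axes are independent, one way to picture them is as separate fields on a single record. The Python sketch below is a minimal model under stated assumptions: the text says the rubric scores autonomy 0 to 3 but does not say which level receives which score, so the ordering in Autonomy is a guess; the Topology string values and the PipelineProfile class are likewise hypothetical and not part of the catalog schema.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class Autonomy(IntEnum):
    # Assumed ordering: the rubric's 0-3 scale is mentioned in the text,
    # but the level-to-score mapping is not.
    COPILOT = 0
    SUPERVISED_AGENT = 1
    AUTONOMOUS = 2
    SOCIETY_OF_AGENTS = 3


class Topology(Enum):
    # Hypothetical tag spellings for the four topologies named above.
    DAG = "dag"
    ITERATIVE_LOOP = "iterative-loop"
    DEBATE_CONSENSUS = "debate-consensus"
    HYBRID = "hybrid"


@dataclass(frozen=True)
class PipelineProfile:
    name: str
    autonomy: Autonomy
    topology: Topology
    pipeline_stages: tuple[str, ...]

    def is_single_stage(self) -> bool:
        """True when the system covers exactly one pipeline stage."""
        return len(self.pipeline_stages) == 1


# Hypothetical system, not a catalog entry.
example = PipelineProfile(
    name="example-system",
    autonomy=Autonomy.SUPERVISED_AGENT,
    topology=Topology.ITERATIVE_LOOP,
    pipeline_stages=("literature-discovery", "literature-synthesis"),
)
print(int(example.autonomy), example.topology.value, example.is_single_stage())
```

Keeping the axes as separate fields means a system can change on one axis (say, gaining stage coverage) without implying anything about the other two.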
Data layer (left input)
The empirical material the pipeline operates on. Sources are heterogeneous and per-domain: market data feeds, administrative records, biological-sequence databases, user-supplied PDFs.
See Data layer for the typology and access-pattern discussion.
Knowledge layer (right input)
Prior scholarship: literature, citations, theory, reusable methods.
The catalog's literature-focused projects (paper-qa, open-scholar, storm) live primarily in this layer.
See Knowledge layer.
Outputs
A RISE system is characterized by what it produces durably.
Catalog entries declare an outputs: field; common values:
- Artifacts — code, figures, tables, prompts, intermediate notes.
- Papers / preprints — the most public output type.
- Datasets — curated, often a by-product of the pipeline.
- Replication reports — structured comparison of pipeline output vs. a published target.
- Referee reports — structured peer reviews.
Output reproducibility is scored on the evaluation rubric: a system that drops outputs to disk in a versioned, re-runnable form scores higher than one whose terminal artifact is ephemeral.
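One possible reading of "versioned, re-runnable" is that every run leaves a manifest next to its outputs, so a later re-run can be compared file by file. The Python sketch below illustrates that idea with SHA-256 digests; the manifest.json file name, the run_id field, and both function names are assumptions made for this example, and the evaluation rubric may define reproducibility differently.

```python
import hashlib
import json
from pathlib import Path

MANIFEST_NAME = "manifest.json"  # hypothetical file name


def write_manifest(output_dir: Path, run_id: str) -> Path:
    """Record a SHA-256 digest for every file under output_dir."""
    files = {}
    for path in sorted(output_dir.rglob("*")):
        if path.is_file() and path.name != MANIFEST_NAME:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            files[path.relative_to(output_dir).as_posix()] = digest
    manifest_path = output_dir / MANIFEST_NAME
    manifest_path.write_text(
        json.dumps({"run_id": run_id, "files": files}, indent=2, sort_keys=True)
    )
    return manifest_path


def changed_files(output_dir: Path) -> list[str]:
    """List recorded files that are missing or whose digest has changed."""
    recorded = json.loads((output_dir / MANIFEST_NAME).read_text())["files"]
    return [
        name
        for name, digest in recorded.items()
        if not (output_dir / name).is_file()
        or hashlib.sha256((output_dir / name).read_bytes()).hexdigest() != digest
    ]
```

Calling write_manifest at the end of a run and changed_files after a re-run gives a simple signal of whether the outputs were regenerated identically. It does not cover outputs that are legitimately nondeterministic.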
What is not in this diagram (yet)
Three blocks are conspicuously absent and likely need explicit treatment in a future revision:
- Submission and venue routing. Where do papers go, and how is that decision made?
- Ethics and IRB-equivalent gates. For studies involving human subjects, animal subjects, or sensitive data.
- Post-publication revision and retraction. How an output is maintained over time.
These blocks belong in the diagram, but no project in the current catalog implements them. Pull requests welcome.