Anatomy of a RISE pipeline

The landing-page diagram is a coarse model. This page expands each block into its operational components and points to the projects in the catalog that realize them concretely.

Inputs

A RISE system is specified in part by what it accepts as a starting point. Catalog entries declare an inputs: field; common values:

  • Human idea — an underspecified prompt from a researcher ("something about momentum in crypto markets").
  • Agentic idea — an idea produced by an upstream ideation loop; characteristic of systems that chain ideation and execution (e.g., the Sakana AI Scientist lineage).
  • Research question — a fully specified, identification-aware RQ with a stated population, treatment, outcome, and design.
  • Replication target — a published paper whose results the pipeline must reproduce (social-science-replicability).
  • Submitted paper — for review-focused systems (ape, reviewer, marg).

The breadth of supported inputs is one of the eleven evaluation dimensions; an input-narrow system can still be excellent at what it does.
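As a rough illustration of how an inputs: declaration might be checked, the sketch below compares a catalog entry against the five common values listed above. The slug spellings (such as "human-idea"), the `check_inputs` helper, and the idea of counting recognized values as a breadth measure are all assumptions made for this example; the catalog's real vocabulary and scoring may differ.

```python
# Hypothetical slugs for the five common input types listed above.
# The catalog's actual spellings are not shown on this page.
KNOWN_INPUTS = {
    "human-idea",
    "agentic-idea",
    "research-question",
    "replication-target",
    "submitted-paper",
}


def check_inputs(entry):
    """Summarize the inputs declared by one catalog entry (a plain dict)."""
    declared = set(entry.get("inputs", []))
    return {
        # Number of recognized input types: one possible proxy for breadth.
        "breadth": len(declared & KNOWN_INPUTS),
        # Values outside the common list, flagged for manual review.
        "unknown": sorted(declared - KNOWN_INPUTS),
    }


example = {"inputs": ["research-question", "replication-target", "pdf-upload"]}
summary = check_inputs(example)
```

Here `summary["breadth"]` is 2 and `"pdf-upload"` is reported as unknown. A low breadth count is not a defect in itself, since an input-narrow system can still be excellent at what it does.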

Knowledge Production (the core)

This block is the agentic pipeline itself. Its stages are defined in projects/VOCABULARY.md:

| Stage | What happens | Reference systems |
| --- | --- | --- |
| rq-formulation | Sharpen an underspecified prompt into a research question. | e2er, agent-laboratory |
| hypothesis-generation | Generate testable hypotheses. | sakana-ai-scientist, robin, agent-laboratory |
| literature-discovery | Find relevant papers. | storm, paper-qa, open-scholar, gpt-researcher |
| literature-synthesis | Summarize, integrate, identify gaps. | storm, paper-qa, open-scholar |
| research-design | Choose method, identification, data plan. | e2er, agent-laboratory, robin |
| data-acquisition | Fetch, scrape, request data. | e2er, social-science-replicability |
| data-analysis | Clean, explore, estimate. | e2er, agent-laboratory, social-science-replicability |
| formal-modeling | Theoretical / mathematical models. | e2er |
| code-generation | Produce analysis or replication code. | agent-laboratory, sakana-ai-scientist |
| paper-drafting | First-draft paper sections. | zeropaper, agent-laboratory, sakana-ai-scientist |
| revision-editing | Refine prose, structure, argumentation. | refine-ink, coarse-ink |
| referee-simulation | Pre-submission peer review. | ape, reviewer, marg |
| replication | Reproduce a published paper's results. | social-science-replicability |
| dissemination | Format for journals/preprints. | (none yet in catalog) |
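The stage table can also be read in the other direction, to see which stages each reference system is listed under. The sketch below transcribes the table into a dictionary and inverts it. The names `STAGE_SYSTEMS` and `stages_by_system` are illustrative, and the result reflects only the reference systems named in the table, not the full pipeline_stages tags of each catalog entry.

```python
from collections import defaultdict

# Stage -> reference systems, transcribed from the table above.
STAGE_SYSTEMS = {
    "rq-formulation": ["e2er", "agent-laboratory"],
    "hypothesis-generation": ["sakana-ai-scientist", "robin", "agent-laboratory"],
    "literature-discovery": ["storm", "paper-qa", "open-scholar", "gpt-researcher"],
    "literature-synthesis": ["storm", "paper-qa", "open-scholar"],
    "research-design": ["e2er", "agent-laboratory", "robin"],
    "data-acquisition": ["e2er", "social-science-replicability"],
    "data-analysis": ["e2er", "agent-laboratory", "social-science-replicability"],
    "formal-modeling": ["e2er"],
    "code-generation": ["agent-laboratory", "sakana-ai-scientist"],
    "paper-drafting": ["zeropaper", "agent-laboratory", "sakana-ai-scientist"],
    "revision-editing": ["refine-ink", "coarse-ink"],
    "referee-simulation": ["ape", "reviewer", "marg"],
    "replication": ["social-science-replicability"],
    "dissemination": [],  # none yet in catalog
}


def stages_by_system(stage_systems):
    """Invert the table: system -> stages it is a reference for, in table order."""
    index = defaultdict(list)
    for stage, systems in stage_systems.items():
        for system in systems:
            index[system].append(stage)
    return dict(index)


index = stages_by_system(STAGE_SYSTEMS)
# Stages with no reference system in the table.
uncovered = [stage for stage, systems in STAGE_SYSTEMS.items() if not systems]
```

With the table as given, `index["e2er"]` lists five stages, `index["agent-laboratory"]` lists six, and `uncovered` contains only dissemination.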

Implementations differ on three orthogonal axes:

  • Autonomy — copilot, supervised agent, autonomous, or society-of-agents. The evaluation rubric scores this 0–3.
  • Topology — directed acyclic graph, iterative loop, debate / consensus, or hybrid. Tagged in architectural_features.
  • Stage coverage — single-stage tool ↔ end-to-end pipeline. Tagged via pipeline_stages and headline focus.
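One way to picture the three axes is as independent fields on a small record. The sketch below is illustrative only: the `PipelineProfile` class, the topology tag spellings, and the mapping of autonomy levels to the scores 0 through 3 in listed order are assumptions, as is measuring stage coverage as a fraction of the 14 stages in the table above.

```python
from dataclasses import dataclass
from enum import IntEnum


class Autonomy(IntEnum):
    # Assumed mapping: the four levels in listed order onto the 0-3 rubric score.
    COPILOT = 0
    SUPERVISED_AGENT = 1
    AUTONOMOUS = 2
    SOCIETY_OF_AGENTS = 3


# Hypothetical tag spellings for the four topologies named above.
TOPOLOGIES = {"dag", "iterative-loop", "debate-consensus", "hybrid"}

# Number of stages in the stage table on this page.
N_STAGES = 14


@dataclass(frozen=True)
class PipelineProfile:
    autonomy: Autonomy
    topology: str
    pipeline_stages: tuple

    def __post_init__(self):
        if self.topology not in TOPOLOGIES:
            raise ValueError(f"unknown topology: {self.topology!r}")

    @property
    def stage_coverage(self):
        """Fraction of the 14 stages covered: near 0 for a single-stage tool, 1.0 for end-to-end."""
        return len(set(self.pipeline_stages)) / N_STAGES


# A made-up profile, not a description of any catalog entry.
profile = PipelineProfile(
    autonomy=Autonomy.SUPERVISED_AGENT,
    topology="iterative-loop",
    pipeline_stages=(
        "rq-formulation", "hypothesis-generation", "research-design",
        "data-analysis", "code-generation", "paper-drafting", "replication",
    ),
)
```

Because the axes are orthogonal, any combination is valid: a single-stage tool can be fully autonomous, and an end-to-end pipeline can be a copilot. For the made-up profile above, `profile.stage_coverage` is 0.5 and `int(profile.autonomy)` is 1.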

Data layer (left input)

The empirical material the pipeline operates on. Sources are heterogeneous and per-domain: market data feeds, administrative records, biological-sequence databases, user-supplied PDFs.

See Data layer for the typology and access-pattern discussion.

Knowledge layer (right input)

Prior scholarship: literature, citations, theory, reusable methods. The catalog's literature-focused projects (paper-qa, open-scholar, storm) live primarily in this layer.

See Knowledge layer.

Outputs

A RISE system is characterized by what it produces durably. Catalog entries declare an outputs: field; common values:

  • Artifacts — code, figures, tables, prompts, intermediate notes.
  • Papers / preprints — the most public output type.
  • Datasets — curated, often a by-product of the pipeline.
  • Replication reports — structured comparison of pipeline output vs. a published target.
  • Referee reports — structured peer reviews.

Output reproducibility is scored on the evaluation rubric: a system that drops outputs to disk in a versioned, re-runnable form scores higher than one whose terminal artifact is ephemeral.

What is not in this diagram (yet)

Three blocks are conspicuously absent and likely need explicit treatment in a future revision:

  • Submission and venue routing. Where do papers go, and how is that decision made?
  • Ethics and IRB-equivalent gates. For studies involving human subjects, animal subjects, or sensitive data.
  • Post-publication revision and retraction. How an output is maintained over time.

These belong in the diagram, but no project in the current catalog implements them. Pull requests welcome.