PaperQA2 (FutureHouse)
external · status: active · focus: literature · discipline: general · started: 2023
Project page: https://github.com/Future-House/paper-qa
Source: projects/landscape/paper-qa.yml
Positioning
A high-accuracy retrieval-augmented generation (RAG) package focused on scientific PDFs, with support for Office documents and source code. PaperQA2 combines agentic document retrieval with metadata enrichment (including retraction checks) to answer questions over scientific corpora with citations. It sits in the knowledge-layer block of the RISE diagram as a building block, not a full pipeline.
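The flow described above (retrieve passages, drop retracted sources, attach citations) can be illustrated with a toy sketch. This is not PaperQA2's algorithm: the `Chunk` type, the word-overlap scoring, and the three-entry corpus are all hypothetical stand-ins, and a real system would use embeddings, an LLM, and a live retraction lookup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A passage of text plus the metadata needed to cite it (hypothetical type)."""
    citation_key: str        # made-up key, e.g. "smith2021"
    text: str
    retracted: bool = False  # in practice this would come from a retraction lookup


def tokenize(text: str) -> set[str]:
    """Lowercase, split on whitespace, and strip simple punctuation."""
    return {word.strip(".,;:?!()").lower() for word in text.split()} - {""}


def retrieve(question: str, chunks: list[Chunk], top_k: int = 2) -> list[Chunk]:
    """Rank non-retracted chunks by word overlap with the question."""
    query_terms = tokenize(question)
    usable = [c for c in chunks if not c.retracted]
    scored = [(len(query_terms & tokenize(c.text)), c) for c in usable]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    # Keep only passages that share at least one term with the question.
    return [chunk for score, chunk in ranked[:top_k] if score > 0]


def format_evidence(chunks: list[Chunk]) -> str:
    """Render each retrieved passage with its citation key attached."""
    return "\n".join(f"{c.text} ({c.citation_key})" for c in chunks)


# Made-up corpus for illustration only.
corpus = [
    Chunk("smith2021", "Protein folding rates depend on temperature."),
    Chunk("lee2019", "Folding rates are independent of temperature.", retracted=True),
    Chunk("khan2020", "Soil pH affects crop yield."),
]
evidence = retrieve("How does temperature affect protein folding rates?", corpus)
print(format_evidence(evidence))
```

The retracted entry never reaches the ranking step, so it cannot be cited even though it overlaps with the question; filtering before scoring is one simple way to model retraction awareness.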
Distinctive contribution
The accompanying 2024 paper (arXiv:2409.13740) reports superhuman performance on scientific QA, summarization, and contradiction-detection tasks, with particular attention to citation accuracy and retraction awareness. PaperQA2 is distributed as a production-grade library that other RISE projects can embed.
Evaluation scores
| Dimension | Score (0–3) | Note |
|---|---|---|
| Lifecycle coverage | 0 | Two adjacent stages (discovery + synthesis); a building block, not a pipeline. |
| Autonomy level | 2 | Supervised: user supplies corpus and question, agent retrieves and answers. |
| Architectural transparency | 3 | Open under Apache-2.0; PaperQA2 paper documents algorithm; PyPI package with rich settings cheatsheet. |
| Inputs supported | 2 | Multiple input formats (PDFs, Office, code); embedding + LLM back-ends configurable. |
| Outputs / reproducibility | 2 | Caching + index reuse make outputs reproducible given fixed inputs and model. |
| Internal evaluation | 3 | Published 2024 paper with comparative benchmarks; widely cited as a reference RAG baseline. |
| Openness | 3 | Apache-2.0 (permissive); PyPI package; documented API; reproducibility scripts in repo. |
| Maturity / traction | 3 | 8.5k+ stars, production-grade releases, embedded in downstream FutureHouse systems. |
| Cross-family policy | 1 | Multiple model back-ends; cross-family setups supported via LiteLLM. |
| Runtime assurance | 3 | Retraction-Watch integration, citation grounding, metadata enrichment, and multi-pass RAG together give heavy runtime assurance, specifically for citation faithfulness. |
| Cross-platform portability | 3 | Pip-installable, multiple embedding back-ends, multiple LLM back-ends via LiteLLM, embeddable in other pipelines. |
Scored on 2026-05-18. See the evaluation rubric.
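The configurable back-ends noted in the table can be sketched as follows. This is a minimal sketch, assuming the `paperqa` package exposes `Settings` and `ask` with `llm`, `summary_llm`, and `embedding` fields as in its README at the time of writing; the model identifiers are examples only, the helper names are hypothetical, and field names should be checked against the installed release.

```python
def mixed_backend_settings(llm: str, summary_llm: str, embedding: str) -> dict[str, str]:
    """Collect LiteLLM-style model identifiers as keyword arguments (hypothetical helper)."""
    return {"llm": llm, "summary_llm": summary_llm, "embedding": embedding}


def ask_corpus(question: str, backend: dict[str, str]):
    """Query a corpus with the chosen back-ends (hypothetical helper).

    Requires `pip install paper-qa` plus API keys for every provider named
    in `backend`. The field names passed to Settings are an assumption.
    """
    # Imported lazily so the sketch loads even without the package installed.
    from paperqa import Settings, ask

    return ask(question, settings=Settings(**backend))


# Example identifiers only: one model family for answering, another for
# summarization and embeddings.
backend = mixed_backend_settings(
    llm="claude-3-5-sonnet-20240620",
    summary_llm="gpt-4o-mini",
    embedding="text-embedding-3-small",
)
print(backend)
```

Calling `ask_corpus("What is PaperQA2?", backend)` would then run the query against the papers the library is configured to index, provided the package and credentials are available.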
Tags
Pipeline stages: literature-discovery literature-synthesis
Architectural features: multi-agent tool-use rag-knowledge-base artifact-versioning
Inputs: question pdf-corpus
Outputs: answer citations summary
Data sources: user-pdfs office-documents source-code
Knowledge sources: semantic-scholar crossref retraction-watch
Limitations
- Building block (literature Q&A) — not an end-to-end research pipeline.
- Quality depends on PDF parsing and embedding configuration.
- Best results require access to commercial LLMs.
Related projects in this catalog
Papers describing this project
- PaperQA: Retrieval-Augmented Generative Agent for Scientific Research — Lála, J., O'Donoghue, O., Shtedritski, A., Cox, S., Rodriques, S. G., White, A. D. (2023). arXiv (PaperQA1). arXiv:2312.07559
- Language agents achieve superhuman synthesis of scientific knowledge — Skarlinski, M. D., Cox, S., Laurent, J. M., Braza, J. D., Hinks, M., Hammerling, M. J., et al. (2024). arXiv (PaperQA2). arXiv:2409.13740
Also compared in
- Agentic AI for Scientific Discovery: A Survey (gridach2025agenticsurvey). Covered as a high-accuracy scientific-document RAG system.
Related references (literature catalog)
- Wu, J. et al. (2025). Agentic Reasoning: A Streamlined Framework for Enhancing LLM Reasoning with Agentic Tools (wu2025agenticreasoning)
- Ji, Z. et al. (2023). Survey of Hallucination in Natural Language Generation (ji2023hallucination)