OpenScholar (AI2)
external · status: active · focus: literature · discipline: general · started: 2024
Project page: https://github.com/AkariAsai/OpenScholar
Source: projects/landscape/open-scholar.yml
Positioning
A retrieval-augmented LM designed to answer scientific queries by searching the literature and generating responses grounded in cited sources. Releases include training code, a fine-tuned 8B Llama checkpoint, an offline retrieval index, and the ScholarQABench evaluation suite. It sits in the literature block of the RISE diagram and stands out for its evaluation tooling.
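The retrieve-then-generate pattern described above can be illustrated with a minimal sketch. This is not OpenScholar's code or API: a toy TF-IDF term-overlap retriever stands in for its retrieval components, and string templating stands in for the LM. `Passage`, `retrieve`, and `answer_with_citations` are hypothetical names chosen for illustration.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Passage:
    pid: str   # hypothetical passage identifier (e.g. a paper ID)
    text: str


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric tokens; "Retrieval-augmented" -> ["retrieval", "augmented"]
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query: str, corpus: list[Passage], k: int = 2) -> list[Passage]:
    """Rank passages by IDF-weighted term overlap with the query (toy retriever)."""
    n = len(corpus)
    doc_tokens = [Counter(tokenize(p.text)) for p in corpus]
    df = Counter(t for toks in doc_tokens for t in toks)  # document frequency per term
    q_terms = set(tokenize(query))

    def score(toks: Counter) -> float:
        return sum(toks[t] * math.log(1 + n / df[t]) for t in q_terms if t in toks)

    ranked = sorted(zip(corpus, doc_tokens), key=lambda pt: score(pt[1]), reverse=True)
    # Keep only the top-k passages that share at least one query term
    return [p for p, toks in ranked[:k] if score(toks) > 0]


def answer_with_citations(query: str, corpus: list[Passage], k: int = 2) -> str:
    hits = retrieve(query, corpus, k)
    if not hits:
        return "No supporting passages found."
    # Stand-in for the LM: restate each passage with a numbered citation marker
    sentences = [f"{p.text} [{i}]" for i, p in enumerate(hits, start=1)]
    refs = [f"[{i}] {p.pid}" for i, p in enumerate(hits, start=1)]
    return " ".join(sentences) + "\n" + "\n".join(refs)
```

In the real system an LM writes the answer from the retrieved passages; the template here only shows how each citation marker maps back to a source passage.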
Distinctive contribution
Pairs the inference system with two purpose-built evaluation artifacts — ScholarQABench (automatic) and OpenScholar_ExpertEval (human) — addressing the under-developed evaluation axis of scholarly-synthesis systems. Open weights make it usable as a research baseline.
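Citation quality is one of the axes such benchmarks score. The sketch below is a simplified, set-based citation precision/recall/F1, assuming each answer statement has already been judged for which cited passages support it. It does not reproduce ScholarQABench's actual scoring, and `Statement` and `citation_scores` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    text: str
    cited: set[str] = field(default_factory=set)       # passage IDs the answer cites
    supporting: set[str] = field(default_factory=set)  # IDs a judge marked as supporting


def citation_scores(statements: list[Statement]) -> dict[str, float]:
    """Toy citation precision/recall/F1 over pre-judged (statement, citation) pairs."""
    total_cites = sum(len(s.cited) for s in statements)
    correct_cites = sum(len(s.cited & s.supporting) for s in statements)
    # Precision: share of all citations that actually support their statement
    precision = correct_cites / total_cites if total_cites else 0.0
    # Recall: share of statements backed by at least one supporting citation
    backed = sum(1 for s in statements if s.cited & s.supporting)
    recall = backed / len(statements) if statements else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

For example, three statements where the first cites two passages but only one supports it, the second is correctly cited, and the third has no citation give precision 2/3 and recall 2/3.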
Evaluation scores
| Dimension | Score (0–3) | Note |
|---|---|---|
| Lifecycle coverage | 0 | Two stages (discovery + synthesis); a literature-QA building block. |
| Autonomy level | 2 | Supervised: user submits a query, system returns a cited answer. |
| Architectural transparency | 3 | Open under Apache-2.0; arXiv:2411.14199 documents method; training + retrieval code published. |
| Inputs supported | 2 | Scientific queries with optional retrieval-result inputs; supports both open and proprietary LMs. |
| Outputs / reproducibility | 2 | Released retrieval results, model checkpoints, and inference scripts make pipeline runs reproducible. |
| Internal evaluation | 3 | ScholarQABench + expert evaluation interfaces; both quantitative and human evaluation reported. |
| Openness | 3 | Apache-2.0; open weights for Llama-3.1_OpenScholar-8B; data and benchmark publicly released. |
| Maturity / traction | 2 | 1.5k+ stars; cited as a baseline; demo at open-scholar.allen.ai; backed by AI2. |
| Cross-family policy | 1 | 8B open-weight model + optional commercial LLMs; cross-family configurable. |
| Runtime assurance | 1 | ScholarQABench evaluation set + retrieval verification; runtime gating is light. |
| Cross-platform portability | 1 | Hugging Face + Semantic Scholar API + You.com; not multi-IDE. |
Scored on 2026-05-18. See the evaluation rubric.
Tags
Pipeline stages: literature-discovery literature-synthesis
Architectural features: rag-knowledge-base tool-use iterative-loop
Inputs: research-question
Outputs: cited-response citations
Data sources: semantic-scholar-api you-search
Knowledge sources: semantic-scholar web-search
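The `iterative-loop` and `tool-use` tags refer to repeated search-and-revise cycles. The control flow below is a generic sketch of such a loop (draft, self-critique, follow-up search, revision) under assumed interfaces. It does not reproduce OpenScholar's prompts, stopping rule, or code; `refine_loop`, `generate`, `critique`, and `search` are hypothetical.

```python
from typing import Callable

Draft = str


def refine_loop(
    query: str,
    generate: Callable[[str, list[str]], Draft],  # hypothetical LM call: query + evidence -> draft
    critique: Callable[[Draft], list[str]],       # hypothetical self-feedback: draft -> open issues
    search: Callable[[str], list[str]],           # hypothetical search tool: query -> passages
    max_iters: int = 3,
) -> tuple[Draft, int]:
    """Generic draft -> feedback -> retrieve -> revise loop; returns (draft, revisions made)."""
    evidence = list(search(query))  # copy so we never mutate the tool's own list
    draft = generate(query, evidence)
    for i in range(max_iters):
        feedback = critique(draft)
        if not feedback:
            return draft, i  # no outstanding issues: stop after i revisions
        for item in feedback:
            evidence.extend(search(item))  # treat each feedback item as a follow-up query
        draft = generate(query, evidence)
    return draft, max_iters
```

Capping the loop with `max_iters` bounds cost even when the critic never reports the draft as complete.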
Limitations
- Narrow focus on literature discovery and synthesis; not an end-to-end research pipeline.
- Quality depends on retrieval coverage; Semantic Scholar API required.
- Less recently updated (by last commit date) than the most active projects in this catalog.
Related projects in this catalog
Papers describing this project
- OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs — Asai, A., He, J., Shao, R., Shi, W., Singh, A., Chang, J. C., et al. (2024). arXiv. arXiv:2411.14199
Also compared in
- Agentic AI for Scientific Discovery: A Survey (gridach2025agenticsurvey): covered as a retrieval-augmented LM for scholarly literature.
Related references (literature catalog)
- Wu, J. et al. (2025). Agentic Reasoning: A Streamlined Framework for Enhancing LLM Reasoning with Agentic Tools (wu2025agenticreasoning)
- Ji, Z. et al. (2023). Survey of Hallucination in Natural Language Generation (ji2023hallucination)