Paper Notes: Retrieval-Augmented Generation (RAG)
Apr 19, 2026
TL;DR
- Pure LLMs store knowledge in parameters but struggle with factual accuracy and updates.
- RAG combines parametric models with external retrieval memory.
- Key takeaway: hybrid parametric + non-parametric memory improves factuality and flexibility.
Bibliographic Snapshot
| Field | Detail |
|---|---|
| Citation | Lewis et al., NeurIPS 2020 |
| Keywords | retrieval, hybrid memory, QA |
| Dataset / Benchmarks | NQ, TriviaQA, WebQuestions |
| Code / Repo | HuggingFace RAG |
Problem Statement
LLMs encode knowledge in parameters but cannot update or verify it easily. This leads to hallucination and poor performance on knowledge-intensive tasks. The goal is to augment LLMs with external, updatable memory.
Core Idea
- Combine:
  - Parametric memory (a seq2seq model such as BART)
  - Non-parametric memory (retrieved documents)
- Pipeline:
  - Retrieve the top-K docs via DPR
  - Generate conditioned on the retrieved docs
- Two variants:
  - RAG-Sequence: the same doc conditions the whole output
  - RAG-Token: each token can draw on a different doc
- Training:
  - End-to-end with latent document marginalization (the paper fine-tunes the query encoder and generator, and keeps the document encoder and index fixed)
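The two variants differ only in where the sum over documents sits. As a sketch in the paper's notation (the symbols are not defined elsewhere in these notes: $x$ is the input, $y$ the output of length $N$, $z$ a retrieved document, $p_\eta$ the retriever, $p_\theta$ the generator):

```latex
\begin{align}
p_{\text{RAG-Sequence}}(y \mid x) &\approx \sum_{z \in \operatorname{top-}k(p_\eta(\cdot \mid x))} p_\eta(z \mid x) \prod_{i=1}^{N} p_\theta(y_i \mid x, z, y_{1:i-1}) \\
p_{\text{RAG-Token}}(y \mid x) &\approx \prod_{i=1}^{N} \sum_{z \in \operatorname{top-}k(p_\eta(\cdot \mid x))} p_\eta(z \mid x)\, p_\theta(y_i \mid x, z, y_{1:i-1})
\end{align}
```

RAG-Sequence sums outside the product, so one document explains the whole sequence. RAG-Token sums inside the product, so the document mixture is re-weighted at every token.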
Visual / Diagram Notes
- Figure 1 (page 2):
  - Shows the retriever + generator pipeline
  - Retrieval is treated as a latent variable
  - Key insight: marginalize over documents rather than pick one
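To make "marginalize rather than pick one" concrete, here is a minimal numeric sketch. The probabilities are made up for illustration, and the function names (`rag_sequence_prob`, `rag_token_prob`) are hypothetical, not taken from the paper or any library.

```python
import math

def rag_sequence_prob(doc_priors, token_probs):
    """Sum over docs of p(z|x) times the product of per-token probabilities."""
    return sum(
        prior * math.prod(per_token)
        for prior, per_token in zip(doc_priors, token_probs)
    )

def rag_token_prob(doc_priors, token_probs):
    """Product over tokens of the doc-weighted per-token probability."""
    n_tokens = len(token_probs[0])
    return math.prod(
        sum(prior * per_token[i] for prior, per_token in zip(doc_priors, token_probs))
        for i in range(n_tokens)
    )

# Hypothetical numbers: 2 retrieved docs, a fixed 2-token target sequence.
doc_priors = [0.6, 0.4]                  # p(z|x) over the top-K docs
token_probs = [[0.9, 0.2], [0.3, 0.8]]   # token_probs[z][i] = p(y_i | x, z, y_<i)

seq_p = rag_sequence_prob(doc_priors, token_probs)   # marginalize per sequence
tok_p = rag_token_prob(doc_priors, token_probs)      # marginalize per token
top1_p = math.prod(token_probs[0])                   # "pick one doc" baseline

print(seq_p, tok_p, top1_p)
```

With these toy numbers each document supports a different token, so the per-token mixture assigns the target more probability than committing to the single top-ranked document does.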
Key Results
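- State-of-the-art results on open-domain QA tasks at the time of publication
- More factual, specific, and diverse generation than a BART-only baseline
- Knowledge can be updated by swapping the document index, with no retraining
- Limitation: retrieval quality is the bottleneck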
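The index-swapping result can be illustrated with a toy sketch. Everything here is an assumption for illustration: the word-overlap scorer stands in for DPR, the "generator" just echoes the top document, and the two indexes (`index_old`, `index_new`) contain invented text.

```python
def retrieve(query, index, k=1):
    """Rank docs by word overlap with the query (a toy stand-in for DPR)."""
    q = set(query.lower().split())
    ranked = sorted(
        index,
        key=lambda doc: len(q & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def answer(query, index):
    """Stub generator: echoes the top retrieved doc instead of decoding text."""
    return retrieve(query, index, k=1)[0]

# Two hypothetical snapshots of the external memory.
index_old = ["the newest model release is version 1", "bart is a seq2seq model"]
index_new = ["the newest model release is version 7", "bart is a seq2seq model"]

query = "what is the newest model release"
old = answer(query, index_old)
new = answer(query, index_new)
print(old)
print(new)
```

The `answer` function never changes between the two calls. Only the index passed in differs, which is the sense in which the non-parametric memory is updatable.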
Personal Analysis
What worked:
- Clean probabilistic formulation
- First scalable hybrid memory architecture
- Strong empirical results
What puzzled me:
- Fixed top-K retrieval is naive
- No mechanism to decide when to retrieve
Connections & Related Work
- Foundation for most modern RAG systems
- Extended by SELF-RAG (adaptive retrieval)
- Used in production LLM systems
Implementation Sketch
- Retriever: DPR (BERT-based)
- Generator: BART
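- Pipeline:
  - Encode the query
  - Retrieve the top-K docs
  - Concatenate the input with each retrieved doc (one generator input per doc)
  - Generate the output, marginalizing over the per-doc predictions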
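The retrieval and input-building steps can be sketched without any trained model. This is a toy under stated assumptions: `encode` is a bag-of-words stand-in for DPR's BERT encoders, the `//` separator and doc-first ordering are arbitrary choices, and all function names are hypothetical. The generation step is left out because it needs a trained seq2seq model.

```python
import math
from collections import Counter

def encode(text):
    """Toy encoder: bag-of-words counts (stand-in for a dense BERT encoder)."""
    return Counter(text.lower().split())

def inner_product(a, b):
    """Inner product of two sparse count vectors."""
    return sum(a[w] * b[w] for w in a if w in b)

def retrieve_top_k(query, docs, k):
    """Score docs by inner product, keep the top K, softmax scores into p(z|x)."""
    q = encode(query)
    scored = sorted(((inner_product(q, encode(d)), d) for d in docs), reverse=True)[:k]
    total = sum(math.exp(s) for s, _ in scored)
    return [(d, math.exp(s) / total) for s, d in scored]

def build_generator_inputs(query, retrieved):
    """One generator input per retrieved doc, not one input with all docs."""
    return [f"{doc} // {query}" for doc, _ in retrieved]

docs = [
    "paris is the capital of france",
    "berlin is the capital of germany",
    "bart is a seq2seq model",
]
query = "capital of france"
retrieved = retrieve_top_k(query, docs, k=2)
inputs = build_generator_inputs(query, retrieved)

for (doc, prior), text in zip(retrieved, inputs):
    print(round(prior, 3), text)
```

Each entry in `inputs` would be fed to the generator separately, and the per-doc priors in `retrieved` are the weights used when marginalizing over the generator's outputs.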
Open Questions / Next Actions
- How to improve retrieval relevance?
- Can retrieval be dynamic?
- How to reduce latency?
Glossary
- Parametric memory: knowledge stored in the model weights
- Non-parametric memory: knowledge held in an external, retrievable store (here, a dense index of documents)
- DPR: Dense Passage Retriever