Justin Lin | Portfolio

Paper Notes: Retrieval-Augmented Generation (RAG)

Apr 19, 2026

TL;DR

Pure LLMs store knowledge only in their parameters, which makes facts hard to verify and hard to update without retraining.
RAG pairs a parametric seq2seq model with an external, retrievable document memory.
Key takeaway: hybrid parametric + non-parametric memory improves factuality and makes knowledge updatable.

Bibliographic Snapshot

Field	Detail
Citation	`Lewis et al., "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks", NeurIPS 2020`
Keywords	retrieval, hybrid memory, QA
Dataset / Benchmarks	NQ, TriviaQA, WebQuestions
Code / Repo	Hugging Face Transformers (RAG implementation)

Problem Statement

LLMs encode knowledge in their parameters, where it cannot easily be updated, inspected, or traced back to a source. This contributes to hallucination and weak performance on knowledge-intensive tasks. The goal is to augment the model with an external memory that can be updated without retraining.

Core Idea

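Combine:
- Parametric memory: a pretrained seq2seq generator (BART)
- Non-parametric memory: a dense vector index of text passages (Wikipedia in the paper), accessed through a retriever
Pipeline:
- Retrieve the top-K passages for the input with DPR (maximum inner product search over passage embeddings)
- Generate the output conditioned on the input plus each retrieved passage
Two variants:
- RAG-Sequence: a single retrieved passage conditions the whole output; sequence probabilities are then marginalized over the top-K passages
- RAG-Token: each output token can draw on a different passage; marginalization happens at every token
Training:
- End-to-end on input/output pairs, treating the retrieved passage as a latent variable and marginalizing over the top-K; the query encoder and generator are fine-tuned, while the document encoder and index stay fixed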
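
The two variants and the training objective above can be written out as follows. This is a transcription of the formulation in Lewis et al. as I read it, so treat the notation as a sketch: x is the input, y the output of length N, z a retrieved passage, p_η the retriever, and p_θ the generator.

```latex
\begin{align*}
% Retriever: DPR bi-encoder with document encoder d(.) and query encoder q(.)
p_\eta(z \mid x) &\propto \exp\!\big(\mathbf{d}(z)^\top \mathbf{q}(x)\big) \\[4pt]
% RAG-Sequence: one passage per full output, marginalized at the sequence level
p_{\text{RAG-Sequence}}(y \mid x) &\approx
  \sum_{z \in \text{top-}K(p_\eta(\cdot \mid x))} p_\eta(z \mid x)
  \prod_{i=1}^{N} p_\theta(y_i \mid x, z, y_{1:i-1}) \\[4pt]
% RAG-Token: marginalize over passages at every token
p_{\text{RAG-Token}}(y \mid x) &\approx
  \prod_{i=1}^{N} \sum_{z \in \text{top-}K(p_\eta(\cdot \mid x))}
  p_\eta(z \mid x)\, p_\theta(y_i \mid x, z, y_{1:i-1}) \\[4pt]
% Training: minimize the negative marginal log-likelihood over pairs (x_j, y_j)
\mathcal{L} &= -\sum_{j} \log p(y_j \mid x_j)
\end{align*}
```

The only difference between the two variants is whether the sum over passages sits outside or inside the product over tokens.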

Visual / Diagram Notes

Figure 1 (page 2):
- Shows the retriever + generator pipeline
- The retrieved document is treated as a latent variable
Key insight: marginalize over the retrieved documents rather than committing to a single one
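
To make the "marginalize rather than pick one" point concrete, here is a minimal numeric sketch. The probabilities are made up for illustration, and the function names are my own, not from the paper or any library. `doc_probs[z]` plays the role of p(z|x) and `token_probs[z][i]` the role of p(y_i | x, z, y_<i) for a fixed target sequence.

```python
import math

def rag_sequence_prob(doc_probs, token_probs):
    """Sequence-level marginal: sum_z p(z|x) * prod_i p(y_i | x, z, y_<i)."""
    return sum(pz * math.prod(toks) for pz, toks in zip(doc_probs, token_probs))

def rag_token_prob(doc_probs, token_probs):
    """Token-level marginal: prod_i sum_z p(z|x) * p(y_i | x, z, y_<i)."""
    n_tokens = len(token_probs[0])
    return math.prod(
        sum(pz * toks[i] for pz, toks in zip(doc_probs, token_probs))
        for i in range(n_tokens)
    )

def top1_prob(doc_probs, token_probs):
    """Baseline that commits to the single highest-scoring document."""
    best = max(range(len(doc_probs)), key=lambda z: doc_probs[z])
    return math.prod(token_probs[best])

# Hypothetical numbers: two documents, a two-token target.
# Document 0 supports the first token, document 1 supports the second.
doc_probs = [0.6, 0.4]
token_probs = [[0.9, 0.1], [0.2, 0.8]]

print(rag_sequence_prob(doc_probs, token_probs))
print(rag_token_prob(doc_probs, token_probs))
print(top1_prob(doc_probs, token_probs))
```

In this toy case the token-level marginal is the largest because each token can lean on the document that supports it, while committing to the top-1 document gives the lowest probability.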

Key Results

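State-of-the-art results at the time of publication on open-domain QA benchmarks
Generations are more specific, diverse, and factual than a BART-only baseline
Knowledge can be updated by swapping the document index, with no retraining
Limitation: retrieval quality is the main bottleneck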
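
The index-swapping result can be illustrated with a toy example. Everything here is a hypothetical stand-in: `retrieve` uses word overlap instead of dense retrieval, `generate` is a trivial rule instead of BART, and the passages are invented. The point is only that the "model" code stays untouched while the answer follows the index.

```python
def tokenize(text):
    """Lowercase and split on whitespace, dropping basic punctuation."""
    return set(text.lower().replace("?", "").replace(".", "").split())

def retrieve(query, index, k=1):
    """Rank passages by word overlap with the query (toy stand-in for DPR)."""
    q = tokenize(query)
    ranked = sorted(index, key=lambda p: len(q & tokenize(p)), reverse=True)
    return ranked[:k]

def generate(query, passages):
    """Toy stand-in for the generator: copy the last word of the top passage."""
    return passages[0].rstrip(".").split()[-1]

def answer(query, index):
    return generate(query, retrieve(query, index))

# Two hypothetical snapshots of the same knowledge source.
index_old = ["The mayor of Exampleville is Alice.", "Exampleville has a river."]
index_new = ["The mayor of Exampleville is Bob.", "Exampleville has a river."]

query = "Who is the mayor of Exampleville?"
print(answer(query, index_old))  # answer drawn from the old snapshot
print(answer(query, index_new))  # same code, new snapshot, new answer
```

A real system would also carry parametric knowledge in the generator, so swapping the index shifts answers rather than fully determining them.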

Personal Analysis

What worked:

Clean probabilistic formulation
One of the first general-purpose hybrid-memory architectures for seq2seq generation
Strong empirical results

What puzzled me:

Fixed top-K retrieval is naive: every query gets the same number of docs regardless of difficulty
No mechanism to decide when to retrieve

Connections & Related Work

Foundation of most modern RAG systems
Extended by Self-RAG (adaptive, on-demand retrieval)
The retrieve-then-generate pattern is now widely used in production LLM systems

Implementation Sketch

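Retriever: DPR (BERT-based bi-encoder)
Generator: BART
Pipeline:
1. Encode the query
2. Retrieve the top-K docs by inner product with the query embedding
3. Concatenate the input with each retrieved doc (one generator input per doc)
4. Generate the output, marginalizing over the K docs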
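
The retrieval side of the pipeline (steps 1 to 3) can be sketched in plain Python. This is a toy under stated assumptions, not the paper's or Hugging Face's implementation: `encode` is a bag-of-words counter standing in for the DPR BERT encoders, the corpus is invented, and the `" // "` separator is an arbitrary choice. It builds one generator input per retrieved doc, following the per-document marginalization in the paper. Step 4 needs a trained generator and is left out.

```python
import math

def build_vocab(texts):
    """Map every word in the corpus to a vector index."""
    words = sorted({w for t in texts for w in t.lower().split()})
    return {w: i for i, w in enumerate(words)}

def encode(text, vocab):
    """Toy encoder: bag-of-words counts (stand-in for a DPR encoder)."""
    vec = [0.0] * len(vocab)
    for w in text.lower().split():
        if w in vocab:
            vec[vocab[w]] += 1.0
    return vec

def retrieve(query_vec, doc_vecs, k):
    """Return top-k doc ids by inner product and a softmax p(z|x) over them."""
    scores = [sum(q * d for q, d in zip(query_vec, dv)) for dv in doc_vecs]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    max_score = max(scores[i] for i in top)  # subtract max for numerical stability
    exps = [math.exp(scores[i] - max_score) for i in top]
    total = sum(exps)
    return top, [e / total for e in exps]

def build_generator_inputs(query, docs, doc_ids):
    """One generator input per retrieved doc: doc text, separator, query."""
    return [f"{docs[i]} // {query}" for i in doc_ids]

# Hypothetical three-passage corpus.
docs = [
    "paris is the capital of france",
    "berlin is the capital of germany",
    "the eiffel tower is in paris",
]
query = "what is the capital of france"

vocab = build_vocab(docs)
doc_vecs = [encode(d, vocab) for d in docs]           # built once, offline
doc_ids, doc_probs = retrieve(encode(query, vocab), doc_vecs, k=2)
inputs = build_generator_inputs(query, docs, doc_ids)

for p, text in zip(doc_probs, inputs):
    print(f"{p:.3f}  {text}")
```

The generator would then score or decode each of the K inputs, and the per-document outputs would be combined using `doc_probs` as the weights.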

Open Questions / Next Actions

How to improve retrieval relevance?
Can retrieval be dynamic?
How to reduce latency?

Glossary

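Parametric memory: knowledge stored in the model weights
Non-parametric memory: knowledge stored outside the model, here a dense vector index of text passages
DPR: Dense Passage Retriever, a BERT-based bi-encoder for dense retrieval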
