RAG in 90 seconds

Retrieval-Augmented Generation, demystified for non-engineers.

Updated April 2026

RAG (Retrieval-Augmented Generation) is a pattern where an LLM first retrieves relevant documents from an external store, then generates its answer using those documents as context. It's how LLMs answer questions about your private data — your saved videos, your company wiki, your codebase — without being retrained on it.

The three steps

1) Retrieve: a search system finds the chunks most relevant to the query.
2) Augment: those chunks get inserted into the prompt under a header like "Here are excerpts from the user's library."
3) Generate: the LLM writes the answer, ideally citing which chunk it used.
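To make the three steps concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption: `LIBRARY`, the word-overlap scoring, and `fake_llm` are stand-ins for a real document store, a real retriever, and a real model call, not how BrainTube or any particular product implements them.

```python
import re

# Hypothetical mini-library standing in for a real document store.
LIBRARY = [
    "RAG retrieves relevant documents before the model writes an answer.",
    "Fine-tuning changes a model's weights using extra training data.",
    "BM25 is a keyword-based ranking function used by search engines.",
]

def tokenize(text):
    """Lowercase the text and return its words as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, chunks, k=2):
    """Step 1: rank chunks by how many query words they share (toy scoring)."""
    query_words = tokenize(query)
    ranked = sorted(chunks, key=lambda c: len(query_words & tokenize(c)), reverse=True)
    return ranked[:k]

def augment(query, chunks):
    """Step 2: insert the retrieved chunks into the prompt, numbered for citation."""
    excerpts = "\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, start=1))
    return (
        "Here are excerpts from the user's library.\n"
        f"{excerpts}\n\n"
        "Only answer using the provided excerpts and cite them by number.\n"
        f"Question: {query}"
    )

def generate(prompt, llm):
    """Step 3: hand the augmented prompt to whatever LLM client you use."""
    return llm(prompt)

def fake_llm(prompt):
    # Placeholder: a real system would call a model API here.
    return "RAG retrieves documents first, then answers [1]."

query = "What does RAG do before the model answers?"
top_chunks = retrieve(query, LIBRARY)
prompt = augment(query, top_chunks)
answer = generate(prompt, fake_llm)
```

Printing `prompt` shows exactly what the model would see: the header, the numbered excerpts, the grounding instruction, and the question. Swapping the toy `retrieve` for a real search system leaves the other two steps unchanged.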

Why RAG beats fine-tuning for changing data

Fine-tuning bakes knowledge into the model's weights — expensive, slow to update, and the model can still hallucinate around the edges. RAG keeps your knowledge in a database you can update any time, and the model sees the latest version on every query. For anything that changes weekly (your notes, company docs, the news), RAG wins.

What makes a RAG system actually work

Five things: smart chunking (not too big, not too small, split on semantic boundaries), good embeddings, a hybrid retriever (BM25 plus vectors), a re-ranker on top, and citation-aware generation. Most demo RAG apps skip three of these and ship something that hallucinates. BrainTube treats retrieval as the product, not a one-line vectorStore.search(query) call.
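As one illustration of what a hybrid retriever involves, the sketch below merges a keyword ranking and a vector ranking with reciprocal rank fusion, a common way to combine ranked lists. This is an assumption chosen for the example, not a description of BrainTube's retriever; the chunk IDs and both input rankings are made up, and `k=60` is the conventional default for this method.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several best-first lists of chunk IDs into one ranking.

    Each chunk earns 1 / (k + rank) from every list it appears in,
    so chunks ranked well by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results: chunk IDs ordered best-first by each retriever.
keyword_hits = ["c3", "c1", "c7"]  # e.g. from BM25
vector_hits = ["c1", "c5", "c3"]   # e.g. from embedding similarity

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)  # ['c1', 'c3', 'c5', 'c7']
```

Chunks `c1` and `c3` appear in both lists, so they outrank `c5` and `c7`, which only one retriever found. A re-ranker would then rescore this fused shortlist before anything reaches the prompt.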

Where RAG fails

RAG struggles with multi-hop reasoning across many documents, with math, and with questions that require synthesizing many small facts. For those, it needs help from agentic patterns (decompose the query, retrieve per sub-question, then synthesize) or longer context windows.
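The decompose, retrieve, synthesize pattern can be sketched as follows. This is a toy under stated assumptions: `NOTES` is invented data, `decompose` returns hard-coded sub-questions where a real system would ask an LLM (often choosing the second sub-question only after seeing the first answer), and the final synthesis step is left as a comment.

```python
import re

# Hypothetical notes; the answer to the full question spans two of them.
NOTES = [
    "Project Atlas launched in March.",
    "Project Atlas is led by Priya.",
    "Priya works in the Berlin office.",
]

def words(text):
    """Lowercase the text and return its words as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, chunks):
    """Return the chunk sharing the most words with the question (toy scoring)."""
    return max(chunks, key=lambda c: len(words(question) & words(c)))

def decompose(question):
    """Placeholder: a real system would ask an LLM to produce sub-questions."""
    return ["Who is Project Atlas led by?", "Which office does Priya work in?"]

def gather_evidence(question, chunks):
    """Retrieve one supporting chunk per sub-question."""
    evidence = []
    for sub_question in decompose(question):
        evidence.append((sub_question, retrieve(sub_question, chunks)))
    # Synthesis: a real system would now pass this evidence to an LLM
    # and ask it to answer the original question, citing each chunk.
    return evidence

evidence = gather_evidence(
    "Which office does the lead of Project Atlas work in?", NOTES
)
for sub_question, chunk in evidence:
    print(f"{sub_question} -> {chunk}")
```

Each sub-question is simple enough for a single retrieval to hit the right note, which is the point of decomposing: no single note answers the original question on its own.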

Frequently asked

Do I need a vector database to use RAG?

Usually yes: Pinecone, Qdrant, pgvector, Weaviate, or similar. BrainTube runs its own managed vector index; you don't touch it.

Is RAG the same as MCP?

No. RAG is a pattern for grounding LLM answers in retrieved documents. MCP is a protocol for connecting AI clients to tools. BrainTube uses RAG internally and exposes the results via MCP.

Can RAG hallucinate?

Yes, but much less than vanilla LLM use. With good retrieval and a prompt that says "only answer using the provided excerpts," hallucination rates drop substantially, though not to zero.
