Semantic search vs keyword search
Why "vibes-based" search returns things keyword search misses — and where it still loses.
Updated April 2026
Keyword search matches strings; semantic search matches meaning. Keyword search returns a video only if the words you typed (or, with stemming, simple variants of them) appear in its transcript. Semantic search compares the meaning of your query with the meaning of the chunks in your library using vector embeddings, so a search for "how to stop procrastinating" can surface a talk titled "beating the resistance", even though the two phrases share no words.
How semantic search actually works
Every chunk of text in your library is run through an embedding model (e.g. OpenAI text-embedding-3, Cohere embed-v3) that produces a vector: a list of numbers, typically several hundred to a few thousand long (text-embedding-3-small outputs 1,536; Cohere embed-english-v3.0 outputs 1,024), that represents the chunk's meaning. Your query gets the same treatment. The retriever then returns the chunks whose vectors are closest to the query's vector, usually measured by cosine similarity. Closeness in vector space approximates closeness in meaning because the model was trained on large amounts of text to place related passages near each other.
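To make "closest vector" concrete, here is a minimal brute-force sketch of the retrieval step. The 3-dimensional vectors and chunk names are hypothetical stand-ins for real embeddings (which would come from a model like the ones named above), and a production system would typically use an approximate nearest-neighbor index rather than scoring every chunk.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int = 3):
    """Brute-force retrieval: score every chunk, return the k most similar."""
    scored = [(chunk_id, cosine_similarity(query_vec, vec)) for chunk_id, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings"; real models emit hundreds to thousands of dimensions.
chunks = {
    "beating-the-resistance": [0.9, 0.1, 0.2],
    "sourdough-basics": [0.1, 0.9, 0.1],
    "deep-work-habits": [0.8, 0.2, 0.3],
}
query = [0.85, 0.15, 0.25]  # pretend embedding of "how to stop procrastinating"

for chunk_id, score in top_k(query, chunks, k=2):
    print(f"{chunk_id}: {score:.3f}")
```

Note that no words are compared anywhere: the match depends only on where the model placed each text in vector space.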
Where keyword still wins
Exact identifiers: product names, error codes, people's names, URLs, code snippets, legal citations. If you're searching for "GPT-5" or "useEffect" or "Form 1099-MISC," you want exact matches, not paraphrases. Keyword engines (usually scoring results with BM25) also support boolean operators and negation ("foo AND NOT bar"), which semantic search struggles with.
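For comparison, a minimal sketch of keyword retrieval with Okapi BM25 is shown below. It uses the textbook formula with the common defaults k1 = 1.5 and b = 0.75 and a Lucene-style non-negative IDF; the tokenizer and the three documents are hypothetical, and this is not BrainTube's implementation. It shows both sides of the trade-off: identifiers like "1099-MISC" match exactly, while a paraphrase with no shared words returns nothing.

```python
import math
import re
from collections import Counter

# Keep identifier-like tokens such as "gpt-5" or "1099-misc" in one piece.
TOKEN_RE = re.compile(r"[a-z0-9]+(?:[-_.][a-z0-9]+)*")


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())


class BM25:
    """Okapi BM25 over a small in-memory corpus (brute force, no inverted index)."""

    def __init__(self, docs: dict[str, str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.term_freqs = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
        self.doc_len = {doc_id: sum(tf.values()) for doc_id, tf in self.term_freqs.items()}
        self.avg_len = sum(self.doc_len.values()) / len(docs)
        # Document frequency: number of docs containing each term at least once.
        self.df = Counter(term for tf in self.term_freqs.values() for term in tf)
        self.n_docs = len(docs)

    def idf(self, term: str) -> float:
        # Non-negative IDF variant (as in Lucene): rarer terms weigh more.
        n = self.df[term]
        return math.log((self.n_docs - n + 0.5) / (n + 0.5) + 1)

    def score(self, query: str, doc_id: str) -> float:
        tf = self.term_freqs[doc_id]
        # Length normalization: long documents need more matches to score highly.
        norm = 1 - self.b + self.b * self.doc_len[doc_id] / self.avg_len
        total = 0.0
        for term in tokenize(query):
            f = tf[term]  # Counter returns 0 for missing terms
            if f:
                total += self.idf(term) * f * (self.k1 + 1) / (f + self.k1 * norm)
        return total

    def search(self, query: str, k: int = 3) -> list[tuple[str, float]]:
        scored = [(doc_id, self.score(query, doc_id)) for doc_id in self.term_freqs]
        hits = [pair for pair in scored if pair[1] > 0]  # no shared term, no hit
        hits.sort(key=lambda pair: pair[1], reverse=True)
        return hits[:k]


docs = {
    "react-hooks": "Cleaning up subscriptions inside useEffect before unmount",
    "tax-forms": "When you file Form 1099-MISC versus Form 1099-NEC",
    "focus-talk": "Beating the resistance: a talk on getting started",
}
index = BM25(docs)
print(index.search("useEffect cleanup"))            # only react-hooks matches
print(index.search("Form 1099-MISC"))               # only tax-forms matches
print(index.search("how to stop procrastinating"))  # [] (no shared words)
```

This sketch deliberately omits boolean operators; a real keyword engine would apply filters like `AND NOT` before or alongside scoring.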
The hybrid approach
Many production retrievers blend both. BrainTube runs BM25 and a dense vector index in parallel, fuses the two ranked lists with reciprocal rank fusion (RRF), then re-ranks the top ~50 candidates with a cross-encoder, a model that reads the query and each passage together and scores them jointly. The result: paraphrase queries surface conceptually related content, and exact-name queries still return the right hit at rank 1.
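The fuse-then-re-rank pipeline can be sketched as follows. RRF gives each document a score of 1 / (k + rank) from every list it appears in and sums them; k = 60 is the constant commonly used in the literature, and rerank_depth = 50 mirrors the "~50" above. Both input rankings are assumed to come from BM25 and vector retrievers like those sketched earlier, and `toy_overlap_score` is a hypothetical word-overlap stand-in for a real cross-encoder. None of this is BrainTube's actual code.

```python
from collections import defaultdict
from typing import Callable


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """RRF: each list adds 1 / (k + rank) for every doc it contains (rank starts at 1)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


def hybrid_search(
    query: str,
    rankings: list[list[str]],
    texts: dict[str, str],
    rerank_score: Callable[[str, str], float],
    rerank_depth: int = 50,
    top_n: int = 10,
) -> list[str]:
    # Stage 1: fuse keyword and vector rankings; keep only the top candidates.
    fused = reciprocal_rank_fusion(rankings)
    candidates = [doc_id for doc_id, _ in fused[:rerank_depth]]
    # Stage 2: re-score each (query, passage) pair with a slower, more precise model.
    candidates.sort(key=lambda doc_id: rerank_score(query, texts[doc_id]), reverse=True)
    return candidates[:top_n]


def toy_overlap_score(query: str, passage: str) -> float:
    """Hypothetical stand-in for a cross-encoder: counts shared lowercase words."""
    return len(set(query.lower().split()) & set(passage.lower().split()))


bm25_ranking = ["tax-forms", "react-hooks", "focus-talk"]
dense_ranking = ["focus-talk", "tax-forms", "deep-work"]
texts = {
    "tax-forms": "Form 1099-MISC explained",
    "react-hooks": "useEffect cleanup patterns",
    "focus-talk": "Beating the resistance and starting work",
    "deep-work": "Habits for deep focused work",
}

print(reciprocal_rank_fusion([bm25_ranking, dense_ranking]))
print(hybrid_search("deep focused work", [bm25_ranking, dense_ranking], texts, toy_overlap_score, top_n=2))
```

RRF only looks at ranks, not raw scores, which is why it can combine BM25 scores and cosine similarities without calibrating them against each other. The expensive re-ranker then runs on a few dozen candidates instead of the whole library.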
Why retriever quality matters more than model quality
A frontier LLM paired with a bad retriever is prone to hallucinate, because it is given the wrong context or none at all. A modest LLM paired with a great retriever tends to cite accurately, because its answers are grounded in the retrieved passages. That's why BrainTube spends its compute on the retriever (chunking, embeddings, hybrid scoring, re-ranking), so any LLM you point at it gets sharper answers.
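Mechanically, "grounded" usually means the retrieved passages are pasted into the model's prompt with labels it can cite. The sketch below assumes a hypothetical `Chunk` shape (a source label plus text), a made-up instruction wording, and a rough character budget; it illustrates the general pattern, not BrainTube's actual prompt.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str  # hypothetical label, e.g. video title plus timestamp
    text: str


def build_grounded_prompt(question: str, chunks: list[Chunk], max_chars: int = 4000) -> str:
    """Number each retrieved chunk so the model can cite it as [n]."""
    lines: list[str] = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        entry = f"[{i}] ({chunk.source}) {chunk.text}"
        if used + len(entry) > max_chars:
            break  # stay within a rough context budget
        lines.append(entry)
        used += len(entry)
    context = "\n".join(lines) if lines else "(no sources retrieved)"
    return (
        "Answer using only the sources below and cite them as [n]. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )


retrieved = [
    Chunk("Beating the resistance @ 04:12", "Start with a two-minute version of the task."),
    Chunk("Deep work habits @ 10:05", "Schedule focus blocks before checking email."),
]
print(build_grounded_prompt("How do I stop procrastinating?", retrieved))
```

Whatever model receives this prompt can only be as accurate as the passages in it, which is the point of the paragraph above: a weak retriever fills the Sources section with the wrong text.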
Try BrainTube on your own corpus
Free tier, no card. Export anytime.
More to read
- What is MCP (Model Context Protocol)? — The open protocol that lets any AI client read your tools and data — without bespoke integrations.
- A second brain for operators — What changes when your notes, videos, and PDFs are queryable from inside the tools you already use.
- RAG in 90 seconds — Retrieval-Augmented Generation, demystified for non-engineers.
