Semantic search vs keyword search

Why "vibes-based" search returns things keyword search misses — and where it still loses.

Updated April 2026

Keyword search matches strings; semantic search matches meaning. Keyword search returns a video only if the words you typed appear in its transcript. Semantic search uses vector embeddings to compare the meaning of your query with the meaning of every chunk in the library. A search for "how to stop procrastinating" can therefore surface a talk titled "beating the resistance", even though none of those words overlap.

How semantic search actually works

Every chunk of text in your library is run through an embedding model (e.g. OpenAI text-embedding-3, Cohere embed-v3). The model produces a vector, a list of hundreds to a few thousand numbers that represents the chunk's meaning (1,536 for text-embedding-3-small, 1,024 for Cohere's embed-english-v3.0). Your query gets the same treatment. The retriever returns the chunks whose vectors are closest to the query's vector, measured by cosine similarity. Closeness in vector space approximates closeness in meaning, a relationship the model learns from large amounts of training text.
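To make the retrieval step concrete, here is a minimal brute-force sketch. It assumes hand-made three-dimensional vectors as stand-ins for real embeddings. A real system would get its vectors from an embedding model and would usually search an approximate nearest-neighbor index instead of scoring every chunk. The names `cosine` and `top_k` and the chunk IDs are hypothetical and are not part of BrainTube.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Brute-force nearest neighbours: score every chunk, keep the k most similar."""
    scored = [(chunk_id, cosine(query_vec, vec)) for chunk_id, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional "embeddings"; real models emit hundreds to thousands of dimensions.
chunks = {
    "beating-the-resistance": [0.9, 0.1, 0.0],
    "sourdough-basics": [0.0, 0.2, 0.95],
    "deep-work-habits": [0.7, 0.4, 0.1],
}
query = [0.85, 0.2, 0.05]  # stand-in for the embedding of "how to stop procrastinating"

for chunk_id, score in top_k(query, chunks):
    print(f"{chunk_id}: {score:.3f}")
```

The two procrastination-related chunks rank first even though no text is compared at all, because only vector direction matters.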

Where keyword still wins

Exact identifiers: product names, error codes, people, URLs, code snippets, legal citations. If you're searching for "GPT-5", "useEffect", or "Form 1099-MISC", you want exact matches, not paraphrases. Keyword engines, which usually rank with BM25, also support boolean operators and negation ("foo AND NOT bar"). Semantic search struggles with both.
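For comparison, a compact Okapi BM25 scorer shows why keyword search rewards only exact token overlap. It uses the textbook formula with Lucene's default parameters (k1 = 1.2, b = 0.75) and Lucene-style IDF. The lowercase-and-split tokenizer and the sample documents are simplifying assumptions, and none of this reflects BrainTube's actual index configuration.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: dict[str, str], k1: float = 1.2, b: float = 0.75) -> dict[str, float]:
    """Okapi BM25 over lowercased, whitespace-tokenised text."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for toks in tokenized.values() for term in set(toks))

    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # exact-token match only: no paraphrases, no synonyms
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            # Term-frequency saturation (k1) and document-length normalisation (b).
            norm = freq + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * freq * (k1 + 1) / norm
        scores[doc_id] = score
    return scores


# Hypothetical transcript chunks.
docs = {
    "react-hooks": "cleaning up subscriptions inside useEffect before unmount",
    "react-state": "lifting state up and avoiding stale closures in react",
    "taxes": "how to file form 1099-MISC for freelance income",
}
print(bm25_scores("useEffect", docs))         # only react-hooks scores above zero
print(bm25_scores("side effect hook", docs))  # paraphrase: every score is zero
```

The second query describes the same concept as the first but shares no tokens with any chunk, so BM25 scores every chunk zero. Semantic search covers exactly this gap.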

The hybrid approach

Production retrievers typically blend both. BrainTube runs BM25 and a dense vector index in parallel and fuses the results with reciprocal rank fusion (RRF). It then re-ranks the top ~50 with a cross-encoder model, which scores each query and chunk together and is too slow to run over the whole library. The result: paraphrase queries surface conceptually related content, and exact-name queries still return the right hit at rank 1.
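The fusion and re-ranking steps can be sketched in a few lines. RRF gives each document 1 / (k + rank) for every ranked list it appears in and sums the contributions. k = 60 is the constant proposed in the original RRF paper and is assumed here because the article does not state BrainTube's value. The ranked lists and scores are made up. `fake_cross_encoder` is a hypothetical stand-in for a real cross-encoder that scores (query, chunk) pairs.

```python
from collections import defaultdict
from typing import Callable


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Each document earns 1 / (k + rank) from every list it appears in; sum and sort."""
    fused: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused.items(), key=lambda pair: pair[1], reverse=True)


def hybrid_search(
    keyword_ranking: list[str],
    vector_ranking: list[str],
    rerank_score: Callable[[str], float],
    top_n: int = 50,
) -> list[str]:
    """Fuse both rankings with RRF, then reorder the top_n candidates by the re-ranker's score."""
    fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
    candidates = [doc_id for doc_id, _ in fused[:top_n]]
    return sorted(candidates, key=rerank_score, reverse=True)


# Hypothetical ranked lists for the query "GPT-5".
keyword_hits = ["gpt5-launch", "gpt4-review", "pricing"]
vector_hits = ["ai-roadmap", "gpt5-launch", "model-comparison"]

# Stand-in for cross-encoder scores of (query, chunk) pairs; these numbers are invented.
fake_cross_encoder = {
    "gpt5-launch": 0.97,
    "model-comparison": 0.61,
    "ai-roadmap": 0.55,
    "gpt4-review": 0.40,
    "pricing": 0.12,
}

print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
print(hybrid_search(keyword_hits, vector_hits, rerank_score=lambda d: fake_cross_encoder[d]))
```

RRF uses only ranks. It needs no normalization between BM25 scores and cosine similarities, which live on different scales, and that is the usual reason to prefer it over a weighted sum of raw scores. A chunk that both retrievers rank highly ("gpt5-launch") rises to the top of the fused list.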

Why retriever quality matters more than model quality

A frontier LLM with a bad retriever hallucinates because it's given wrong or no context. A modest LLM with a great retriever cites accurately because it's grounded. BrainTube spends compute on the retriever — chunking, embeddings, hybrid scoring, re-ranking — so any LLM you point at it gets sharper answers.

Frequently asked

Are embeddings the same as keywords?

No. Embeddings are dense numerical vectors that capture meaning. Keywords are the literal strings in the text. They're complementary, not substitutes.

Why do I sometimes get irrelevant results from semantic search?

Usually two reasons: chunking is too coarse (the chunk mixes multiple topics, so the embedding averages out), or the query is too short ("why?" has no semantic signal). Longer, more specific queries plus tighter chunks fix most cases.

Does BrainTube let me force keyword search?

Yes: wrap a term in double quotes to bias toward exact match, like "useEffect" or "Naval Ravikant".
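The answer on irrelevant results recommends tighter chunks. One simple way to get them is a fixed-size word window with a small overlap, sketched below. The function name and the window and overlap sizes are hypothetical, not BrainTube's settings. Splitting on transcript segment or topic boundaries would address mixed-topic chunks more directly than a fixed window.

```python
def chunk_words(text: str, max_words: int = 120, overlap: int = 20) -> list[str]:
    """Split text into windows of at most max_words words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    words = text.split()
    step = max_words - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reaches the end of the text
    return chunks


transcript = "so today we talk about resistance and why starting is the hardest part of any creative work"
for i, chunk in enumerate(chunk_words(transcript, max_words=8, overlap=2)):
    print(i, chunk)
```

Smaller windows keep each embedding focused on one idea. The overlap helps avoid losing a sentence that straddles a boundary.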

Try BrainTube on your own corpus

Free tier, no card. Export anytime.
