Engineering

Hybrid search: why semantic alone isn't enough

By InternalWiki Team · 25 February 2026 · 7 min read

Semantic search was supposed to fix enterprise search. Instead of matching keywords, it understands meaning. Search for "remote work policy" and it finds the document titled "Work From Home Guidelines." Search for "how to submit expenses" and it finds the finance process doc even if it never uses the word "submit."

In practice, semantic search alone isn't enough. It's good at understanding intent. It's bad at catching specifics.

The problem with pure semantic search

Ask a semantic search system: "What's our policy on remote work in Portugal?" The embedding vector captures the concept of remote work policies. It will surface documents about remote work. But "Portugal" — a specific named entity — gets diluted in the embedding. The search might return the general remote work policy (high semantic similarity) and miss the specific Portugal addendum (lower semantic similarity but containing the exact information needed).

This is a well-known limitation of dense vector embeddings. They encode meaning at the expense of specificity. Two sentences can be semantically similar but refer to completely different entities. "The Berkeley Square lease expires in 2027" and "The Manchester Square lease expires in 2028" have nearly identical embeddings despite describing different properties.
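To make the dilution concrete, here's a contrived sketch with three-dimensional vectors. Real embeddings have hundreds of dimensions and these numbers are invented for illustration, but the geometry is the same: if the entity only moves a small component of the vector, cosine similarity barely notices.

```typescript
// Cosine similarity of two vectors
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Toy "embeddings": identical except for the dimension encoding the entity
const berkeley = [0.91, 0.40, 0.05];    // "Berkeley Square lease expires in 2027"
const manchester = [0.91, 0.40, -0.05]; // "Manchester Square lease expires in 2028"

cosine(berkeley, manchester); // ≈ 0.995 — the entity barely moves the vector
```

A ranker looking only at this score has no way to tell the two leases apart.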

A worked example

Query: “What's our policy on remote work in Portugal?” Here's what each approach returns.

Pure semantic

The embedding captures “remote work policy.” Results: (1) Remote Work Guidelines.docx — 0.89 similarity. (2) WFH Best Practices.pptx — 0.84. (3) Flexible Working Policy v2.docx — 0.81. (4) HR Policy Index — 0.78. None of these mention Portugal. The semantic model understands the concept perfectly but loses the specific country name in the embedding space. If the answer is in an “International Arrangements Addendum” that scores 0.62, it won't surface in the top results.

Pure keyword

Full-text search for “Portugal.” Results: (1) Client List — Portugal Office.xlsx — mentions “Portugal” 12 times, about clients, not policy. (2) Travel Expenses — Lisbon Trip.pdf — mentions Portugal but it's an expense report. (3) International Arrangements Addendum.docx — mentions “Portugal” once, in exactly the right context. The keyword search finds the entity but can't rank by relevance to the actual question. The right document is in third place, behind noise.

Hybrid

Both searches run in parallel. The semantic search surfaces the Remote Work Guidelines. The keyword search surfaces the International Arrangements Addendum. The merge step identifies that the Addendum scores well on keyword match and has reasonable semantic similarity (0.62). Because it appears in both result lists, it ranks above the travel expense report (high keyword, low semantic) and above the main policy doc (high semantic, no keyword match). The result: the Portugal addendum, the main policy, and the flexible working policy fill the top three. The answer is complete and accurate.

The problem with pure keyword search

Keyword search has the opposite problem. It catches "Portugal" and "Berkeley Square" perfectly. But it doesn't understand that "WFH guidelines" is the same thing as "remote work policy." It doesn't know that "how do I get reimbursed" relates to "expense submission process." You need to guess the exact terms the document author used.
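A toy matcher makes the vocabulary-mismatch failure obvious. This is an illustration, not our production full-text engine, which does stemming and relevance scoring — but no amount of stemming turns "WFH" into "remote":

```typescript
// Naive keyword matching: fraction of query terms found in the document.
// Vocabulary mismatch scores zero even when the meaning is identical.
function keywordMatch(query: string, doc: string): number {
  const tokens = (s: string) =>
    new Set(s.toLowerCase().split(/\W+/).filter(Boolean));
  const q = tokens(query);
  const d = tokens(doc);
  let hits = 0;
  for (const t of q) if (d.has(t)) hits++;
  return hits / q.size;
}

keywordMatch("WFH guidelines", "Remote Work Policy for all employees"); // → 0
keywordMatch("remote work Portugal", "Remote work arrangements in Portugal"); // → 1
```

The first query finds nothing despite asking for exactly this document; the second succeeds only because the user happened to guess the author's vocabulary.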

The hybrid approach

InternalWiki runs both searches in parallel against the same corpus, then merges and re-ranks the results:

// Simplified hybrid merge with Reciprocal Rank Fusion
async function hybridSearch(query: string, userId: string) {
  const [semanticResults, keywordResults] = await Promise.all([
    vectorSearch(query, userId, { limit: 20 }),
    fullTextSearch(query, userId, { limit: 20 }),
  ]);

  // Reciprocal Rank Fusion (k=60); rank is the zero-based list position
  const scores = new Map<string, number>();
  for (const [rank, result] of semanticResults.entries()) {
    const id = result.chunkId;
    scores.set(id, (scores.get(id) ?? 0) + 1 / (60 + rank));
  }
  for (const [rank, result] of keywordResults.entries()) {
    const id = result.chunkId;
    scores.set(id, (scores.get(id) ?? 0) + 1 / (60 + rank));
  }

  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, 10);
}

The semantic search finds the remote work policy. The keyword search finds the Portugal addendum. The merge puts both in the results. The re-ranking ensures the most relevant chunks — those that score well on both meaning and specificity — surface first.

The similarity threshold

We use a similarity threshold of 0.3 for the semantic search. This is deliberately lower than the 0.7–0.8 thresholds common in consumer applications. Enterprise document language is often formal and repetitive — multiple policies use similar phrasing, employment contracts share template language, compliance documents borrow standard regulatory text. A 0.7 threshold would miss relevant documents that happen to use slightly different terminology from the query.

The risk of a low threshold is more noise — more irrelevant chunks in the initial retrieval set. But the re-ranking step handles precision. Reciprocal Rank Fusion naturally deprioritises documents that only appear in one search and score poorly in the other. We'd rather cast a wide net and rank well than set a narrow threshold and miss the answer entirely.
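As a sketch, the retrieval-side filter is just a threshold on the similarity score before fusion. The `Candidate` shape and field names here are illustrative, not the production schema:

```typescript
interface Candidate { chunkId: string; similarity: number; }

// Deliberately permissive: optimise for recall here,
// and let RRF re-ranking handle precision downstream.
const SIMILARITY_THRESHOLD = 0.3;

function filterCandidates(results: Candidate[]): Candidate[] {
  return results.filter(r => r.similarity >= SIMILARITY_THRESHOLD);
}

filterCandidates([
  { chunkId: "intl-addendum", similarity: 0.62 }, // kept — fusion decides its rank
  { chunkId: "lunch-menu", similarity: 0.12 },    // dropped — below threshold
]); // → only the addendum survives
```

At a 0.7 threshold the 0.62 addendum would have been discarded before fusion ever saw it, and no amount of downstream re-ranking can recover a chunk that was never retrieved.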

// Cosine similarity between query and document embeddings
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// "remote work Portugal" vs "Remote Work Guidelines"
cosineSimilarity(queryEmb, docEmb) // → 0.89 (high intent match)

// "remote work Portugal" vs "International Arrangements"
cosineSimilarity(queryEmb, addendumEmb) // → 0.62 (moderate)

// Keyword search rescues the addendum by matching "Portugal"

Re-ranking and the final result

Reciprocal Rank Fusion works by assigning each result a score based on its rank in each individual search: 1/(k + rank), where k = 60 is a smoothing constant and rank is the zero-based position in the result list (matching the array index in the code above). Documents that appear in both searches receive a combined score — the sum of their per-search scores. This naturally boosts documents that perform well across both approaches.

The Portugal addendum sits at position 3 in the keyword results (zero-based rank 2) and position 8 in the semantic results (rank 7), so it scores 1/(60+2) + 1/(60+7) ≈ 0.0311. The general remote work policy sits at position 2 in the semantic results (rank 1) but doesn't appear in the keyword results at all: 1/(60+1) ≈ 0.0164. After fusion, the addendum outranks the general policy, which is the correct order for the specific question asked.
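Plugging those positions into code (zero-based ranks, k = 60) confirms the arithmetic:

```typescript
// RRF contribution for a zero-based rank with smoothing constant k
const rrf = (rank: number, k = 60): number => 1 / (k + rank);

// Portugal addendum: keyword position 3 (rank 2) + semantic position 8 (rank 7)
const addendumScore = rrf(2) + rrf(7);
// General policy: semantic position 2 (rank 1) only
const policyScore = rrf(1);

addendumScore.toFixed(4); // "0.0311"
policyScore.toFixed(4);   // "0.0164"
```

Note how flat the contributions are: with k = 60, the gap between rank 1 and rank 7 within a single list is small, so appearing in *both* lists is worth far more than appearing near the top of one.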

Results

In our internal benchmarks, hybrid search retrieves the correct answer chunk in the top 5 results 91% of the time. Semantic-only achieves 76%. Keyword-only achieves 68%. The combination outperforms either approach alone, especially for questions that mix conceptual queries with specific named entities — which is most real enterprise questions.

Search isn't a solved problem. But running two complementary approaches in parallel and merging intelligently gets us close enough that the answer quality is limited by the documents, not by the retrieval.
