
What happens when you ask InternalWiki a question

By InternalWiki Team · 6 March 2026 · 7 min read

A user types: "When does the Berkeley Square lease expire?" They get a cited answer in 0.8 seconds. Here's what happens in that time.

Stage 1: Query understanding (0–50ms)

The question is analysed for intent and key entities. "Berkeley Square" is identified as a proper noun — likely a location or property name. "Lease" signals a legal/contractual document type. "Expire" indicates the user wants a specific date. This informs both the semantic and keyword search strategies.
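As a rough illustration of this stage, here is a minimal sketch in Python. All names are hypothetical, and the regex-based entity spotting stands in for the NER model a production system would use; it is not InternalWiki's actual implementation.

```python
import re

def understand_query(question: str) -> dict:
    """Sketch of Stage 1: extract candidate entities and intent hints."""
    # Runs of two or more capitalised words ("Berkeley Square") as
    # candidate proper nouns; a single capitalised sentence-opener
    # ("When") never matches because the pattern requires 2+ words.
    entities = re.findall(r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+", question)
    # Keyword cues for document type and answer type.
    doc_type = ("legal" if re.search(r"\b(lease|contract|agreement)\b",
                                     question, re.I) else "general")
    wants_date = bool(re.search(r"\b(when|expire|expiry|date|deadline)\b",
                                question, re.I))
    return {"entities": entities, "doc_type": doc_type, "wants_date": wants_date}
```

The output of this stage feeds both retrieval arms: the entities become exact-match keyword terms, while the full question is embedded for the semantic search.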

Stage 2: Hybrid retrieval (50–230ms)

Two searches run in parallel. The semantic search converts the question into an embedding vector and finds document chunks with similar meaning. The keyword search looks for exact terms — "Berkeley Square" and "lease" — that semantic search might miss.

Both searches include a permission predicate. The database only returns chunks from documents this specific user has access to. If the lease agreement is in a folder the user can't see, it never appears in results.

In practice, the semantic search examines 847 document chunks across 312 documents in 0.18 seconds. It returns the top 20 by cosine similarity. In parallel, the keyword search scans the full-text index for “Berkeley Square” and “lease” — 0.12 seconds, 8 matches. The merge step deduplicates and re-ranks using Reciprocal Rank Fusion, producing a final set of 10 chunks.

-- Hybrid retrieval: semantic + keyword, permission-filtered,
-- merged with Reciprocal Rank Fusion
WITH semantic AS (
  SELECT chunks.chunk_id,
         ROW_NUMBER() OVER (ORDER BY embedding <=> $query_vector) AS rnk
  FROM chunks
  JOIN permissions ON chunks.doc_id = permissions.doc_id
  WHERE permissions.user_id = $uid
  ORDER BY embedding <=> $query_vector
  LIMIT 20
),
keyword AS (
  SELECT chunks.chunk_id,
         ROW_NUMBER() OVER (ORDER BY ts_rank(tsv, query) DESC) AS rnk
  FROM chunks
  JOIN permissions ON chunks.doc_id = permissions.doc_id
  CROSS JOIN plainto_tsquery($query_text) query
  WHERE permissions.user_id = $uid AND tsv @@ query
  ORDER BY ts_rank(tsv, query) DESC
  LIMIT 20
)
SELECT chunk_id, SUM(1.0 / (60 + rnk)) AS rrf_score  -- RRF with k = 60
FROM (SELECT * FROM semantic UNION ALL SELECT * FROM keyword) ranked
GROUP BY chunk_id
ORDER BY rrf_score DESC
LIMIT 10;
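The Reciprocal Rank Fusion step can also be sketched application-side. This is an illustrative Python version of the standard RRF formula — each chunk scores the sum of 1/(k + rank) over the lists it appears in — not InternalWiki's production merge; k = 60 is the constant from the original RRF paper.

```python
def rrf_merge(semantic_ids, keyword_ids, k=60, top_n=10):
    """Fuse two ranked lists of chunk ids with Reciprocal Rank Fusion.

    Ranks are 1-based. A chunk appearing high in both lists outranks a
    chunk appearing high in only one, which is exactly the behaviour
    hybrid search needs: corroboration across retrieval strategies.
    """
    scores = {}
    for ranked in (semantic_ids, keyword_ids):
        for rank, chunk_id in enumerate(ranked, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Because RRF works on ranks rather than raw scores, it sidesteps the problem that cosine similarity and `ts_rank` live on incomparable scales.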

Why hybrid search matters here

Pure semantic search would find documents about “office leases” and “property agreements” — good for capturing intent. But “Berkeley Square” is a proper noun. Semantic embeddings dilute named entities; they encode meaning at the level of concepts, not specific names. The keyword search catches it exactly.

Without hybrid search, the answer might reference the Manchester Square lease instead — semantically similar, since both are London commercial leases, but factually wrong. The hybrid approach is what makes the difference between a confident wrong answer and the correct one.

Stage 3: Context assembly (230–245ms)

The top 10 chunks are assembled into a context window. Each chunk carries metadata: the source document title, the document type (for freshness classification), the last modified date, and the similarity score. Chunks are ordered by relevance, and the total token count is checked against the model's context limit.
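A minimal sketch of that packing step, under stated assumptions: the chunk fields and the whitespace-based token estimate are illustrative (a real system would count tokens with the model's tokenizer), and the `assemble_context` name is hypothetical.

```python
def assemble_context(chunks, token_budget=4000):
    """Pack relevance-ordered chunks into the prompt until the budget is hit.

    `chunks` is a list of dicts with 'text', 'title', 'modified', 'score'.
    Each chunk is prefixed with a numbered header so the model can cite
    it as [1], [2], ... in the answer.
    """
    selected, used = [], 0
    for chunk in sorted(chunks, key=lambda c: c["score"], reverse=True):
        cost = len(chunk["text"].split())  # crude token estimate
        if used + cost > token_budget:
            break  # chunks are relevance-ordered, so stop at the first overflow
        header = f"[{len(selected) + 1}] {chunk['title']} (modified {chunk['modified']})"
        selected.append(header + "\n" + chunk["text"])
        used += cost
    return "\n\n".join(selected)
```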

Stage 4: Generation with citation anchoring (245–765ms)

The context and question go to the language model with a specific instruction: every factual claim in the answer must be anchored to a specific chunk. The model generates: "The Berkeley Square lease expires on 30 September 2027 [1]. The lease was renewed in 2022 for a five-year term [1] with an option to extend for an additional three years, subject to board approval [2]."

Each citation number maps to a specific chunk. The model cannot make claims that aren't grounded in the retrieved context.
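One way to enforce that mapping is a post-generation check: parse the citation markers out of the answer and reject any that point outside the retrieved set. This sketch is an assumption about how such a check could look, not InternalWiki's actual validator.

```python
import re

def check_citations(answer: str, num_chunks: int) -> set:
    """Return the set of cited chunk indices.

    Raises ValueError if any [n] citation points outside the retrieved
    context — a signal that the model invented a source.
    """
    cited = {int(m) for m in re.findall(r"\[(\d+)\]", answer)}
    invalid = {c for c in cited if not 1 <= c <= num_chunks}
    if invalid:
        raise ValueError(f"citations outside retrieved context: {sorted(invalid)}")
    return cited
```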

Stage 5: Trust Panel assembly (765–850ms)

The confidence score is calculated based on source count, relevance scores, and cross-source agreement. Each cited source gets a freshness classification — the lease agreement is an evergreen document, so despite being signed in 2022, it shows "Still valid." The Trust Panel is assembled: confidence bar, source cards with freshness badges, citation links back to the original documents.
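The freshness classification might be sketched as a small rule table. The document types, horizons, and badge strings below are invented for illustration; the only behaviour taken from the text is that evergreen types like a lease stay "Still valid" regardless of age.

```python
from datetime import date

# Illustrative rules: evergreen types never age out; dated types get a
# per-type horizon after which they are flagged as outdated.
EVERGREEN_TYPES = {"lease", "contract", "policy"}
HORIZON_DAYS = {"meeting_notes": 90, "roadmap": 180, "report": 365}

def freshness_badge(doc_type: str, modified: date, today: date) -> str:
    if doc_type in EVERGREEN_TYPES:
        return "Still valid"
    age = (today - modified).days
    return "Outdated" if age > HORIZON_DAYS.get(doc_type, 365) else "Fresh"
```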

How confidence is calculated

The confidence score is not a single number from the language model. It is computed from four distinct signals. First: source count — more independent sources supporting a claim increases confidence, since corroboration from multiple documents is harder to fake. Second: relevance scores — how closely the retrieved chunks match the question semantically and lexically. Third: cross-source agreement — do sources agree or contradict each other? Contradictions lower confidence and trigger a conflict warning in the Trust Panel, surfacing both positions so the user can investigate. Fourth: freshness — if supporting sources are classified as outdated, confidence drops even if all other signals are strong.
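A weighted combination of those four signals might look like the sketch below. The weights, the saturation point, and the signal encodings are illustrative assumptions, not InternalWiki's actual formula.

```python
def confidence(source_count, mean_relevance, agreement, freshness_ok):
    """Combine the four signals into a single 0-1 confidence score.

    mean_relevance: average retrieval score of cited chunks, in [0, 1].
    agreement: 1.0 when all sources agree, lower when they contradict.
    freshness_ok: fraction of cited sources not classified as outdated.
    """
    count_signal = min(source_count, 5) / 5  # corroboration, capped at 5 sources
    score = (0.25 * count_signal
             + 0.30 * mean_relevance
             + 0.25 * agreement
             + 0.20 * freshness_ok)
    return round(score, 2)
```

Because each signal is bounded, a contradiction or a stale source drags the total down even when retrieval relevance is high — which is the behaviour the Trust Panel needs.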

Putting the five stages together, a representative breakdown (the two searches run in parallel, so retrieval costs the slower search plus the merge; the permission filter runs inside the retrieval queries, not as a separate stage):

Stage 1: Query understanding     12ms
Stage 2: Hybrid retrieval       195ms
  ├─ Semantic search            180ms (parallel, permission-filtered)
  ├─ Keyword search             120ms (parallel, permission-filtered)
  └─ Merge + re-rank             15ms
Stage 3: Context assembly         8ms
Stage 4: LLM generation         520ms
Stage 5: Trust Panel assembly    85ms
─────────────────────────────────────
Total                           820ms

Total time: 0.8 seconds. The user sees the answer, the proof, and the context to decide whether to trust it. No black boxes.
