What happens when you ask InternalWiki a question
By InternalWiki Team · 6 March 2026 · 7 min read
A user types: "When does the Berkeley Square lease expire?" They get a cited answer in 0.8 seconds. Here's what happens in that time.
Stage 1: Query understanding (0–50ms)
The question is analysed for intent and key entities. "Berkeley Square" is identified as a proper noun — likely a location or property name. "Lease" signals a legal/contractual document type. "Expire" indicates the user wants a specific date. This informs both the semantic and keyword search strategies.
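In application code, this stage might look roughly like the sketch below. The capitalisation heuristic, the hint tables, and the function name are all illustrative assumptions; a production system would more likely use an NER model and a trained intent classifier.

```python
# Hypothetical sketch of query understanding: extract candidate entities
# (runs of capitalised words) and classify the question's intent.
DOC_TYPE_HINTS = {"lease": "legal", "contract": "legal", "invoice": "finance"}
ANSWER_TYPE_HINTS = {"when": "date", "who": "person", "where": "location"}

def analyse_query(question: str) -> dict:
    # Proper-noun candidates: capitalised runs, skipping the sentence-start word.
    words = question.rstrip("?").split()
    entities, current = [], []
    for w in words[1:]:
        if w[:1].isupper():
            current.append(w)
        else:
            if current:
                entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))

    lowered = question.lower()
    doc_types = {t for k, t in DOC_TYPE_HINTS.items() if k in lowered}
    answer_type = next((t for k, t in ANSWER_TYPE_HINTS.items()
                        if lowered.startswith(k)), "unknown")
    return {"entities": entities, "doc_types": sorted(doc_types),
            "answer_type": answer_type}
```

For the example question, this yields the entity "Berkeley Square", the document type "legal", and the answer type "date", which is exactly the information the two search strategies need.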
Stage 2: Hybrid retrieval (50–230ms)
Two searches run in parallel. The semantic search converts the question into an embedding vector and finds document chunks with similar meaning. The keyword search looks for exact terms — "Berkeley Square" and "lease" — that semantic search might miss.
Both searches include a permission predicate. The database only returns chunks from documents this specific user has access to. If the lease agreement is in a folder the user can't see, it never appears in results.
In practice, the semantic search examines 847 document chunks across 312 documents in 0.18 seconds. It returns the top 20 by cosine similarity. In parallel, the keyword search scans the full-text index for “Berkeley Square” and “lease” — 0.12 seconds, 8 matches. The merge step deduplicates and re-ranks using Reciprocal Rank Fusion, producing a final set of 10 chunks.
-- Hybrid retrieval: semantic + keyword, permission-filtered,
-- merged with Reciprocal Rank Fusion (k = 60)
WITH semantic AS (
  SELECT c.chunk_id,
         ROW_NUMBER() OVER (ORDER BY c.embedding <=> $query_vector) AS rank
  FROM chunks c
  JOIN permissions p ON c.doc_id = p.doc_id
  WHERE p.user_id = $uid
  ORDER BY c.embedding <=> $query_vector
  LIMIT 20
),
keyword AS (
  SELECT c.chunk_id,
         ROW_NUMBER() OVER (ORDER BY ts_rank(c.tsv, q.query) DESC) AS rank
  FROM chunks c
  JOIN permissions p ON c.doc_id = p.doc_id,
       plainto_tsquery($query_text) AS q(query)
  WHERE p.user_id = $uid AND c.tsv @@ q.query
  ORDER BY ts_rank(c.tsv, q.query) DESC
  LIMIT 20
)
SELECT chunk_id, SUM(1.0 / (60 + rank)) AS rrf_score
FROM (SELECT * FROM semantic UNION ALL SELECT * FROM keyword) AS ranked
GROUP BY chunk_id
ORDER BY rrf_score DESC
LIMIT 10;
Why hybrid search matters here
Pure semantic search would find documents about “office leases” and “property agreements” — good for capturing intent. But “Berkeley Square” is a proper noun. Semantic embeddings dilute named entities; they encode meaning at the level of concepts, not specific names. The keyword search catches it exactly.
Without hybrid search, the answer might reference the Manchester Square lease instead — semantically similar, since both are London commercial leases, but factually wrong. The hybrid approach is what makes the difference between a confident wrong answer and the correct one.
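The Reciprocal Rank Fusion used in the merge step is simple enough to sketch in a few lines. This is a minimal illustration of the technique, not InternalWiki's actual merge code; k = 60 is the constant from the original RRF paper, and the exact value used in production is an assumption here.

```python
def rrf_merge(semantic_ids, keyword_ids, k=60, top_n=10):
    """Reciprocal Rank Fusion: each result list contributes
    1 / (k + rank) for every chunk it returned. Chunks found by
    both searches accumulate both contributions and rise to the
    top, which is why RRF rewards cross-strategy agreement."""
    scores = {}
    for ranked in (semantic_ids, keyword_ids):
        for rank, chunk_id in enumerate(ranked, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Because RRF works on ranks rather than raw scores, it sidesteps the problem that cosine similarity and `ts_rank` live on incomparable scales.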
Stage 3: Context assembly (230–245ms)
The top 10 chunks are assembled into a context window. Each chunk carries metadata: the source document title, the document type (for freshness classification), the last modified date, and the similarity score. Chunks are ordered by relevance, and the total token count is checked against the model's context limit.
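A minimal sketch of that assembly step follows. The chunk metadata shape and the whitespace-based token estimate are both illustrative assumptions; a real system would count tokens with the model's own tokenizer.

```python
def assemble_context(chunks, token_limit=8000):
    """Pack relevance-sorted chunks, each prefixed with a metadata
    header, until the token budget is exhausted. Token counting is
    a crude whitespace estimate for illustration only."""
    lines, used = [], 0
    for i, c in enumerate(chunks, start=1):
        header = (f"[{i}] {c['title']} ({c['doc_type']}, "
                  f"modified {c['modified']}, score {c['score']:.2f})")
        cost = len(header.split()) + len(c["text"].split())
        if used + cost > token_limit:
            break  # budget check against the model's context limit
        lines.append(header)
        lines.append(c["text"])
        used += cost
    return "\n".join(lines)
```

The numbered headers double as citation anchors: "[1]" in the generated answer refers back to the first chunk packed here.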
Stage 4: Generation with citation anchoring (245–765ms)
The context and question go to the language model with a specific instruction: every factual claim in the answer must be anchored to a specific chunk. The model generates: "The Berkeley Square lease expires on 30 September 2027 [1]. The lease was renewed in 2022 for a five-year term [1] with an option to extend for an additional three years, subject to board approval [2]."
Each citation number maps to a specific chunk. The model cannot make claims that aren't grounded in the retrieved context.
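One plausible way to enforce this is a post-generation grounding check. The sketch below is an assumption about how such a check could work, not a description of InternalWiki's actual mechanism.

```python
import re

def validate_citations(answer: str, num_chunks: int) -> dict:
    """Check that every [n] citation points at a retrieved chunk,
    and flag sentences that carry no citation at all."""
    cited = {int(m) for m in re.findall(r"\[(\d+)\]", answer)}
    dangling = {n for n in cited if not 1 <= n <= num_chunks}
    uncited = [s for s in re.split(r"(?<=[.!?])\s+", answer)
               if s and not re.search(r"\[\d+\]", s)]
    return {"cited": sorted(cited), "dangling": sorted(dangling),
            "uncited_sentences": uncited}
```

An answer with dangling citations or uncited factual sentences could then be regenerated or flagged before it reaches the user.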
Stage 5: Trust Panel assembly (765–850ms)
The confidence score is calculated based on source count, relevance scores, and cross-source agreement. Each cited source gets a freshness classification — the lease agreement is an evergreen document, so despite being signed in 2022, it shows "Still valid." The Trust Panel is assembled: confidence bar, source cards with freshness badges, citation links back to the original documents.
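Freshness classification might be rule-based along these lines. The document types and age thresholds here are illustrative guesses, not InternalWiki's actual rules.

```python
from datetime import date

# Assumed rule: "evergreen" types (signed contracts, policies) stay
# valid regardless of age; time-sensitive types decay with age.
EVERGREEN_TYPES = {"lease", "contract", "policy"}

def freshness_badge(doc_type: str, last_modified: date, today: date) -> str:
    if doc_type in EVERGREEN_TYPES:
        return "Still valid"
    age_days = (today - last_modified).days
    if age_days <= 90:
        return "Fresh"
    if age_days <= 365:
        return "Aging"
    return "Outdated"
```

Under these rules the 2022 lease agreement is classified by type, not age, which is why it shows "Still valid" rather than "Outdated".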
How confidence is calculated
The confidence score is not a single number from the language model. It is computed from four distinct signals. First: source count — more independent sources supporting a claim increases confidence, since corroboration from multiple documents is harder to fake. Second: relevance scores — how closely the retrieved chunks match the question semantically and lexically. Third: cross-source agreement — do sources agree or contradict each other? Contradictions lower confidence and trigger a conflict warning in the Trust Panel, surfacing both positions so the user can investigate. Fourth: freshness — if supporting sources are classified as outdated, confidence drops even if all other signals are strong.
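As a concrete, purely illustrative combination of those four signals: the weights and the saturation point below are assumptions, not the production formula.

```python
def confidence_score(source_count, relevance_scores,
                     agreement, freshness_ok_frac):
    """Combine the four signals into one score in [0, 1].
    - source_count: independent sources supporting the answer
    - relevance_scores: per-chunk match scores in [0, 1]
    - agreement: fraction of source pairs that agree, in [0, 1]
    - freshness_ok_frac: fraction of sources not classified outdated"""
    corroboration = min(source_count, 3) / 3   # saturates at 3 sources
    relevance = sum(relevance_scores) / len(relevance_scores)
    score = (0.25 * corroboration + 0.30 * relevance
             + 0.25 * agreement + 0.20 * freshness_ok_frac)
    return round(score, 2)
```

Note how a contradiction (low agreement) or stale sources (low freshness fraction) drags the score down even when retrieval relevance is high, matching the behaviour described above.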
The full latency breakdown for this query:
Stage 1: Query understanding       12ms
Stage 2: Hybrid retrieval         195ms
├─ Semantic search     180ms (parallel, permission-filtered)
├─ Keyword search      120ms (parallel, permission-filtered)
└─ Merge + re-rank      15ms
Stage 3: Context assembly           8ms
Stage 4: LLM generation           520ms
Stage 5: Trust Panel assembly      85ms
─────────────────────────────────────
Total                             820ms
Total time: 0.8 seconds. The user sees the answer, the proof, and the context to decide whether to trust it. No black boxes.
InternalWiki Team
Building the enterprise answer layer.