Engineering

Permission-aware retrieval: why output filtering isn't enough

By InternalWiki Team · 14 March 2026 · 7 min read

When enterprise AI tools promise "permission-aware answers," most buyers don't think to ask: where in the pipeline are permissions enforced? The answer determines whether the system is genuinely secure or merely appears to be.

There are two architecturally distinct approaches. One is correct. The other is a liability.

Output filtering: the common approach

Most AI knowledge tools use output filtering. The system retrieves all potentially relevant documents regardless of the user's permissions, feeds them into the language model, generates an answer, and then filters the output to remove references to documents the user shouldn't see.

This seems pragmatic. Retrieve broadly, answer well, filter carefully. The problem is that language models don't work like databases. You can't reliably remove the influence of a document from a generated response simply by removing the citation.

Consider a concrete example. A company has a public FAQ saying "Our standard discount is 10%" and a confidential sales playbook saying "For enterprise deals over $500K, we can offer up to 35%." An employee without sales playbook access asks: "What discounts do we offer?"

With output filtering, both documents enter the context window. The model synthesises both. The filter removes the playbook citation. But the answer might still say: "Our standard discount is 10%, though larger deals may receive more favourable terms." The restricted information has leaked — not as a direct quote, but as an inference.

A concrete attack scenario

The synthesis problem goes deeper than that example suggests. Consider a more targeted scenario: a user without access to compensation data asks about salary benchmarks. The AI generates an answer drawing on restricted salary band documents in its context window. The output filter scans for keywords like “salary,” “compensation,” and “confidential.”

But the AI has synthesised the restricted data into a sentence that contains none of those words: “Engineering managers at the L6 level typically sit in the upper quartile of market rates, which our internal analysis confirms.” No flagged keywords. No direct quote. The restricted salary band data has leaked as an inference that no filter caught.

Output filtering catches explicit quotes. It doesn't catch synthesis. And language models are very good at synthesis.
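The failure mode above can be made concrete with a few lines of code. This is a minimal sketch of a keyword-based output filter (the function and keyword list are illustrative, not any vendor's actual implementation): it catches a direct quote but passes a synthesised paraphrase untouched.

```python
# Hypothetical keyword-based output filter. Real filters are more
# sophisticated, but the structural weakness is the same: they match
# surface text, not the provenance of the information.

FLAGGED_KEYWORDS = {"salary", "compensation", "confidential"}

def output_filter_passes(answer: str) -> bool:
    """Return True if the answer contains no flagged keywords."""
    words = {w.strip('.,;:"').lower() for w in answer.split()}
    return not (words & FLAGGED_KEYWORDS)

direct_quote = "Confidential: L6 salary bands are in the upper quartile."
synthesised = ("Engineering managers at the L6 level typically sit in the "
               "upper quartile of market rates, which our internal "
               "analysis confirms.")

print(output_filter_passes(direct_quote))  # False — the quote is caught
print(output_filter_passes(synthesised))   # True — the leak sails through
```

The filter blocks the verbatim quote, yet the synthesised sentence, carrying exactly the same restricted information, passes every check.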

Retrieval-time enforcement: the right approach

InternalWiki enforces permissions before any documents enter the context window. The vector search query includes a permission predicate — a WHERE clause that limits results to documents the user has access to:

-- <=> is the pgvector cosine-distance operator; access_level is an
-- ordered type, so >= 'read' admits read access and above.
SELECT chunks.content, chunks.metadata
FROM chunks
JOIN permissions ON chunks.document_id = permissions.document_id
WHERE permissions.user_id = $current_user         -- only this user's grants
  AND permissions.access_level >= 'read'          -- read access or higher
  AND chunks.embedding <=> $query_embedding < 0.7 -- similarity cutoff
ORDER BY chunks.embedding <=> $query_embedding    -- nearest chunks first
LIMIT 10;

The language model never sees content the user isn't authorised to view. There is no post-processing step. There is no filter that might have edge cases. The boundary is enforced at the data layer.

How the JOIN works

The permissions table mirrors the access control list from each connected source. Google Drive folder sharing creates a row per user per document. Slack channel membership creates a row per member per channel. SharePoint site permissions create a row per user per site, with item-level overrides tracked separately.

The JOIN is an INNER JOIN. If no matching permission row exists for the requesting user and a given document chunk, that chunk is invisible — not filtered after retrieval, but never returned by the query. The language model has zero opportunity to see it. There is no subsequent step that could accidentally include it.
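The invisibility property of the INNER JOIN can be demonstrated end to end. This is a runnable sketch using SQLite, with the vector-similarity clause omitted and illustrative table contents — not InternalWiki's actual schema:

```python
# Retrieval-time enforcement in miniature: a chunk with no matching
# permission row is never returned by the query, so nothing downstream
# can accidentally include it.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE chunks (document_id TEXT, content TEXT);
CREATE TABLE permissions (user_id TEXT, document_id TEXT);
INSERT INTO chunks VALUES
  ('faq',      'Our standard discount is 10%.'),
  ('playbook', 'Enterprise deals over $500K: up to 35% discount.');
INSERT INTO permissions VALUES
  ('alice', 'faq'), ('alice', 'playbook'),
  ('bob',   'faq');  -- bob has no playbook row
""")

def retrieve(user_id: str) -> list[str]:
    # INNER JOIN: no permission row, no result row.
    rows = db.execute("""
        SELECT chunks.content
        FROM chunks
        JOIN permissions ON chunks.document_id = permissions.document_id
        WHERE permissions.user_id = ?
    """, (user_id,)).fetchall()
    return [content for (content,) in rows]

print(len(retrieve("alice")))  # 2 — both documents
print(retrieve("bob"))         # only the FAQ; the playbook is invisible
```

For bob, the playbook chunk is not filtered out of the results; it was never in them.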

ACL inheritance in practice

Each source platform has its own permission model, and InternalWiki respects the full inheritance chain for each.

Google Drive: if you share a folder with the finance team, every file inside inherits that access. If you then restrict a specific file to just the CFO, that file-level permission overrides the folder. InternalWiki tracks both the inherited permission and the override, and applies the more restrictive rule.

Slack: public channels are readable by everyone in the workspace. Private channels are restricted to members. Direct messages are never indexed — not restricted, not filtered, simply never ingested. The indexing decision is made at the source-type level, before any content is processed.

Microsoft: SharePoint site membership is the baseline. Individual documents can have unique permissions that differ from their parent library or site. InternalWiki respects item-level permissions and re-checks them on every sync cycle, since SharePoint administrators frequently grant one-off access that diverges from the site defaults.

Why this architecture is harder to build

Retrieval-time enforcement requires maintaining a real-time permissions index that stays in sync with three different source platforms, each with different permission models, different webhook APIs, and different edge cases.

Every time someone changes sharing settings in Google Drive, adds a member to a Slack channel, or modifies SharePoint permissions, that change needs to propagate to the InternalWiki permissions table before the next query arrives. This requires webhook listeners for real-time updates, periodic full syncs as a safety net for missed events, and careful handling of each platform's edge cases — inherited permissions, group memberships, org-wide sharing policies.
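The full-sync safety net mentioned above reduces to a set reconciliation: diff the source platform's current ACL against the local permissions index and repair any drift left by missed webhook events. A minimal sketch, with illustrative names:

```python
# Periodic reconciliation: each entry is a (user_id, document_id) grant.
# Webhooks handle the real-time path; this diff catches anything missed.

def reconcile(source_acl: set[tuple[str, str]],
              index: set[tuple[str, str]]) -> tuple[set, set]:
    to_add = source_acl - index     # grants the webhooks missed
    to_revoke = index - source_acl  # stale grants still in the index
    return to_add, to_revoke

source = {("alice", "doc1"), ("bob", "doc1"), ("bob", "doc2")}
index  = {("alice", "doc1"), ("carol", "doc1")}  # drifted

to_add, to_revoke = reconcile(source, index)
print(sorted(to_add))     # bob's two grants were never applied
print(sorted(to_revoke))  # carol's access was revoked at the source
```

Revocations are the critical half: until a stale grant is removed from the index, the retrieval query would still return those chunks.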

This infrastructure represents roughly 40% of our engineering effort. It's the least visible part of the product — users never see the permissions sync running. But security boundaries aren't optional for enterprise customers. A system that is “mostly secure” is not secure.

Why this matters

When evaluating AI knowledge tools, security teams should ask one question: does the language model ever see documents the requesting user doesn't have access to? If yes, the system has a structural information leakage risk. If no, the security boundary is architecturally sound.

We made this decision on day one. Permissions are enforced at retrieval time, in the database query, before the language model reads a single word. It's harder to build. It's the only way to build it right.
