Document age is not the same as staleness
By InternalWiki Team · 10 March 2026 · 6 min read
Most search tools rank recent documents higher and penalise old ones. The assumption is intuitive: newer means more current. In consumer search, this is usually true. A blog post from 2019 about JavaScript frameworks is probably outdated.
In enterprise search, this assumption often breaks down.
When old documents are the right answer
A five-year-old employment contract is still valid — it hasn't been superseded, and its terms still apply. A lease agreement signed in 2022 governs the office space until 2027. The company's articles of incorporation haven't changed since founding. These documents are old by any time-based metric, but they are the most current version of the truth they describe.
Meanwhile, yesterday's Slack message saying "the meeting is at 3pm" might already be wrong: someone may have moved it to 4pm in a later thread. A project tracker updated this morning reflects the state of the project right now, but that state might be different by tomorrow.
Age tells you when something was written. Freshness tells you whether it's still true. These are different questions.
Five document types
InternalWiki classifies every document into one of five freshness types:
Evergreen — contracts, policies, legal agreements. Valid until explicitly replaced. Your employment contract says you get 25 days of leave per year. It was signed in 2021. Nothing has changed — it's still the governing document. A search engine that deprioritises it because it's “old” would bury the one document that determines your actual entitlement.
Periodically updated — org charts, team directories, handbooks. These have a natural refresh cycle. The org chart shows James Ward reporting to Sarah Chen. That was accurate when it was updated four months ago. But James moved to a different team last week. The org chart hasn't caught up yet. This document needs a “may be outdated” warning — not because it's old, but because its type has a natural refresh cycle that has elapsed.
Point-in-time — meeting notes, decision records, announcements. The Q3 board minutes record a decision to proceed with the office move. Those minutes are six months old. They will never be “outdated” — they're a historical record of what was decided at a specific point in time. Marking them stale would be incorrect; they are permanently historical.
Operationally live — project trackers, sprint boards, status pages. The sprint board says the API migration is 60% complete. That was yesterday. Today it might be 65%. Or it might be blocked. This document's value decays hourly, and any answer drawn from it should carry a clear recency warning.
Regulatory — compliance policies, data retention rules. The GDPR compliance policy was written 18 months ago. But GDPR hasn't changed. The policy is current until the regulation is amended — not until the document passes some arbitrary age threshold.
// Freshness classification in the retrieval pipeline
function classifyFreshness(doc: Document): FreshnessStatus {
  switch (doc.contentType) {
    // Evergreen: valid until explicitly superseded
    case "contract":
    case "policy":
      return doc.supersededBy ? "superseded" : "still-valid";
    // Periodically updated: warn once the 90-day refresh cycle has elapsed
    case "org-chart":
    case "directory":
      return daysSince(doc.lastModified) > 90
        ? "may-be-outdated"
        : "current";
    // Point-in-time: a permanent historical record, never stale
    case "meeting-notes":
      return "historical";
    // Operationally live: warn if not updated within the last day
    case "project-tracker":
      return daysSince(doc.lastModified) > 1
        ? "may-be-outdated"
        : "current";
    default:
      return "unknown";
  }
}
What goes wrong with time-based freshness
A search engine that penalises old documents will bury your employment contracts, your lease agreements, and your compliance policies: exactly the documents that matter most in enterprise. It will surface yesterday's Slack messages and meeting notes at the top because they're “fresh.” For many enterprise questions, the authoritative document is also one of the oldest.
Time-based ranking optimises for recency. Enterprise queries need accuracy. When the authoritative document is old, these two goals are in direct conflict. A junior analyst searching for the company's IP ownership policy should not get a Slack thread from last week above the actual policy document from 2019.
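To make the conflict concrete, here is a minimal sketch comparing the two ranking approaches on the IP policy example. Everything in it is an assumption for illustration: the `SearchHit` shape, the 180-day half-life, and the weights in `FRESHNESS_WEIGHT` are hypothetical and not taken from InternalWiki's pipeline.

```typescript
// Hypothetical sketch: all names, weights, and the half-life are illustrative assumptions.
type FreshnessStatus =
  | "still-valid"
  | "superseded"
  | "current"
  | "may-be-outdated"
  | "historical"
  | "unknown";

interface SearchHit {
  title: string;
  relevance: number; // query-match score in [0, 1], assumed to come from the retriever
  ageDays: number; // days since the document was last modified
  freshness: FreshnessStatus;
}

// Time-based ranking: relevance decays exponentially with document age.
function timeDecayScore(hit: SearchHit, halfLifeDays = 180): number {
  return hit.relevance * Math.pow(0.5, hit.ageDays / halfLifeDays);
}

// Type-aware ranking: age is ignored; only the freshness verdict adjusts the score.
const FRESHNESS_WEIGHT: Record<FreshnessStatus, number> = {
  "still-valid": 1.0,
  "current": 1.0,
  "historical": 0.8,
  "may-be-outdated": 0.6,
  "unknown": 0.5,
  "superseded": 0.1,
};

function typeAwareScore(hit: SearchHit): number {
  return hit.relevance * FRESHNESS_WEIGHT[hit.freshness];
}

// Returns titles ordered from highest to lowest score, without mutating the input.
function rank(hits: SearchHit[], score: (h: SearchHit) => number): string[] {
  return [...hits].sort((a, b) => score(b) - score(a)).map((h) => h.title);
}

const hits: SearchHit[] = [
  { title: "IP ownership policy (2019)", relevance: 0.9, ageDays: 2500, freshness: "still-valid" },
  { title: "Slack thread from last week", relevance: 0.6, ageDays: 7, freshness: "may-be-outdated" },
];

const byTime = rank(hits, timeDecayScore);
const byType = rank(hits, typeAwareScore);
console.log(byTime[0]); // "Slack thread from last week"
console.log(byType[0]); // "IP ownership policy (2019)"
```

Under time decay, the policy's score shrinks by roughly fourteen half-lives, so even a weaker match from last week outranks it. With the type-aware score, the policy's age plays no part and the better match wins.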
How we classify
InternalWiki determines document type from three signals: source metadata (file type, storage location, and naming patterns; a PDF in a “Contracts” folder signals evergreen), content analysis (legal language patterns, table structures, and meeting headers; the word “WHEREAS” is a reliable contract indicator), and source type (Slack messages are inherently volatile; Google Drive PDFs in a “Policies” folder are likely evergreen).
The classification is automatic but reviewable. Admins can see how each document was classified and override it when the system gets it wrong. A well-labelled document library should need few corrections; a disorganised one will need more.
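The three signals and the admin override could be combined roughly as follows. This is a sketch under stated assumptions, not InternalWiki's implementation: `RawDocument`, `Classification`, `classifyType`, the order in which the rules fire, the meeting-header pattern, and the mapping of Slack messages to the operationally live type are all hypothetical.

```typescript
// Hypothetical sketch: type names, field names, patterns, and rule order are assumptions.
type FreshnessType =
  | "evergreen"
  | "periodically-updated"
  | "point-in-time"
  | "operationally-live"
  | "regulatory";

interface RawDocument {
  id: string;
  source: "slack" | "google-drive" | "other"; // source type
  path: string; // storage location and file name
  text: string; // extracted content
}

interface Classification {
  type: FreshnessType | "unknown";
  reason: string; // recorded so an admin can review the decision
  overridden: boolean;
}

function classifyType(
  doc: RawDocument,
  overrides: ReadonlyMap<string, FreshnessType> = new Map(),
): Classification {
  // An admin override always wins over the automatic signals.
  const manual = overrides.get(doc.id);
  if (manual !== undefined) {
    return { type: manual, reason: "admin override", overridden: true };
  }

  // Signal 1: source metadata (file type and storage location).
  const path = doc.path.toLowerCase();
  if (path.endsWith(".pdf") && path.includes("/contracts/")) {
    return { type: "evergreen", reason: "metadata: PDF in a Contracts folder", overridden: false };
  }

  // Signal 2: content analysis (legal language, meeting headers).
  if (/\bWHEREAS\b/.test(doc.text)) {
    return { type: "evergreen", reason: "content: contract language", overridden: false };
  }
  if (/^(attendees|agenda|minutes)\b/im.test(doc.text)) {
    return { type: "point-in-time", reason: "content: meeting header", overridden: false };
  }

  // Signal 3: source type. Treating Slack as operationally live is an assumption;
  // the post only says Slack messages are inherently volatile.
  if (doc.source === "slack") {
    return { type: "operationally-live", reason: "source: Slack message", overridden: false };
  }

  return { type: "unknown", reason: "no signal matched", overridden: false };
}

const lease: RawDocument = {
  id: "doc-1",
  source: "google-drive",
  path: "/Legal/Contracts/office-lease.pdf",
  text: "",
};
console.log(classifyType(lease).type); // "evergreen"
```

Returning a `reason` alongside the type is what makes the result reviewable: an admin can see which signal fired before deciding whether to override it.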
The right question
The right question isn't "how old is this document?" It's "what would make this document invalid, and has that happened?" A contract becomes invalid when it's terminated or superseded. An org chart becomes invalid when someone joins, leaves, or changes teams. Meeting notes never become invalid, because they're a record of what happened.
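That question can be written as one invalidation rule per document type. The sketch below covers all five types, including regulatory documents. The `DocState` fields (in particular `regulationAmendedAt`) and the `checkValidity` function are hypothetical names, and the 90-day and 1-day windows are borrowed from the snippet earlier in this post. For periodically updated and live documents the system cannot observe the invalidating event directly, so an elapsed refresh window yields a recency warning and not a verdict of "invalid".

```typescript
// Hypothetical sketch: field and function names are assumptions for illustration.
type FreshnessType =
  | "evergreen"
  | "periodically-updated"
  | "point-in-time"
  | "operationally-live"
  | "regulatory";

type Validity = "valid" | "invalid" | "check-recency";

interface DocState {
  type: FreshnessType;
  lastModified: Date;
  supersededBy?: string; // id of the replacing document, if any
  regulationAmendedAt?: Date; // assumed field: when the underlying regulation last changed
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Refresh windows in days, taken from the thresholds in the earlier snippet.
const REFRESH_DAYS = { "periodically-updated": 90, "operationally-live": 1 } as const;

function checkValidity(doc: DocState, now: Date): Validity {
  switch (doc.type) {
    case "evergreen":
      // Invalid only when explicitly replaced; age plays no part.
      return doc.supersededBy !== undefined ? "invalid" : "valid";
    case "regulatory": {
      // Invalid when replaced, or when the regulation changed after the policy was written.
      const amended =
        doc.regulationAmendedAt !== undefined &&
        doc.regulationAmendedAt.getTime() > doc.lastModified.getTime();
      return (doc.supersededBy !== undefined || amended) ? "invalid" : "valid";
    }
    case "point-in-time":
      // A historical record never becomes invalid.
      return "valid";
    case "periodically-updated":
    case "operationally-live": {
      // The invalidating event is not observable here, so an elapsed window only raises a warning.
      const ageDays = (now.getTime() - doc.lastModified.getTime()) / MS_PER_DAY;
      return ageDays > REFRESH_DAYS[doc.type] ? "check-recency" : "valid";
    }
  }
}

const now = new Date("2026-03-10T00:00:00Z");
const contract: DocState = { type: "evergreen", lastModified: new Date("2021-01-15T00:00:00Z") };
const orgChart: DocState = { type: "periodically-updated", lastModified: new Date("2025-11-10T00:00:00Z") };
console.log(checkValidity(contract, now)); // "valid"
console.log(checkValidity(orgChart, now)); // "check-recency"
```

The five-year-old contract passes because nothing has superseded it, while the org chart last updated 120 days earlier is flagged for a recency check.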
Time-based freshness is a proxy. Content-type freshness is the real signal. InternalWiki uses the latter because getting this wrong has consequences — and because once you see it done right, the time-based approach feels obviously broken.