Your wiki search is broken. RAG done right fixes it.
Unified retrieval across Confluence, SharePoint, Notion, Slack, and Google Drive — with real ACL enforcement, mandatory citations, and minute-level freshness. Built for teams where wrong answers have consequences.
Keyword search lost the war the moment your wiki crossed 10,000 pages.
Your team has Confluence, SharePoint, Notion, a half-migrated Google Drive, three years of Slack history, and a Zendesk knowledge base. Search across them ranges from bad to nonexistent. Engineers ask the same questions in #ask-platform every week because finding the answer is slower than re-deriving it. Meanwhile someone bolted ChatGPT onto a vector database, called it AI search, and now it confidently cites pages that were deprecated in 2022 — or worse, surfaces an HR doc to an intern. Both failure modes erode trust until people stop using it.
- ▸ Native search in each tool only sees its own silo, so cross-system questions get zero useful hits.
- ▸ Naive RAG implementations ignore source ACLs and surface restricted SharePoint or Slack content to the wrong users.
- ▸ Stale embeddings answer with deprecated runbooks, outdated ADRs, and policies that were rewritten months ago.
- ▸ No citations means no trust — engineers verify every answer manually, which is slower than just searching themselves.
Build search that respects ACLs, cites sources, and stays fresh.
- STEP-01
Inventory and rank your sources
Before any embeddings, we audit every system in scope — Confluence spaces, SharePoint sites, Notion workspaces, Slack channels, Google Drive, Zendesk macros — and rank by signal density. Most of the value comes from 20% of the corpus. The rest is noise that hurts retrieval.
- STEP-02
Mirror permissions into the index
We pull ACLs from each source (Microsoft Graph, Confluence REST, Notion API, Slack scopes) and store them as metadata on every chunk. At query time we filter by the user's group memberships before retrieval, not after. No leakage, no apologetic disclaimers.
- STEP-03
Chunk, embed, and re-rank
Semantic chunking by heading structure, not fixed token windows (see the sketch after this list). Embeddings via text-embedding-3-large or Voyage. Hybrid retrieval (BM25 + vector) feeding a Cohere or cross-encoder re-ranker. This combination consistently beats pure vector search on internal docs by a wide margin.
- STEP-04
Force citations, refuse to guess
The LLM prompt requires inline citations to source chunks with URLs and last-modified dates. If retrieval returns nothing above a confidence threshold, the system says so instead of hallucinating. Users learn to trust it because it admits ignorance.
- STEP-05
Incremental sync and staleness alerts
Webhooks from each source trigger re-indexing within minutes of edits. We track per-document freshness and surface 'last updated 14 months ago' warnings in the UI. Stale answers are worse than no answer when someone is debugging a production incident.
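The chunking in STEP-03 is easiest to see in code. Here is a minimal sketch of heading-aware chunking, assuming Markdown-style source text and an illustrative 1,000-token budget; the real pipeline keys off each source's native heading structure and the tokenizer of whichever embedding model you pick.

import re
from typing import List

MAX_TOKENS = 1000  # illustrative budget; tune for your embedding model

def rough_tokens(text: str) -> int:
    # Cheap approximation; swap in your embedding model's tokenizer.
    return len(text.split())

def chunk_by_headings(markdown: str) -> List[str]:
    # Every H1-H3 starts a new chunk, so a chunk never straddles two topics.
    sections = re.split(r"(?m)^(?=#{1,3} )", markdown)
    chunks: List[str] = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        if rough_tokens(section) <= MAX_TOKENS:
            chunks.append(section)
            continue
        # Oversized section: fall back to paragraph splits, repeating the
        # heading so every piece keeps its context for retrieval.
        heading, _, body = section.partition("\n")
        buf = heading
        for para in body.split("\n\n"):
            if rough_tokens(buf + "\n\n" + para) > MAX_TOKENS:
                chunks.append(buf)
                buf = heading + "\n\n" + para
            else:
                buf = buf + "\n\n" + para
        chunks.append(buf)
    return chunks

The retrieval path that consumes those chunks looks like this: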
from typing import List
from dataclasses import dataclass

@dataclass
class Chunk:
    id: str
    text: str
    source_url: str
    last_modified: str
    acl_groups: List[str]  # e.g. ['eng-platform', 'sec-cleared']
    score: float = 0.0     # populated by the re-ranker

# opensearch, pinecone, cohere, and llm are thin wrappers around the respective
# clients; embed() and reciprocal_rank_fusion() live elsewhere in the pipeline.

def retrieve(query: str, user_groups: List[str], k: int = 8) -> List[Chunk]:
    # 1. ACL filter pushed into the vector store query, not applied after
    acl_filter = {"acl_groups": {"$in": user_groups}}

    # 2. Hybrid: BM25 + dense, fused with reciprocal rank fusion (RRF)
    bm25_hits = opensearch.search(query, filter=acl_filter, size=40)
    vec_hits = pinecone.query(embed(query), filter=acl_filter, top_k=40)
    fused = reciprocal_rank_fusion(bm25_hits, vec_hits)

    # 3. Cross-encoder re-rank for precision
    reranked = cohere.rerank(query=query, documents=fused, top_n=k)

    # 4. Drop anything below the confidence floor; better to return nothing
    return [c for c in reranked if c.score >= 0.35]

def answer(query: str, user_groups: List[str]) -> dict:
    chunks = retrieve(query, user_groups)
    if not chunks:
        return {"answer": "No internal docs matched. Try rephrasing or check #ask-it.",
                "citations": []}
    return llm.generate(
        system="Cite every claim as [n]. If sources disagree, say so. Never invent.",
        context=chunks,
        query=query,
    )

ACL filtering happens inside the vector store query — never as a post-hoc filter on results — so a user can't infer the existence of documents they can't read.
Field FAQ.
→ How is this different from the AI search built into Confluence, Notion, or SharePoint Copilot?
Native AI search only sees its own silo. Your engineers don't live in one tool — they live in Confluence runbooks, Slack incident threads, SharePoint contracts, and Notion product specs simultaneously. A unified retrieval layer answers cross-system questions like 'why did we pick Postgres over DynamoDB' that no single vendor's AI can. We also give you control over the model, the prompt, and the citation format, which the vendor tools don't.
→ How do you handle permissions so the LLM doesn't leak documents?
Every chunk in the index carries the ACL metadata from its source — Microsoft Graph permissions for SharePoint, space and page restrictions for Confluence, channel membership for Slack, and so on. The user's group memberships are resolved at query time and applied as a filter inside the vector store query itself, before retrieval. The LLM never sees a chunk the user couldn't open directly. We also log every retrieval for audit.
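For SharePoint specifically, the permission mirroring can look roughly like this. The Graph endpoint shown is the standard per-item permissions listing; the group-to-index mapping and index_chunk are hypothetical stand-ins for your own ingest pipeline, so treat this as a sketch rather than the exact implementation.

from typing import List

import requests

GRAPH = "https://graph.microsoft.com/v1.0"

def sharepoint_acl_groups(drive_id: str, item_id: str, token: str) -> List[str]:
    # List the permissions on a drive item and keep the group identities;
    # these become the acl_groups metadata on every chunk from that document.
    resp = requests.get(
        f"{GRAPH}/drives/{drive_id}/items/{item_id}/permissions",
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    groups: List[str] = []
    for perm in resp.json().get("value", []):
        granted = perm.get("grantedToV2") or perm.get("grantedTo") or {}
        group = granted.get("group") or granted.get("siteGroup") or {}
        if group.get("id"):
            groups.append(group["id"])
    return groups

# Hypothetical ingest loop; index_chunk is a stand-in for your vector store upsert.
# for chunk in chunk_by_headings(page_markdown):
#     index_chunk(text=chunk, source_url=page_url,
#                 acl_groups=sharepoint_acl_groups(drive_id, item_id, token))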
→ What does 'fresh' actually mean? How fast does new content show up?
For systems with webhooks (Slack, Notion, GitHub, Zendesk) we typically index changes within 60-120 seconds. For systems without good webhooks (some SharePoint configurations, older Confluence) we run a delta crawl every 5-15 minutes using last-modified timestamps. We also expose a manual 'reindex this page' button for editors who just published something critical and don't want to wait.
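A minimal sketch of that webhook path, using Flask as a stand-in for whatever service receives the callbacks; the endpoint name, payload fields, and enqueue_reindex helper are illustrative, not a fixed contract. Sources without webhooks fall back to the delta crawl described above.

from datetime import datetime, timezone
from typing import Optional

from flask import Flask, request

app = Flask(__name__)

def enqueue_reindex(source: str, document_id: str) -> None:
    # Placeholder: in production this pushes onto a task queue, and a worker
    # re-fetches the document from the source API before re-embedding it.
    print(f"reindex queued: {source}:{document_id}")

@app.post("/webhooks/source-updated")
def source_updated():
    # Generic shape; Slack, Notion, and Zendesk each name these fields differently.
    event = request.get_json(force=True)
    enqueue_reindex(source=event["source"], document_id=event["document_id"])
    return "", 202

def staleness_warning(last_modified: datetime,
                      now: Optional[datetime] = None) -> Optional[str]:
    # Rendered next to citations, e.g. "last updated 14 months ago".
    now = now or datetime.now(timezone.utc)
    months = (now - last_modified).days // 30
    return f"last updated {months} months ago" if months >= 12 else None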
→ Why citations? Can't the LLM just answer the question?
Without citations, users either trust the LLM blindly or distrust it entirely — both are bad. Citations let the reader verify, which is the only sustainable trust model for an internal tool. They also create a feedback loop: when a citation points to a stale or wrong doc, the user fixes the doc. The search system becomes a forcing function for documentation hygiene, which is often a bigger win than the search itself.
→ Which embedding model and vector store do you recommend?
For most internal search workloads we default to OpenAI text-embedding-3-large or Voyage voyage-3 for embeddings, paired with pgvector for teams under ~5M chunks or Pinecone/Qdrant above that. For re-ranking, Cohere Rerank 3 or a fine-tuned cross-encoder. We don't religiously stick to one stack — we benchmark on a sample of your actual queries before committing, because internal jargon changes which models win.
→ Can this run fully on-prem or in a GovCloud environment?
Yes. As an SDVOSB we regularly deploy in AWS GovCloud and Azure Government. The retrieval and re-ranking stack runs entirely in your VPC. For the LLM itself, options include Azure OpenAI in GovCloud, Bedrock with Claude in commercial-to-Gov configurations, or self-hosted Llama 3.1 70B / Qwen on GPU instances when no commercial API is acceptable. We've shipped all three patterns.
→ How do you measure whether the search is actually good?
We build an eval set of 100-300 real questions from your team — pulled from Slack 'does anyone know' messages and help desk tickets — with expert-validated correct answers and source documents. Every prompt change, model swap, or chunking tweak is scored against that set for retrieval recall@k and answer faithfulness. Vibes-based evaluation is how teams ship search that feels good in demos and fails in production.
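A stripped-down sketch of the retrieval half of that eval, assuming the eval set is a list of (question, expected source URLs) pairs and reusing the retrieve() function from the code sample above:

from typing import List, Tuple

# One eval case: a question phrased the way users actually ask it,
# plus the source URLs an expert confirmed contain the answer.
EvalCase = Tuple[str, List[str]]

def retrieval_recall_at_k(cases: List[EvalCase], user_groups: List[str],
                          k: int = 8) -> float:
    hits = 0
    for question, expected_urls in cases:
        retrieved = {c.source_url for c in retrieve(question, user_groups, k=k)}
        # A case counts as a hit if any expert-approved source made the top k.
        if retrieved & set(expected_urls):
            hits += 1
    return hits / len(cases) if cases else 0.0

Answer faithfulness is judged separately against the cited chunks; the point is that every chunking tweak, prompt change, or model swap reruns against the same fixed set.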
→ What does a typical engagement timeline look like?
A first usable system covering two or three sources is typically 3-5 weeks: week 1 source audit and ACL mapping, weeks 2-3 ingestion and retrieval, week 4 UI and citation formatting, week 5 eval set and tuning. Adding additional sources after that is usually 3-7 days each. Full production rollout with SSO, audit logging, and admin tooling generally lands inside 10 weeks.
→ What happens when source documents contradict each other?
This is the rule, not the exception — your 2019 architecture doc disagrees with your 2024 ADR. We prompt the model to surface the disagreement explicitly, weight by recency in the re-ranker, and show last-modified dates in citations so the reader can judge. We also build a lightweight 'mark as outdated' flow so subject matter experts can deprecate stale pages without deleting them.
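As a rough illustration of the recency weighting, here is a sketch using an assumed 18-month half-life and blend ratio (both are tuning knobs, not fixed values); in the retrieve() sketch above it would run between the re-ranker and the confidence floor.

from datetime import datetime, timezone
from typing import Optional

HALF_LIFE_DAYS = 540  # assumed ~18-month half-life; tune per corpus

def recency_weighted(score: float, last_modified: str,
                     now: Optional[datetime] = None) -> float:
    # last_modified is the ISO-8601 string carried on every Chunk.
    now = now or datetime.now(timezone.utc)
    modified = datetime.fromisoformat(last_modified.replace("Z", "+00:00"))
    if modified.tzinfo is None:
        modified = modified.replace(tzinfo=timezone.utc)
    age_days = max((now - modified).days, 0)
    decay = 0.5 ** (age_days / HALF_LIFE_DAYS)
    # Blend rather than multiply to zero: an old doc can still win
    # when it is a much better semantic match.
    return score * (0.7 + 0.3 * decay)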
Continue recon.
REL-01 AI Integration Services
How we embed Claude, GPT, and RAG into existing enterprise workflows.
REL-02 Case Studies
Internal search and RAG deployments we've shipped, with measured outcomes.
REL-03 Fixed-Scope Packages
Defined RAG and internal-search engagements with clear timelines and deliverables.
REL-04 Scope an Engagement
Talk to an engineer who has shipped this integration before, not a sales rep.
Stop losing 30 minutes per question. Let's scope your internal search rollout.
Talk to a VooStack operator. We respond within one business day.