Your wiki search is broken. RAG done right fixes it.
Unified retrieval across Confluence, SharePoint, Notion, Slack, and Google Drive — with real ACL enforcement, mandatory citations, and minute-level freshness. Built for teams where wrong answers have consequences.
Keyword search lost the war the moment your wiki crossed 10,000 pages.
Your team has Confluence, SharePoint, Notion, a half-migrated Google Drive, three years of Slack history, and a Zendesk knowledge base. Search across them ranges from bad to nonexistent. Engineers ask the same questions in #ask-platform every week because finding the answer is slower than re-deriving it. Meanwhile someone bolted ChatGPT onto a vector database, called it AI search, and now it confidently cites pages that were deprecated in 2022 — or worse, surfaces an HR doc to an intern. Both failure modes erode trust until people stop using it.
- ▸ Native search in each tool only sees its own silo, so cross-system questions get zero useful hits.
- ▸ Naive RAG implementations ignore source ACLs and surface restricted SharePoint or Slack content to the wrong users.
- ▸ Stale embeddings answer with deprecated runbooks, outdated ADRs, and policies that were rewritten months ago.
- ▸ No citations means no trust — engineers verify every answer manually, which is slower than just searching themselves.
Build search that respects ACLs, cites sources, and stays fresh.
- STEP-01
Inventory and rank your sources
Before any embeddings, we audit every system in scope — Confluence spaces, SharePoint sites, Notion workspaces, Slack channels, Google Drive, Zendesk macros — and rank by signal density. Most of the value comes from 20% of the corpus. The rest is noise that hurts retrieval.
- STEP-02
Mirror permissions into the index
We pull ACLs from each source (Microsoft Graph, Confluence REST, Notion API, Slack scopes) and store them as metadata on every chunk. At query time we filter by the user's group memberships before retrieval, not after. No leakage, no apologetic disclaimers.
- STEP-03
Chunk, embed, and re-rank
Semantic chunking by heading structure, not fixed token windows (see the sketch after this list). Embeddings via text-embedding-3-large or Voyage. Hybrid retrieval (BM25 + vector) feeding a Cohere or cross-encoder re-ranker. This combination consistently beats pure vector search on internal docs by a wide margin.
- STEP-04
Force citations, refuse to guess
The LLM prompt requires inline citations to source chunks with URLs and last-modified dates. If retrieval returns nothing above a confidence threshold, the system says so instead of hallucinating. Users learn to trust it because it admits ignorance.
- STEP-05
Incremental sync and staleness alerts
Webhooks from each source trigger re-indexing within minutes of edits. We track per-document freshness and surface 'last updated 14 months ago' warnings in the UI. Stale answers are worse than no answer when someone is debugging a production incident.
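The chunking in STEP-03 is easiest to see in code. Here is a minimal sketch of heading-aware chunking, assuming Markdown-style source text and an illustrative 1,000-token budget; the real pipeline keys off each source's native heading structure and the tokenizer of whichever embedding model you pick.

import re
from typing import List

MAX_TOKENS = 1000  # illustrative budget; tune for your embedding model

def rough_tokens(text: str) -> int:
    # Cheap approximation; swap in your embedding model's tokenizer.
    return len(text.split())

def chunk_by_headings(markdown: str) -> List[str]:
    # Every H1-H3 starts a new chunk, so a chunk never straddles two topics.
    sections = re.split(r"(?m)^(?=#{1,3} )", markdown)
    chunks: List[str] = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        if rough_tokens(section) <= MAX_TOKENS:
            chunks.append(section)
            continue
        # Oversized section: fall back to paragraph splits, repeating the
        # heading so every piece keeps its context for retrieval.
        heading, _, body = section.partition("\n")
        buf = heading
        for para in body.split("\n\n"):
            if rough_tokens(buf + "\n\n" + para) > MAX_TOKENS:
                chunks.append(buf)
                buf = heading + "\n\n" + para
            else:
                buf = buf + "\n\n" + para
        chunks.append(buf)
    return chunks

The retrieval path that consumes those chunks looks like this: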
from typing import List
from dataclasses import dataclass

@dataclass
class Chunk:
    id: str
    text: str
    source_url: str
    last_modified: str
    acl_groups: List[str]  # e.g. ['eng-platform', 'sec-cleared']
    score: float = 0.0     # populated by the re-ranker

# opensearch, pinecone, cohere, and llm are thin wrappers around the respective
# clients; embed() and reciprocal_rank_fusion() live elsewhere in the pipeline.

def retrieve(query: str, user_groups: List[str], k: int = 8) -> List[Chunk]:
    # 1. ACL filter pushed into the vector store query, not applied after
    acl_filter = {"acl_groups": {"$in": user_groups}}

    # 2. Hybrid: BM25 + dense, fused with reciprocal rank fusion (RRF)
    bm25_hits = opensearch.search(query, filter=acl_filter, size=40)
    vec_hits = pinecone.query(embed(query), filter=acl_filter, top_k=40)
    fused = reciprocal_rank_fusion(bm25_hits, vec_hits)

    # 3. Cross-encoder re-rank for precision
    reranked = cohere.rerank(query=query, documents=fused, top_n=k)

    # 4. Drop anything below the confidence floor; better to return nothing
    return [c for c in reranked if c.score >= 0.35]

def answer(query: str, user_groups: List[str]) -> dict:
    chunks = retrieve(query, user_groups)
    if not chunks:
        return {"answer": "No internal docs matched. Try rephrasing or check #ask-it.",
                "citations": []}
    return llm.generate(
        system="Cite every claim as [n]. If sources disagree, say so. Never invent.",
        context=chunks,
        query=query,
    )

ACL filtering happens inside the vector store query — never as a post-hoc filter on results — so a user can't infer the existence of documents they can't read.
Field FAQ.
→ How is this different from the AI search built into Confluence, Notion, or SharePoint Copilot?
Native AI search only sees its own silo. Your engineers don't live in one tool — they live in Confluence runbooks, Slack incident threads, SharePoint contracts, and Notion product specs simultaneously. A unified retrieval layer answers cross-system questions like 'why did we pick Postgres over DynamoDB' that no single vendor's AI can. We also give you control over the model, the prompt, and the citation format, which the vendor tools don't.
→ How do you handle permissions so the LLM doesn't leak documents?
Every chunk in the index carries the ACL metadata from its source — Microsoft Graph permissions for SharePoint, space and page restrictions for Confluence, channel membership for Slack, and so on. The user's group memberships are resolved at query time and applied as a filter inside the vector store query itself, before retrieval. The LLM never sees a chunk the user couldn't open directly. We also log every retrieval for audit.
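For SharePoint specifically, the permission mirroring can look roughly like this. The Graph endpoint shown is the standard per-item permissions listing; the group-to-index mapping and index_chunk are hypothetical stand-ins for your own ingest pipeline, so treat this as a sketch rather than the exact implementation.

from typing import List

import requests

GRAPH = "https://graph.microsoft.com/v1.0"

def sharepoint_acl_groups(drive_id: str, item_id: str, token: str) -> List[str]:
    # List the permissions on a drive item and keep the group identities;
    # these become the acl_groups metadata on every chunk from that document.
    resp = requests.get(
        f"{GRAPH}/drives/{drive_id}/items/{item_id}/permissions",
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    groups: List[str] = []
    for perm in resp.json().get("value", []):
        granted = perm.get("grantedToV2") or perm.get("grantedTo") or {}
        group = granted.get("group") or granted.get("siteGroup") or {}
        if group.get("id"):
            groups.append(group["id"])
    return groups

# Hypothetical ingest loop; index_chunk is a stand-in for your vector store upsert.
# for chunk in chunk_by_headings(page_markdown):
#     index_chunk(text=chunk, source_url=page_url,
#                 acl_groups=sharepoint_acl_groups(drive_id, item_id, token))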
→ What does 'fresh' actually mean? How fast does new content show up?
For systems with webhooks (Slack, Notion, GitHub, Zendesk) we typically index changes within 60-120 seconds. For systems without good webhooks (some SharePoint configurations, older Confluence) we run a delta crawl every 5-15 minutes using last-modified timestamps. We also expose a manual 'reindex this page' button for editors who just published something critical and don't want to wait.
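A minimal sketch of that webhook path, using Flask as a stand-in for whatever service receives the callbacks; the endpoint name, payload fields, and enqueue_reindex helper are illustrative, not a fixed contract. Sources without webhooks fall back to the delta crawl described above.

from datetime import datetime, timezone
from typing import Optional

from flask import Flask, request

app = Flask(__name__)

def enqueue_reindex(source: str, document_id: str) -> None:
    # Placeholder: in production this pushes onto a task queue, and a worker
    # re-fetches the document from the source API before re-embedding it.
    print(f"reindex queued: {source}:{document_id}")

@app.post("/webhooks/source-updated")
def source_updated():
    # Generic shape; Slack, Notion, and Zendesk each name these fields differently.
    event = request.get_json(force=True)
    enqueue_reindex(source=event["source"], document_id=event["document_id"])
    return "", 202

def staleness_warning(last_modified: datetime,
                      now: Optional[datetime] = None) -> Optional[str]:
    # Rendered next to citations, e.g. "last updated 14 months ago".
    now = now or datetime.now(timezone.utc)
    months = (now - last_modified).days // 30
    return f"last updated {months} months ago" if months >= 12 else None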
→ Why citations? Can't the LLM just answer the question?
Without citations, users either trust the LLM blindly or distrust it entirely — both are bad. Citations let the reader verify, which is the only sustainable trust model for an internal tool. They also create a feedback loop: when a citation points to a stale or wrong doc, the user fixes the doc. The search system becomes a forcing function for documentation hygiene, which is often a bigger win than the search itself.
→ Which embedding model and vector store do you recommend?
For most internal search workloads we default to OpenAI text-embedding-3-large or Voyage voyage-3 for embeddings, paired with pgvector for teams under ~5M chunks or Pinecone/Qdrant above that. For re-ranking, Cohere Rerank 3 or a fine-tuned cross-encoder. We don't religiously stick to one stack — we benchmark on a sample of your actual queries before committing, because internal jargon changes which models win.
→ Can this run fully on-prem or in a GovCloud environment?
Yes. As an SDVOSB we regularly deploy in AWS GovCloud and Azure Government. The retrieval and re-ranking stack runs entirely in your VPC. For the LLM itself, options include Azure OpenAI in GovCloud, Bedrock with Claude in commercial-to-Gov configurations, or self-hosted Llama 3.1 70B / Qwen on GPU instances when no commercial API is acceptable. We've shipped all three patterns.
→ How do you measure whether the search is actually good?
We build an eval set of 100-300 real questions from your team — pulled from Slack 'does anyone know' messages and help desk tickets — with expert-validated correct answers and source documents. Every prompt change, model swap, or chunking tweak is scored against that set for retrieval recall@k and answer faithfulness. Vibes-based evaluation is how teams ship search that feels good in demos and fails in production.
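A stripped-down sketch of the retrieval half of that eval, assuming the eval set is a list of (question, expected source URLs) pairs and reusing the retrieve() function from the code sample above:

from typing import List, Tuple

# One eval case: a question phrased the way users actually ask it,
# plus the source URLs an expert confirmed contain the answer.
EvalCase = Tuple[str, List[str]]

def retrieval_recall_at_k(cases: List[EvalCase], user_groups: List[str],
                          k: int = 8) -> float:
    hits = 0
    for question, expected_urls in cases:
        retrieved = {c.source_url for c in retrieve(question, user_groups, k=k)}
        # A case counts as a hit if any expert-approved source made the top k.
        if retrieved & set(expected_urls):
            hits += 1
    return hits / len(cases) if cases else 0.0

Answer faithfulness is judged separately against the cited chunks; the point is that every chunking tweak, prompt change, or model swap reruns against the same fixed set.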
→ What does a typical engagement timeline look like?
A first usable system covering two or three sources is typically 3-5 weeks: week 1 source audit and ACL mapping, weeks 2-3 ingestion and retrieval, week 4 UI and citation formatting, week 5 eval set and tuning. Adding additional sources after that is usually 3-7 days each. Full production rollout with SSO, audit logging, and admin tooling generally lands inside 10 weeks.
→ What happens when source documents contradict each other?
This is the rule, not the exception — your 2019 architecture doc disagrees with your 2024 ADR. We prompt the model to surface the disagreement explicitly, weight by recency in the re-ranker, and show last-modified dates in citations so the reader can judge. We also build a lightweight 'mark as outdated' flow so subject matter experts can deprecate stale pages without deleting them.
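As a rough illustration of the recency weighting, here is a sketch using an assumed 18-month half-life and blend ratio (both are tuning knobs, not fixed values); in the retrieve() sketch above it would run between the re-ranker and the confidence floor.

from datetime import datetime, timezone
from typing import Optional

HALF_LIFE_DAYS = 540  # assumed ~18-month half-life; tune per corpus

def recency_weighted(score: float, last_modified: str,
                     now: Optional[datetime] = None) -> float:
    # last_modified is the ISO-8601 string carried on every Chunk.
    now = now or datetime.now(timezone.utc)
    modified = datetime.fromisoformat(last_modified.replace("Z", "+00:00"))
    if modified.tzinfo is None:
        modified = modified.replace(tzinfo=timezone.utc)
    age_days = max((now - modified).days, 0)
    decay = 0.5 ** (age_days / HALF_LIFE_DAYS)
    # Blend rather than multiply to zero: an old doc can still win
    # when it is a much better semantic match.
    return score * (0.7 + 0.3 * decay)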
Continue recon.
REL-01 AI Integration Services
How we embed Claude, GPT, and RAG into existing enterprise workflows.
REL-02 Case Studies
Internal search and RAG deployments we've shipped, with measured outcomes.
REL-03 Fixed-Scope Packages
Defined RAG and internal-search engagements with clear timelines and deliverables.
REL-04 Scope an Engagement
Talk to an engineer who has shipped this integration before, not a sales rep.
Stop losing 30 minutes per question. Let's scope your internal search rollout.
Talk to a VooStack operator. We respond within one business day.