[ AI INTEGRATION ] // HEALTHCARE

HIPAA-compliant LLMs, wired into your EHR — without becoming a medical device.

We deploy Claude, GPT-4, and self-hosted Llama into Epic, Cerner, and athenahealth workflows under signed BAAs, with the audit trail your CISO and your malpractice carrier both need.

Veteran-Owned SDVOSB
[001 / 005] Field Conditions

Most healthcare AI pilots stall at the security review or break on the chart write.

// SITUATION

The pattern is familiar. A vendor demo wows the CMIO. Procurement signs. Six months later the project is stuck: nobody mapped how PHI flows through the model; the BAA does not actually cover the inference endpoint; the Epic integration, assumed to be a weekend of FHIR work, was not; and legal got nervous about CDS liability after reading the FDA's 2022 guidance. The pilot becomes a Slack channel of unresolved blockers. The model never writes to the chart. Clinicians lose interest. Budget evaporates.

  • Vendor sends PHI to an OpenAI endpoint that has no BAA, discovered three weeks before go-live by the CISO.
  • Epic App Orchard submission was not started; in-chart embedding is now nine months out on what was scoped as a four-month project.
  • No human-in-the-loop UX, so the output is technically a CDS recommendation and now legal wants an FDA opinion.
  • Hallucinations were never measured against a clinician-graded test set, so nobody can defend the model in an M&M conference.
100%
Inference under signed BAA or on-prem
< 6 wks
From kickoff to first clinician-tested pilot
SDVOSB
Set-aside eligible for VA, DHA, IHS, FQHC
[002 / 005] Operational Approach

Ship AI inside the compliance envelope, not around it.

  1. STEP-01

    Map PHI flow before any model call

    Before a single token leaves a server we diagram every PHI boundary: Epic/Cerner FHIR endpoints, HL7 v2 feeds, S3 buckets holding DICOM, the Twilio number that takes patient intake. We tag each hop, identify what BAA covers it, and decide where de-identification happens. No model integration starts until that diagram is signed off by your privacy officer.

  2. STEP-02

    Choose inference posture deliberately

    Cloud (Azure OpenAI under BAA, Bedrock with Claude), VPC-isolated (Anthropic via AWS PrivateLink), or on-prem (Llama 3.1 70B on H100s, or a smaller fine-tune on A10s). We pick based on data sensitivity, latency, throughput, and whether you have an MLOps team. We do not default to cloud because it is convenient.
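The posture decision above can be reduced to a small, auditable function. The thresholds below are illustrative placeholders, not policy; the point is that the decision is written down and reviewable, not made by default.

```python
def choose_posture(sensitivity: str, p95_latency_ms: int,
                   monthly_tokens: int, has_mlops_team: bool) -> str:
    """Pick an inference posture. Thresholds are illustrative only."""
    if sensitivity == "restricted":        # e.g. psych notes, pediatric oncology
        return "on-prem" if has_mlops_team else "vpc-isolated"
    if p95_latency_ms < 200 and has_mlops_team:
        return "on-prem"                   # bedside latency budget rules out round trips
    if monthly_tokens > 2_000_000_000 and has_mlops_team:
        return "on-prem"                   # GPU amortization beats per-token pricing
    return "cloud-baa"                     # Azure OpenAI / Bedrock under signed BAA
```

Note the `has_mlops_team` guard on every on-prem branch: without someone to run the GPUs, on-prem is a liability, not a posture.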

  3. STEP-03

    Wrap clinical outputs in a liability layer

    Any model output that touches a clinician gets logged with prompt, response, model version, and retrieval sources. We build the human-in-the-loop UI so the clinician acknowledges or edits before anything hits the chart. This is what keeps you on the right side of the FDA SaMD line and your malpractice carrier.

  4. STEP-04

    Integrate via real EHR contracts, not screen-scraping

    We build against FHIR R4 with SMART on FHIR launch, use HL7 v2 where the EHR still demands it, and write to the chart through sanctioned APIs only. Epic App Orchard and Cerner Code submissions take months — we plan around that timeline rather than pretending it does not exist.
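Writing through sanctioned APIs means constructing real FHIR R4 resources. A hedged sketch of the shape of a note write-back as a `DocumentReference`; the LOINC code and the exact profile your EHR accepts will vary by site, so treat this as the resource's general shape, not a site-ready payload.

```python
import base64

def build_note_resource(patient_id: str, note_text: str) -> dict:
    """FHIR R4 DocumentReference carrying a clinician-accepted draft note.

    Field shape follows the R4 spec; the target EHR's sanctioned write API
    will typically constrain it further (required extensions, profiles).
    """
    return {
        "resourceType": "DocumentReference",
        "status": "current",
        "type": {"coding": [{
            "system": "http://loinc.org",
            "code": "11506-3",            # "Progress note" -- confirm per site
            "display": "Progress note",
        }]},
        "subject": {"reference": f"Patient/{patient_id}"},
        "content": [{"attachment": {
            "contentType": "text/plain",
            "data": base64.b64encode(note_text.encode()).decode(),
        }}],
    }
```

The actual POST goes through the SMART on FHIR OAuth context, with the clinician's acceptance event attached to the audit trail described in STEP-03.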

  5. STEP-05

    Validate with clinicians, then pilot small

    Pre-launch we run shadow mode on 30 to 90 days of real (de-identified) cases and have clinicians grade outputs against ground truth. Pilot with one specialty or one clinic. Track precision, recall, time saved, and override rate weekly. Expand only after the override rate stabilizes.
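The weekly tracking in STEP-05 is ordinary confusion-matrix arithmetic plus an override rate. A minimal sketch, assuming each graded case is a `(model_flagged, clinician_confirmed, clinician_overrode)` tuple:

```python
def weekly_metrics(rows: list[tuple[bool, bool, bool]]) -> dict[str, float]:
    """Precision/recall against clinician grading, plus override rate.

    rows: (model_flagged, clinician_confirmed, clinician_overrode) per case.
    """
    tp = sum(1 for flagged, confirmed, _ in rows if flagged and confirmed)
    fp = sum(1 for flagged, confirmed, _ in rows if flagged and not confirmed)
    fn = sum(1 for flagged, confirmed, _ in rows if not flagged and confirmed)
    overrides = sum(1 for *_, overrode in rows if overrode)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "override_rate": overrides / len(rows) if rows else 0.0,
    }
```

"Expand only after the override rate stabilizes" means watching this last number flatten week over week, not hit zero.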

// PYTHON PATTERN
# HIPAA-aware RAG call against Azure OpenAI under BAA.
# Every clinical output is logged with model version + retrieval IDs
# so the chart entry is reconstructible during audit or litigation.

from openai import AzureOpenAI
from audit import log_clinical_inference   # internal module: writes immutable audit rows
from phi import scrub_identifiers          # internal module: de-identification
import hashlib
import os

# guideline_index and CLINICAL_SUMMARY_PROMPT are defined elsewhere in the service.

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_BAA_ENDPOINT"],  # covered under signed BAA
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)

def summarize_encounter(note: str, patient_id: str, user_id: str):
    # 1. De-identify before retrieval. token_map re-identifies only on render.
    scrubbed, token_map = scrub_identifiers(note)

    # 2. Pull guideline snippets from the internal vector store (not public web).
    refs = guideline_index.search(scrubbed, k=4, specialty="cardiology")

    resp = client.chat.completions.create(
        model="gpt-4o-2024-08-06",
        temperature=0.1,
        messages=[
            {"role": "system", "content": CLINICAL_SUMMARY_PROMPT},
            {"role": "user", "content": f"Note:\n{scrubbed}\n\nRefs:\n{refs}"},
        ],
    )

    # 3. Audit row is non-negotiable. No log, no write to EHR.
    #    sha256, not Python's hash(): the builtin is salted per process,
    #    so it cannot match the prompt against a later audit query.
    log_clinical_inference(
        patient_id=patient_id, user_id=user_id,
        model="gpt-4o-2024-08-06",
        prompt_hash=hashlib.sha256(scrubbed.encode()).hexdigest(),
        retrieval_ids=[r.id for r in refs],
        response=resp.choices[0].message.content,
    )
    return resp.choices[0].message.content  # clinician must accept before chart write

Three things every healthcare LLM call needs: PHI scrubbing before retrieval, retrieval scoped to vetted internal sources, and an immutable audit row tied to the clinician who acted on it.

[003 / 005] Common Questions

Field FAQ.

Is it actually safe to send PHI to GPT-4 or Claude?

Yes, if you do it under a signed BAA. Azure OpenAI Service and AWS Bedrock (which hosts Claude) both offer BAAs covering PHI. OpenAI direct does not. The bigger question is whether you should — for some workloads (ambient scribing, prior auth letters) the cloud BAA path is fine. For others (psychiatric notes, pediatric oncology) we recommend VPC isolation or on-prem inference even when cloud is technically permitted.

Where does the FDA software-as-medical-device line actually fall?

If your AI provides a specific diagnosis, treatment recommendation, or triage decision that a clinician cannot independently review, you are likely a SaMD and need 510(k) or De Novo clearance. If the output is a draft the clinician edits — an encounter summary, a letter, a coding suggestion — you generally are not. The line is fuzzy, and the FDA's 2022 final guidance on Clinical Decision Support software is the controlling reference. We scope projects on the safe side and bring in regulatory counsel before anything ambiguous ships.

Can you integrate with Epic or Cerner?

Yes. We build against FHIR R4 with SMART on FHIR for launch context and OAuth, use HL7 v2 interfaces where the site still runs them, and submit to Epic's App Orchard / Showroom or Cerner Code when in-chart embedding is required. Plan for 3 to 9 months of EHR vendor review on top of build time. Read-only integrations move faster than write-back; we usually ship read-only first to prove value.

On-prem inference vs. cloud — when does on-prem actually make sense?

On-prem makes sense when (1) data residency or contractual restrictions forbid cloud even under BAA, (2) you have steady high-volume workloads where GPU economics beat per-token pricing, or (3) latency requirements are sub-200ms at the bedside. Llama 3.1 70B on a pair of H100s handles a mid-size hospital's ambient scribing load. Below that volume, cloud under BAA is almost always cheaper and faster to ship.
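The GPU-economics point in (2) is a one-line break-even calculation. The dollar figures below are illustrative assumptions for the sketch, not quoted pricing:

```python
def monthly_cloud_cost(tokens: int, usd_per_million_tokens: float) -> float:
    """Per-token cloud spend for a month's inference volume."""
    return tokens / 1_000_000 * usd_per_million_tokens

def onprem_breakeven_tokens(gpu_monthly_usd: float,
                            usd_per_million_tokens: float) -> int:
    """Tokens/month at which amortized GPU cost matches cloud per-token pricing."""
    return int(gpu_monthly_usd / usd_per_million_tokens * 1_000_000)

# Illustrative: a pair of H100s amortized at ~$6,000/month vs. a blended
# $3 per million tokens puts break-even around 2B tokens/month.
breakeven = onprem_breakeven_tokens(6000.0, 3.0)
```

Below that volume, as the answer above says, cloud under BAA is almost always cheaper and faster to ship.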

How do you handle hallucinations in a clinical context?

Three layers. First, retrieval-grounded prompting against vetted internal sources (your formulary, your guidelines, UpToDate if licensed) — never the open web. Second, structured output with citations the clinician can click through. Third, human-in-the-loop UX that requires explicit acceptance before anything writes to the chart. We also run weekly evaluation against a clinician-graded test set and alert on drift. Hallucinations do not get to zero; the workflow has to assume they exist.
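The second layer — clickable citations — is only useful if every citation in the output resolves to a document actually retrieved for that call. A sketch of that check, assuming an illustrative `[ref:...]` tag convention (not a standard):

```python
import re

def citations_resolve(response: str, retrieved_ids: set[str]) -> bool:
    """Reject outputs whose citations don't map to this call's retrieval set.

    Tag format [ref:doc-id] is an assumed internal convention; the check is
    the point: no uncited claims, no citations to documents never retrieved.
    """
    cited = set(re.findall(r"\[ref:([\w-]+)\]", response))
    return bool(cited) and cited <= retrieved_ids
```

An output that fails this check never reaches the human-in-the-loop UI; it is regenerated or flagged, and the failure feeds the weekly drift evaluation.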

Does SDVOSB status matter for healthcare work?

It matters if you are a VA medical center, a DHA facility, an IHS site, or a federally qualified health center pursuing federal contracts. SDVOSB set-aside and sole-source authority can move a procurement from 12 months to 8 weeks. For private health systems the status is irrelevant to the technical work, though some systems prefer veteran-owned vendors on values alignment. We do both commercial and federal healthcare engagements.

What does a typical first engagement look like?

Two to four weeks. We map PHI flow, audit your existing EHR integrations, identify two or three high-leverage AI use cases (usually ambient documentation, prior auth, or inbox triage), and produce a build plan with a compliance posture, an inference architecture, a pilot scope, and a budget range. Deliverable is a document your CMIO, CISO, and CFO can all sign off on. Implementation follows if the numbers work.

How do you price AI integration projects?

Discovery is fixed-fee, usually $25k to $60k depending on scope. Build is either fixed-fee per milestone or T&M with a not-to-exceed, depending on how well-specified the EHR integration is. A first production pilot typically lands between $150k and $500k including the compliance work. Ongoing inference costs are separate and depend on volume — we model those during discovery so there are no surprises.

Can you augment our existing engineering team instead of building turnkey?

Yes. Staff augmentation with senior engineers who have shipped HIPAA-compliant ML systems is one of our core offerings. We embed one to four engineers into your team, work in your repo, your sprint cadence, your on-call rotation. Typical engagements run 6 to 18 months. This is often the right model when you have a capable platform team but lack specific experience with LLM deployment, FHIR integration, or federal compliance.

[ NEXT ACTION ]

Bring us your EHR diagram and your compliance officer. We'll tell you what's actually shippable.

Talk to a VooStack operator. We respond within one business day.