TL;DR: You can use AI for healthcare data analysis, but most popular AI tools don't meet HIPAA requirements out of the box. The key question is whether your data qualifies as PHI and whether the tool is a "business associate" under HIPAA. Local-first tools that never transmit data sidestep the problem entirely — if the data never leaves your device, there's no covered transaction.
Disclaimer: This post is informational, not legal advice. HIPAA compliance involves organizational policies, technical controls, and legal agreements. Consult your compliance officer and legal counsel for decisions about your specific data and workflows.
## The problem healthcare data teams face
You're a data analyst at a health system, a health tech startup, or a payer organization. You have a dataset — claims data, patient demographics, utilization reports, clinical trial metrics. You need to answer questions about it.
Your non-healthcare peers are dragging CSVs into ChatGPT and getting answers in seconds. You can't do that. Or at least, you shouldn't: your data almost certainly contains Protected Health Information (PHI), and uploading PHI to a cloud AI tool without a Business Associate Agreement (BAA) in place is a HIPAA violation.
So you're stuck with Excel pivot tables, slow SQL queries you write yourself, or waiting for IT to build you a dashboard. The AI revolution is happening everywhere except healthcare analytics.
It doesn't have to be this way.
## HIPAA basics for data teams (the 2-minute version)
### What is PHI?
Protected Health Information (PHI) is any individually identifiable health information held or transmitted by a covered entity or business associate. This includes:
- Patient names, dates of birth, Social Security numbers
- Medical record numbers, health plan beneficiary numbers
- Diagnosis codes linked to identifiable individuals
- Dates of service, admission, discharge
- Any combination of demographic + health data that could identify a person
Key nuance: De-identified data is not PHI. HIPAA's Safe Harbor method requires removing 18 specific identifiers. But de-identification is hard to get right, and most operational datasets haven't been through the process.
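One way to get a first read on a dataset before any formal review is a column-name scan against the Safe Harbor categories. The sketch below is a heuristic only (the patterns and column names are illustrative, and real de-identification review must inspect the data values, not just the headers):

```python
import re

# Heuristic column-name patterns loosely mapped to a few of HIPAA's 18
# Safe Harbor identifier categories. Illustrative, not exhaustive.
IDENTIFIER_PATTERNS = {
    "name": r"(^|_)(first|last|full)?_?name($|_)",
    "date": r"dob|birth|(admit|admission|discharge|service)_?date",
    "ssn": r"ssn|social_security",
    "mrn": r"mrn|medical_record",
    "contact": r"phone|fax|email",
    "geo": r"street|address|zip",
    "id": r"member_id|patient_id|beneficiary",
}

def flag_columns(columns):
    """Return columns whose names suggest a Safe Harbor identifier."""
    flagged = {}
    for col in columns:
        for category, pattern in IDENTIFIER_PATTERNS.items():
            if re.search(pattern, col.lower()):
                flagged[col] = category
                break
    return flagged

# Hypothetical claims-extract schema:
cols = ["member_id", "dob", "diagnosis_code", "billed_amount", "zip_code"]
print(flag_columns(cols))
# → {'member_id': 'id', 'dob': 'date', 'zip_code': 'geo'}
```

A clean scan proves nothing (free-text fields can hide names and dates), but a dirty scan is a cheap early warning that the dataset needs PHI handling.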
### What is a Business Associate?
A Business Associate (BA) is any entity that creates, receives, maintains, or transmits PHI on behalf of a covered entity. If you upload PHI to a cloud tool, that tool's vendor is a business associate.
### What is a BAA?
A Business Associate Agreement (BAA) is a contract between a covered entity and a business associate that establishes permitted uses of PHI, safeguards, and breach notification obligations. Without a BAA, transmitting PHI to a vendor is a violation — regardless of how secure the vendor's infrastructure is.
### The practical implication
To use a cloud-based AI tool with PHI, you need:
- The vendor to sign a BAA
- The vendor's infrastructure to meet HIPAA security requirements
- Any upstream AI providers (OpenAI, Anthropic, Google) to also have BAAs in place
- Appropriate access controls, audit logging, and encryption
This is a high bar. Most AI tools don't clear it.
## Which AI data tools offer BAAs?
Here's the current landscape as of early 2026:
| Tool | BAA Available? | Notes |
|---|---|---|
| ChatGPT Enterprise | Yes | Via OpenAI Enterprise agreement. Does not cover ChatGPT consumer or Plus plans. |
| Google Cloud AI / Vertex AI | Yes | Via Google Cloud BAA. Requires Workspace Enterprise and specific configuration. |
| Microsoft Azure OpenAI | Yes | Via Microsoft Azure BAA. Not available through consumer-facing products. |
| AWS Bedrock | Yes | Via AWS BAA. Covers models hosted on Bedrock. |
| Julius AI | No | No BAA offered as of early 2026. |
| ChatGPT Consumer/Plus | No | Explicitly not HIPAA-eligible. |
| Google Sheets + Gemini | Depends | Google Workspace Enterprise with BAA covers Sheets. Gemini features may not be covered. |
| Jupyter (local) | N/A | No data transmission — no BA relationship. |
| Browser-local tools (WASM) | N/A* | If data never leaves the device, there may be no BA relationship to cover. |
*The "N/A" for browser-local tools is the interesting case. We'll dig into this.
## The local-first argument for HIPAA
Here's where the architecture matters.
HIPAA's Business Associate rules apply when PHI is transmitted to or maintained by a third party. If your data analysis tool runs entirely on your device — in the browser via WebAssembly, for example — and the data never leaves that device, then:
- No PHI is transmitted to the tool vendor
- No PHI is maintained by the tool vendor
- The vendor may not be a Business Associate for that specific interaction
This is the same reason you don't need a BAA with Microsoft for using Excel on your laptop (you do need one for Office 365 cloud features, but not for local computation).
### But what about the AI component?
This is where it gets nuanced. If the tool uses AI, you need to ask: What does the AI see?
**If the AI sees PHI (full data access):** The AI provider is receiving PHI. You need a BAA with both the tool vendor and the upstream AI provider. This is the ChatGPT Advanced Data Analysis model — the AI reads your data.
**If the AI sees only schema (column names and types):** The AI receives metadata, not PHI. Column names like `patient_id`, `diagnosis_code`, and `admission_date` are schema information — they describe the structure of the data, not individual patient records.
**However:** Whether schema metadata constitutes PHI is a gray area. A column named `hiv_status` in a table called `clinic_patients` reveals that the covered entity collects HIV status data. That isn't individually identifiable, but your compliance team may still flag it.
**If the AI is local (WebLLM, Ollama):** No data or metadata leaves the device. This is the cleanest path from a compliance perspective.
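The schema-only approach can be made concrete with a short sketch: extract column names and types from the local engine, and build an AI prompt that contains no row values. The stdlib `sqlite3` module stands in for DuckDB-WASM here so the example is runnable; the prompt format is illustrative, not any vendor's actual API.

```python
import sqlite3

# Stand-in for the browser-side engine. In a real local-first tool this
# would be DuckDB-WASM; sqlite3 keeps the sketch self-contained.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE claims (
        member_id      TEXT,
        diagnosis_code TEXT,
        billed_amount  REAL
    )
""")
con.execute("INSERT INTO claims VALUES ('M-001', 'E11.9', 240.0)")

def schema_only_prompt(con, table, question):
    """Build an AI prompt from column names/types only -- no row data."""
    cols = con.execute(f"PRAGMA table_info({table})").fetchall()
    schema = ", ".join(f"{name} ({ctype})" for _, name, ctype, *_ in cols)
    # Illustrative prompt shape; the point is what it omits: row values.
    return (
        f"Table `{table}` has columns: {schema}.\n"
        f"Write a SQL query to answer: {question}"
    )

prompt = schema_only_prompt(con, "claims", "average billed amount per member")
assert "M-001" not in prompt and "240" not in prompt  # no PHI in the prompt
print(prompt)
```

Everything the remote model ever receives is the string returned by `schema_only_prompt` — which is exactly why the gray-area question above is about column names, not patient records.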
### A practical decision matrix
| AI Architecture | PHI transmitted? | BAA needed with AI provider? | HIPAA risk level |
|---|---|---|---|
| Full cloud (AI sees all data) | Yes | Yes | High — requires full compliance chain |
| Schema-only cloud AI | Metadata only | Possibly — consult legal | Low to Medium — depends on schema sensitivity |
| Local AI (WebLLM/Ollama) | No | No | Lowest — no transmission |
| No AI (manual SQL) | No | N/A | Lowest — no third party involvement |
## Real-world scenarios
### Scenario 1: Claims analysis for a health plan
You have a dataset of medical claims — member IDs, diagnosis codes, procedure codes, billed amounts, provider names. Your VP asks: "What's the average cost per member for diabetes-related claims?"
**Option A: ChatGPT Enterprise with BAA.** Upload the claims file. ChatGPT writes Python code and runs the analysis on OpenAI's servers. This works if your organization has a ChatGPT Enterprise BAA, the data handling procedures are documented, and your privacy officer approves the workflow.
**Option B: Local browser tool with schema-only AI.** Load the claims file into DuckDB-WASM in the browser. The AI sees: `member_id` (VARCHAR), `diagnosis_code` (VARCHAR), `procedure_code` (VARCHAR), `billed_amount` (DOUBLE), `provider_name` (VARCHAR). It generates:

```sql
SELECT
  COUNT(DISTINCT member_id) AS members,
  ROUND(SUM(billed_amount) / COUNT(DISTINCT member_id), 2) AS avg_cost_per_member
FROM claims
WHERE diagnosis_code LIKE 'E11%' -- ICD-10 prefix for type 2 diabetes
```
The query runs locally. The AI never sees that member M-44821 had $12,450 in diabetes-related claims.
**Option C: Local AI (Ollama).** Same as Option B, but the AI also runs on your machine. Zero network requests, which makes it suitable even for air-gapped environments.
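End to end, Option B looks like this: the (hypothetical) AI-generated SQL from the schema-only prompt is executed on-device, and only the aggregate result ever surfaces. Again `sqlite3` is a runnable stdlib stand-in for DuckDB-WASM, and the sample rows are invented:

```python
import sqlite3

# Local execution sketch: this SQL came back from an AI that saw only
# the schema; the query itself runs on-device against local data.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE claims (member_id TEXT, diagnosis_code TEXT, billed_amount REAL)"
)
con.executemany(
    "INSERT INTO claims VALUES (?, ?, ?)",
    [("M-001", "E11.9", 200.0),   # invented sample rows
     ("M-001", "E11.9", 100.0),
     ("M-002", "E11.0", 50.0)],
)

ai_generated_sql = """
    SELECT COUNT(DISTINCT member_id) AS members,
           ROUND(SUM(billed_amount) / COUNT(DISTINCT member_id), 2)
               AS avg_cost_per_member
    FROM claims
    WHERE diagnosis_code LIKE 'E11%'
"""
members, avg_cost = con.execute(ai_generated_sql).fetchone()
print(members, avg_cost)  # the aggregate is all that surfaces; rows stay local
```

The division of labor is the whole compliance story: the AI contributes the query text, the device contributes the data, and the two never meet.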
### Scenario 2: Clinical trial site performance
You're a CRO (Contract Research Organization) analyzing site enrollment rates across clinical trial locations. The dataset includes site names, investigators, patient counts, and enrollment timelines.
This is likely PHI if patient counts at specific sites could identify individuals (small sites with rare conditions). Schema-only AI avoids the issue: the AI knows there's a `patient_count` column but never sees that Site #47 enrolled 3 patients with a rare autoimmune condition.
### Scenario 3: Population health dashboard
You need to build a dashboard showing hospitalization rates by zip code, age group, and chronic condition. The source data is a patient-level extract with PHI.
For the analysis phase (exploring the data, finding patterns), use a local-first tool. For the dashboard phase (sharing aggregated results), the output is de-identified by construction — you're showing population-level statistics, not individual records.
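"De-identified by construction" usually also means suppressing small cells, so that rare combinations of zip code, age group, and condition can't single anyone out. A sketch, with invented data and a threshold of 11 (a common convention in public data releases, e.g. CMS — pick yours with your compliance team):

```python
import sqlite3

# Aggregation with small-cell suppression: groups below a minimum count
# are dropped before anything reaches the dashboard. sqlite3 stands in
# for DuckDB-WASM; rows and the threshold are illustrative.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE hospitalizations (zip TEXT, age_group TEXT, condition TEXT)"
)
rows = ([("02139", "65+", "CHF")] * 15
        + [("02139", "18-34", "rare_disease")] * 2)  # a 2-person cell
con.executemany("INSERT INTO hospitalizations VALUES (?, ?, ?)", rows)

dashboard = con.execute("""
    SELECT zip, age_group, condition, COUNT(*) AS n
    FROM hospitalizations
    GROUP BY zip, age_group, condition
    HAVING COUNT(*) >= 11        -- suppress small cells
""").fetchall()
print(dashboard)  # the 2-person cell never appears in the output
```

The `HAVING` clause is doing the privacy work: the 15-person group survives, the 2-person group is dropped before it can identify anyone.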
## Practical guidance for healthcare data teams
### Step 1: Classify your data
Before choosing a tool, classify the dataset:
- PHI: Contains any of the 18 HIPAA identifiers linked to health information. Requires HIPAA-compliant handling.
- De-identified: All 18 identifiers removed per Safe Harbor method. Can be analyzed with any tool.
- Limited Data Set: Some identifiers removed but dates and geographic data retained. Requires a Data Use Agreement (DUA) but not a full BAA.
### Step 2: Choose your architecture
| Data Classification | Recommended Tool Architecture |
|---|---|
| PHI | Local-only (WASM + local AI) or enterprise cloud with BAA |
| Limited Data Set | Local-only or schema-only cloud AI (with DUA) |
| De-identified | Any tool |
### Step 3: Document your workflow
HIPAA requires documented policies. For any AI data analysis tool, document:
- What data is loaded into the tool
- What data (if any) leaves the device
- What the AI model sees (full data, schema, nothing)
- Retention: how long data persists (browser memory = until tab is closed)
- Access controls: who can use the tool and with what data
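That checklist can live as a short, version-controlled record per tool. A sketch — field names and values are illustrative, not a regulatory template:

```yaml
tool: browser-local analysis tool (DuckDB-WASM, schema-only AI)
data_loaded: claims extract (PHI - member IDs, diagnoses, billed amounts)
data_leaving_device: none (query execution is in-browser)
ai_visibility: schema only (column names and types; no row values)
retention: browser memory; cleared when the tab is closed
access_controls: analytics team members only, via SSO
```

One page like this per workflow is also exactly the artifact Step 4 asks you to bring to your compliance team.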
### Step 4: Involve your compliance team early
Don't surprise them. Bring a one-page summary of the tool's architecture, what data flows where, and your proposed policy. Compliance teams are much more receptive to "I've already thought about the risks" than "Can I use this cool AI thing?"
## The bottom line
HIPAA doesn't prohibit AI-powered data analysis. It requires you to protect PHI wherever it goes. The simplest way to comply is to ensure PHI doesn't go anywhere.
Local-first tools — where data stays in your browser and AI only sees schema metadata — reduce the HIPAA compliance surface to near zero. They're not a silver bullet (you still need organizational policies, access controls, and training), but they remove the hardest part of the equation: getting BAAs signed with AI providers.
If you're on a healthcare data team, you don't have to choose between AI-powered analysis and HIPAA compliance. The architecture that makes both possible already exists.
QueryVeil runs DuckDB-WASM in the browser with schema-only AI — data never leaves your device. For healthcare data teams evaluating tools, the live demo shows the architecture in action without requiring any real data. No signup required.