TL;DR: You can use AI for healthcare data analysis, but most popular AI tools don't meet HIPAA requirements out of the box. The key question is whether your data qualifies as PHI and whether the tool is a "business associate" under HIPAA. Local-first tools that never transmit data sidestep the problem entirely — if the data never leaves your device, there's no covered transaction.
Disclaimer: This post is informational, not legal advice. HIPAA compliance involves organizational policies, technical controls, and legal agreements. Consult your compliance officer and legal counsel for decisions about your specific data and workflows.
## The problem healthcare data teams face
You're a data analyst at a health system, a health tech startup, or a payer organization. You have a dataset — claims data, patient demographics, utilization reports, clinical trial metrics. You need to answer questions about it.
Your non-healthcare peers are dragging CSVs into ChatGPT and getting answers in seconds. You can't do that. Or at least, you shouldn't: your data almost certainly contains Protected Health Information (PHI), and uploading PHI to a cloud AI tool without a Business Associate Agreement (BAA) in place is a HIPAA violation.
So you're stuck with Excel pivot tables, slow SQL queries you write yourself, or waiting for IT to build you a dashboard. The AI revolution is happening everywhere except healthcare analytics.
It doesn't have to be this way.
## HIPAA basics for data teams (the 2-minute version)
### What is PHI?
Protected Health Information (PHI) is any individually identifiable health information held or transmitted by a covered entity or business associate. This includes:
- Patient names, dates of birth, Social Security numbers
- Medical record numbers, health plan beneficiary numbers
- Diagnosis codes linked to identifiable individuals
- Dates of service, admission, discharge
- Any combination of demographic + health data that could identify a person
Key nuance: De-identified data is not PHI. HIPAA's Safe Harbor method requires removing 18 specific identifiers. But de-identification is hard to get right, and most operational datasets haven't been through the process.
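One way to get a first read on a dataset before any formal review is a column-name scan against the Safe Harbor categories. The sketch below is a heuristic only (the patterns and column names are illustrative, and real de-identification review must inspect the data values, not just the headers):

```python
import re

# Heuristic column-name patterns loosely mapped to a few of HIPAA's 18
# Safe Harbor identifier categories. Illustrative, not exhaustive.
IDENTIFIER_PATTERNS = {
    "name": r"(^|_)(first|last|full)?_?name($|_)",
    "date": r"dob|birth|(admit|admission|discharge|service)_?date",
    "ssn": r"ssn|social_security",
    "mrn": r"mrn|medical_record",
    "contact": r"phone|fax|email",
    "geo": r"street|address|zip",
    "id": r"member_id|patient_id|beneficiary",
}

def flag_columns(columns):
    """Return columns whose names suggest a Safe Harbor identifier."""
    flagged = {}
    for col in columns:
        for category, pattern in IDENTIFIER_PATTERNS.items():
            if re.search(pattern, col.lower()):
                flagged[col] = category
                break
    return flagged

# Hypothetical claims-extract schema:
cols = ["member_id", "dob", "diagnosis_code", "billed_amount", "zip_code"]
print(flag_columns(cols))
# → {'member_id': 'id', 'dob': 'date', 'zip_code': 'geo'}
```

A clean scan proves nothing (free-text fields can hide names and dates), but a dirty scan is a cheap early warning that the dataset needs PHI handling.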
### What is a Business Associate?
A Business Associate (BA) is any entity that creates, receives, maintains, or transmits PHI on behalf of a covered entity. If you upload PHI to a cloud tool, that tool's vendor is a business associate.
### What is a BAA?
A Business Associate Agreement (BAA) is a contract between a covered entity and a business associate that establishes permitted uses of PHI, safeguards, and breach notification obligations. Without a BAA, transmitting PHI to a vendor is a violation — regardless of how secure the vendor's infrastructure is.
### The practical implication
To use a cloud-based AI tool with PHI, you need:
- The vendor to sign a BAA
- The vendor's infrastructure to meet HIPAA security requirements
- Any upstream AI providers (OpenAI, Anthropic, Google) to also have BAAs in place
- Appropriate access controls, audit logging, and encryption
This is a high bar. Most AI tools don't clear it.
## Which AI data tools offer BAAs?
Here's the current landscape as of early 2026:
| Tool | BAA Available? | Notes |
|---|---|---|
| ChatGPT Enterprise | Yes | Via OpenAI Enterprise agreement. Does not cover ChatGPT consumer or Plus plans. |
| Google Cloud AI / Vertex AI | Yes | Via Google Cloud BAA. Requires Workspace Enterprise and specific configuration. |
| Microsoft Azure OpenAI | Yes | Via Microsoft Azure BAA. Not available through consumer-facing products. |
| AWS Bedrock | Yes | Via AWS BAA. Covers models hosted on Bedrock. |
| Julius AI | No | No BAA offered as of early 2026. |
| ChatGPT Consumer/Plus | No | Explicitly not HIPAA-eligible. |
| Google Sheets + Gemini | Depends | Google Workspace Enterprise with BAA covers Sheets. Gemini features may not be covered. |
| Jupyter (local) | N/A | No data transmission — no BA relationship. |
| Browser-local tools (WASM) | N/A* | If data never leaves the device, there may be no BA relationship to cover. |
*The "N/A" for browser-local tools is the interesting case. We'll dig into this.
## The local-first argument for HIPAA
Here's where the architecture matters.
HIPAA's Business Associate rules apply when PHI is transmitted to or maintained by a third party. If your data analysis tool runs entirely on your device — in the browser via WebAssembly, for example — and the data never leaves that device, then:
- No PHI is transmitted to the tool vendor
- No PHI is maintained by the tool vendor
- The vendor may not be a Business Associate for that specific interaction
This is the same reason you don't need a BAA with Microsoft for using Excel on your laptop (you do need one for Office 365 cloud features, but not for local computation).
### But what about the AI component?
This is where it gets nuanced. If the tool uses AI, you need to ask: What does the AI see?
**If the AI sees PHI (full data access):** The AI provider is receiving PHI. You need a BAA with both the tool vendor and the upstream AI provider. This is the ChatGPT Advanced Data Analysis model — the AI reads your data.
**If the AI sees only schema (column names and types):** The AI receives metadata, not PHI. Column names like `patient_id`, `diagnosis_code`, and `admission_date` are schema information — they describe the structure of the data, not individual patient records.
**However:** Whether schema metadata constitutes PHI is a gray area. A column named `hiv_status` in a table called `clinic_patients` reveals that the covered entity collects HIV status data. That isn't individually identifiable, but your compliance team may still flag it.
**If the AI is local (WebLLM, Ollama):** No data or metadata leaves the device. This is the cleanest path from a compliance perspective.
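The schema-only approach can be made concrete with a short sketch: extract column names and types from the local engine, and build an AI prompt that contains no row values. The stdlib `sqlite3` module stands in for DuckDB-WASM here so the example is runnable; the prompt format is illustrative, not any vendor's actual API.

```python
import sqlite3

# Stand-in for the browser-side engine. In a real local-first tool this
# would be DuckDB-WASM; sqlite3 keeps the sketch self-contained.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE claims (
        member_id      TEXT,
        diagnosis_code TEXT,
        billed_amount  REAL
    )
""")
con.execute("INSERT INTO claims VALUES ('M-001', 'E11.9', 240.0)")

def schema_only_prompt(con, table, question):
    """Build an AI prompt from column names/types only -- no row data."""
    cols = con.execute(f"PRAGMA table_info({table})").fetchall()
    schema = ", ".join(f"{name} ({ctype})" for _, name, ctype, *_ in cols)
    # Illustrative prompt shape; the point is what it omits: row values.
    return (
        f"Table `{table}` has columns: {schema}.\n"
        f"Write a SQL query to answer: {question}"
    )

prompt = schema_only_prompt(con, "claims", "average billed amount per member")
assert "M-001" not in prompt and "240" not in prompt  # no PHI in the prompt
print(prompt)
```

Everything the remote model ever receives is the string returned by `schema_only_prompt` — which is exactly why the gray-area question above is about column names, not patient records.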
### A practical decision matrix
| AI Architecture | PHI transmitted? | BAA needed with AI provider? | HIPAA risk level |
|---|---|---|---|
| Full cloud (AI sees all data) | Yes | Yes | High — requires full compliance chain |
| Schema-only cloud AI | Metadata only | Possibly — consult legal | Low to Medium — depends on schema sensitivity |
| Local AI (WebLLM/Ollama) | No | No | Lowest — no transmission |
| No AI (manual SQL) | No | N/A | Lowest — no third party involvement |
## Real-world scenarios
### Scenario 1: Claims analysis for a health plan
You have a dataset of medical claims — member IDs, diagnosis codes, procedure codes, billed amounts, provider names. Your VP asks: "What's the average cost per member for diabetes-related claims?"
**Option A: ChatGPT Enterprise with BAA.** Upload the claims file. ChatGPT writes Python code and runs the analysis on OpenAI's servers. This works if your organization has a ChatGPT Enterprise BAA, the data handling procedures are documented, and your privacy officer approves the workflow.
**Option B: Local browser tool with schema-only AI.** Load the claims file into DuckDB-WASM in the browser. The AI sees: `member_id` (VARCHAR), `diagnosis_code` (VARCHAR), `procedure_code` (VARCHAR), `billed_amount` (DOUBLE), `provider_name` (VARCHAR). It generates:

```sql
SELECT
  COUNT(DISTINCT member_id) AS members,
  ROUND(SUM(billed_amount) / COUNT(DISTINCT member_id), 2) AS avg_cost_per_member
FROM claims
WHERE diagnosis_code LIKE 'E11%' -- ICD-10 prefix for type 2 diabetes
```
The query runs locally. The AI never sees that member M-44821 had $12,450 in diabetes-related claims.
**Option C: Local AI (Ollama).** Same as Option B, but the AI also runs on your machine. Zero network requests, which makes it suitable even for air-gapped environments.
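End to end, Option B looks like this: the (hypothetical) AI-generated SQL from the schema-only prompt is executed on-device, and only the aggregate result ever surfaces. Again `sqlite3` is a runnable stdlib stand-in for DuckDB-WASM, and the sample rows are invented:

```python
import sqlite3

# Local execution sketch: this SQL came back from an AI that saw only
# the schema; the query itself runs on-device against local data.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE claims (member_id TEXT, diagnosis_code TEXT, billed_amount REAL)"
)
con.executemany(
    "INSERT INTO claims VALUES (?, ?, ?)",
    [("M-001", "E11.9", 200.0),   # invented sample rows
     ("M-001", "E11.9", 100.0),
     ("M-002", "E11.0", 50.0)],
)

ai_generated_sql = """
    SELECT COUNT(DISTINCT member_id) AS members,
           ROUND(SUM(billed_amount) / COUNT(DISTINCT member_id), 2)
               AS avg_cost_per_member
    FROM claims
    WHERE diagnosis_code LIKE 'E11%'
"""
members, avg_cost = con.execute(ai_generated_sql).fetchone()
print(members, avg_cost)  # the aggregate is all that surfaces; rows stay local
```

The division of labor is the whole compliance story: the AI contributes the query text, the device contributes the data, and the two never meet.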
### Scenario 2: Clinical trial site performance
You're a CRO (Contract Research Organization) analyzing site enrollment rates across clinical trial locations. The dataset includes site names, investigators, patient counts, and enrollment timelines.
This is likely PHI if patient counts at specific sites could identify individuals (small sites with rare conditions). Schema-only AI avoids the issue: the AI knows there's a `patient_count` column but never sees that Site #47 enrolled 3 patients with a rare autoimmune condition.
### Scenario 3: Population health dashboard
You need to build a dashboard showing hospitalization rates by zip code, age group, and chronic condition. The source data is a patient-level extract with PHI.
For the analysis phase (exploring the data, finding patterns), use a local-first tool. For the dashboard phase (sharing aggregated results), the output is de-identified by construction — you're showing population-level statistics, not individual records.
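"De-identified by construction" usually also means suppressing small cells, so that rare combinations of zip code, age group, and condition can't single anyone out. A sketch, with invented data and a threshold of 11 (a common convention in public data releases, e.g. CMS — pick yours with your compliance team):

```python
import sqlite3

# Aggregation with small-cell suppression: groups below a minimum count
# are dropped before anything reaches the dashboard. sqlite3 stands in
# for DuckDB-WASM; rows and the threshold are illustrative.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE hospitalizations (zip TEXT, age_group TEXT, condition TEXT)"
)
rows = ([("02139", "65+", "CHF")] * 15
        + [("02139", "18-34", "rare_disease")] * 2)  # a 2-person cell
con.executemany("INSERT INTO hospitalizations VALUES (?, ?, ?)", rows)

dashboard = con.execute("""
    SELECT zip, age_group, condition, COUNT(*) AS n
    FROM hospitalizations
    GROUP BY zip, age_group, condition
    HAVING COUNT(*) >= 11        -- suppress small cells
""").fetchall()
print(dashboard)  # the 2-person cell never appears in the output
```

The `HAVING` clause is doing the privacy work: the 15-person group survives, the 2-person group is dropped before it can identify anyone.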
## Practical guidance for healthcare data teams
### Step 1: Classify your data
Before choosing a tool, classify the dataset:
- PHI: Contains any of the 18 HIPAA identifiers linked to health information. Requires HIPAA-compliant handling.
- De-identified: All 18 identifiers removed per Safe Harbor method. Can be analyzed with any tool.
- Limited Data Set: Some identifiers removed but dates and geographic data retained. Requires a Data Use Agreement (DUA) but not a full BAA.
### Step 2: Choose your architecture
| Data Classification | Recommended Tool Architecture |
|---|---|
| PHI | Local-only (WASM + local AI) or enterprise cloud with BAA |
| Limited Data Set | Local-only or schema-only cloud AI (with DUA) |
| De-identified | Any tool |
### Step 3: Document your workflow
HIPAA requires documented policies. For any AI data analysis tool, document:
- What data is loaded into the tool
- What data (if any) leaves the device
- What the AI model sees (full data, schema, nothing)
- Retention: how long data persists (browser memory = until tab is closed)
- Access controls: who can use the tool and with what data
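That checklist can live as a short, version-controlled record per tool. A sketch — field names and values are illustrative, not a regulatory template:

```yaml
tool: browser-local analysis tool (DuckDB-WASM, schema-only AI)
data_loaded: claims extract (PHI - member IDs, diagnoses, billed amounts)
data_leaving_device: none (query execution is in-browser)
ai_visibility: schema only (column names and types; no row values)
retention: browser memory; cleared when the tab is closed
access_controls: analytics team members only, via SSO
```

One page like this per workflow is also exactly the artifact Step 4 asks you to bring to your compliance team.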
### Step 4: Involve your compliance team early
Don't surprise them. Bring a one-page summary of the tool's architecture, what data flows where, and your proposed policy. Compliance teams are much more receptive to "I've already thought about the risks" than "Can I use this cool AI thing?"
## The bottom line
HIPAA doesn't prohibit AI-powered data analysis. It requires you to protect PHI wherever it goes. The simplest way to comply is to ensure PHI doesn't go anywhere.
Local-first tools — where data stays in your browser and AI only sees schema metadata — reduce the HIPAA compliance surface to near zero. They're not a silver bullet (you still need organizational policies, access controls, and training), but they remove the hardest part of the equation: getting BAAs signed with AI providers.
If you're on a healthcare data team, you don't have to choose between AI-powered analysis and HIPAA compliance. The architecture that makes both possible already exists.
QueryVeil runs DuckDB-WASM in the browser with schema-only AI — data never leaves your device. For healthcare data teams evaluating tools, the live demo shows the architecture in action without requiring any real data. No signup required.