TL;DR: When you upload a CSV to most AI data tools, your file goes to their servers, the AI reads every row, and retention policies vary wildly. We traced what actually happens with several popular tools using browser DevTools and published documentation. The results might change how you work with sensitive data.
The experiment
I took a simple CSV file — 500 rows of fake e-commerce data — and uploaded it to five popular AI data analysis tools. For each tool, I opened browser DevTools before uploading, watched the Network tab, and documented exactly what happened.
Then I read each tool's terms of service, data processing agreements, and privacy policies to understand what happens after the upload.
This isn't about calling out specific companies. These are legitimate products built by good teams. The point is: most analysts don't know what happens to their data, and the defaults are rarely optimized for privacy.
Tool 1: ChatGPT (Advanced Data Analysis)
What I did: Uploaded orders.csv and asked "What's the average order value by region?"
What the Network tab showed:
A multipart form upload to https://chatgpt.com/backend-api/conversation. The entire CSV was in the request payload. The file was transmitted to OpenAI's servers.
What happened next:
ChatGPT spun up a sandboxed Python environment, wrote pandas code, executed it, and returned a table with the results. The model had full access to every row in the file.
Retention policy (from OpenAI's docs):
- ChatGPT consumer: Conversations may be used to improve models unless you opt out in settings. Files are retained "for the duration of the conversation" but the exact deletion timeline is ambiguous.
- ChatGPT Enterprise/Team: Data is not used for training. Files are deleted after the session.
- API: Data is retained for 30 days for abuse monitoring, not used for training.
Key finding: On the consumer plan, your CSV data may contribute to model training unless you manually opt out. Most people don't. And the distinction between "consumer" and "Enterprise" plans matters enormously here.
Tool 2: Julius AI
What I did: Uploaded the same CSV and asked the same question.
What the Network tab showed:
The file was uploaded via a POST request to Julius's servers. A WebSocket connection then streamed the analysis results back.
What happened next:
Julius ran Python code against the full dataset on their servers. The AI had complete access to all rows and columns.
Retention policy (from their docs):
Julius states that uploaded files are used only for the current analysis session. Their privacy policy says data may be stored "as long as necessary to provide the service." There's no SOC 2 certification listed publicly. No BAA available for healthcare data.
Key finding: The privacy policy language is vague enough to give a security team pause. "As long as necessary" could mean minutes or months.
Tool 3: Google Sheets + Gemini
What I did: Imported the CSV into Google Sheets and used the "Help me analyze" Gemini sidebar.
What the Network tab showed:
The file was already in Google's infrastructure once imported into Sheets. When I triggered Gemini, additional requests were sent to Google's AI APIs with spreadsheet content.
What happened next:
Gemini analyzed the data and returned insights. The model had access to the spreadsheet data within Google's ecosystem.
Retention policy (from Google's docs):
Google Workspace data policies apply. For consumer accounts, data may be used to improve services. For Workspace Enterprise accounts, Google states that customer data is not used for advertising or training AI models. Gemini in Workspace has a separate data processing framework.
Key finding: If you're on a Google Workspace enterprise plan with a proper DPA in place, this is relatively well-documented. On a personal Gmail account, the picture is murkier.
Tool 4: A Python notebook (Jupyter, local)
What I did: Opened Jupyter locally, loaded the CSV with pandas, wrote a query.
What the Network tab showed:
Network requests only to localhost:8888 (the local Jupyter server). The CSV was read from disk by the Python process running on my machine. No external network calls.
What happened next:
Pandas processed the data locally. No AI involved. I wrote the code myself.
Retention policy: Entirely in my control. The file stays on my disk. The notebook is a local file. Nothing is transmitted.
Key finding: Maximum privacy, maximum friction. Writing Python for every ad-hoc question is slow. This is the baseline against which we should compare other tools.
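To make the friction concrete, here is roughly what answering that one question takes by hand. This is a dependency-free sketch using only the standard library (pandas collapses it to a single `groupby`); the column names match the sample schema, and the function name is illustrative:

```python
import csv
from collections import defaultdict

def avg_order_value_by_region(path):
    """Average order value (quantity * unit_price) per region, computed locally."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            totals[row["region"]] += int(row["quantity"]) * float(row["unit_price"])
            counts[row["region"]] += 1
    return {region: totals[region] / counts[region] for region in totals}
```

Same question, same answer, zero network requests — but you have to write a version of this for every new question.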
Tool 5: A browser-based DuckDB-WASM tool
What I did: Opened a browser-based tool running DuckDB WebAssembly, dragged in the CSV, and asked a question in natural language.
What the Network tab showed:
The file load generated zero network requests. The CSV was read from disk via the File API into browser memory. When I asked a question, a small request was sent to an AI API containing only the table schema:
```json
{
  "messages": [
    {
      "role": "system",
      "content": "Table: orders\nColumns: order_id (INTEGER), customer_name (VARCHAR), region (VARCHAR), product (VARCHAR), quantity (INTEGER), unit_price (DOUBLE), order_date (DATE)"
    },
    {
      "role": "user",
      "content": "What's the average order value by region?"
    }
  ]
}
```
No row data. No customer names. No revenue figures. Just column names and types.
What happened next:
The AI returned a SQL query. DuckDB-WASM executed it in the browser. Results rendered locally.
Retention policy: The AI provider (e.g., OpenRouter routing to Claude or GPT-4) retains the prompt per their policy — but the prompt only contains schema metadata. The actual data never left the browser.
Key finding: This architecture separates AI capability from data access. The AI helps write queries without ever seeing the data those queries run against.
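Building a schema-only prompt like the one above requires reading column names and inferring types, nothing more. Here is a minimal sketch — the function names and the crude type-inference rules are illustrative, not any particular tool's implementation (real tools lean on the query engine's own sniffer):

```python
import csv

def infer_type(values):
    """Crude type inference from sample values: INTEGER, DOUBLE, else VARCHAR."""
    try:
        for v in values:
            int(v)
        return "INTEGER"
    except ValueError:
        pass
    try:
        for v in values:
            float(v)
        return "DOUBLE"
    except ValueError:
        return "VARCHAR"

def schema_prompt(path, table_name, sample_size=100):
    """Build the system-prompt schema line: column names and types, zero row values."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        sample = [row for _, row in zip(range(sample_size), reader)]
    columns = ", ".join(
        f"{name} ({infer_type([row[i] for row in sample])})"
        for i, name in enumerate(header)
    )
    return f"Table: {table_name}\nColumns: {columns}"
```

Note that `sample` is only used for type inference and is discarded before the prompt is assembled — no cell values survive into the string that goes over the network.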
Summary matrix
| Tool | File uploaded to server? | AI sees full data? | Retention clarity | Training data risk | Verifiable via DevTools? |
|---|---|---|---|---|---|
| ChatGPT ADA | Yes | Yes | Medium (plan-dependent) | Yes (consumer plan) | Yes |
| Julius AI | Yes | Yes | Low (vague policy) | Unclear | Yes |
| Google Sheets + Gemini | Yes (Google infra) | Yes | High (enterprise) / Low (consumer) | Plan-dependent | Partially |
| Jupyter (local) | No | No AI | N/A (you control) | No | Yes |
| Browser WASM + schema AI | No | Schema only | High (only schema sent) | Schema only | Yes |
How to check any tool yourself
You don't need to take my word — or any vendor's word — for it. Here's how to verify:
Step 1: Open browser DevTools before uploading
In both Chrome and Firefox: right-click the page > Inspect > Network tab. Clear the log so you start fresh.
Step 2: Upload your file
Watch the Network tab. Look for:
- POST requests with large payloads — that's your file being uploaded
- WebSocket connections — data might be streaming to a server
- Requests to third-party domains — your data might be going to an AI provider you didn't expect
Step 3: Ask a question
Watch for new requests. Check the request payload:
- Does it contain your actual data values? (Names, numbers, dates from your CSV)
- Or does it contain only metadata? (Column names, types, table structure)
Step 4: Check the response
Is the computation result coming from the server (meaning it ran on their infrastructure) or is JavaScript running a local query engine?
Step 5: Read the fine print
For any tool that sends data to a server:
- Find the privacy policy. Search for "retention," "training," and "third party."
- Find the terms of service. Search for "data," "license," and "use."
- If there's a DPA, read it. It often contradicts the marketing page.
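If eyeballing the Network tab feels error-prone, you can also export the capture (right-click inside the Network tab > "Save all as HAR") and scan it programmatically. A stdlib-only sketch that flags requests with payloads large enough to contain file data — the threshold is arbitrary, and the field names follow the HAR 1.2 format:

```python
import json
from urllib.parse import urlparse

def large_uploads(har_path, min_bytes=10_000):
    """Return (domain, bytes_sent) for requests whose body could contain file data."""
    with open(har_path) as f:
        har = json.load(f)
    findings = []
    for entry in har["log"]["entries"]:
        request = entry["request"]
        body_size = request.get("bodySize") or 0
        if body_size >= min_bytes:
            findings.append((urlparse(request["url"]).netloc, body_size))
    return findings
```

Any domain this surfaces that you don't recognize is worth a closer look in the raw HAR — the request body is right there.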
What this means for your workflow
If you're analyzing public datasets, benchmark data, or anything you'd publish on a blog — use whatever tool is most convenient. ChatGPT is genuinely great for this.
If you're analyzing data that contains:
- Customer PII (names, emails, phone numbers)
- Financial records (revenue, costs, margins)
- Healthcare data (any PHI)
- HR data (salaries, performance reviews)
- Anything under NDA
Then you should know exactly where that data goes when you drag it into a tool. Open DevTools. Check the network requests. Read the retention policy. And consider whether a local-first approach — where the data never leaves your machine — is the better default.
The best tool isn't the one with the best AI model. It's the one whose architecture matches your data sensitivity.
QueryVeil is built on the browser-local architecture described in Tool 5. Schema-only AI, DuckDB-WASM, no file upload. Open DevTools and verify. Live demo with sample data, no signup required.