TL;DR: DuckDB WebAssembly lets you run full SQL queries against CSV files directly in your browser tab. No Python, no database server, no uploading files anywhere. This tutorial walks through loading CSVs, writing queries, joins, aggregations, and common gotchas.
Why SQL on CSVs in the browser
You have a CSV. You need to answer a question. The traditional options:
- Excel/Google Sheets — Works for simple stuff. Falls apart with 100K+ rows, complex joins, or anything that needs a GROUP BY.
- Python/pandas — Powerful, but you need a Python environment set up.
- Upload to a database — Works, but now your data is on a server.
- Command-line DuckDB — Great if you're comfortable in a terminal.
There's a fifth option: run DuckDB in the browser via WebAssembly. The CSV loads from your local disk into browser memory. SQL runs against it. Nothing leaves your machine.
This matters if you're dealing with sensitive data, if you don't have Python installed, or if you just want the fastest path from file to answer.
How DuckDB-WASM works (30-second version)
DuckDB is a columnar SQL database designed for analytics. It compiles to WebAssembly (WASM), which means it runs inside your browser's JavaScript engine at near-native speed.
When you load a CSV, here's what happens:
- The browser's File API reads the file from your disk into memory.
- DuckDB-WASM parses the CSV, infers column types (integers, strings, dates, etc.), and creates an in-memory table.
- You write SQL. DuckDB executes it against the in-memory table.
- Results come back as rows and columns, ready to display.
No server involved at any step. The CSV goes from your disk to your browser tab — that's the entire data path.
Step 1: Load a CSV
For this tutorial, imagine we have a file called orders.csv:
order_id,customer_name,region,product,quantity,unit_price,order_date
1001,Alice Chen,West,Widget A,5,29.99,2025-11-15
1002,Bob Martinez,East,Widget B,2,49.99,2025-11-16
1003,Carol Johnson,West,Widget A,10,29.99,2025-11-16
1004,David Kim,Central,Widget C,1,99.99,2025-11-17
1005,Eve Patel,East,Widget B,7,49.99,2025-11-18
1006,Frank Liu,West,Widget C,3,99.99,2025-11-18
1007,Grace Obi,Central,Widget A,8,29.99,2025-11-19
1008,Hiro Tanaka,East,Widget B,4,49.99,2025-11-20
If you're using the DuckDB-WASM JavaScript API directly:
import * as duckdb from '@duckdb/duckdb-wasm';
// `file` is a File object, e.g. from an <input type="file"> or a drop event.
// After initializing the database and connection:
await db.registerFileHandle('orders.csv', file, duckdb.DuckDBDataProtocol.BROWSER_FILEREADER, true);
await conn.query(`
CREATE TABLE orders AS SELECT * FROM read_csv_auto('orders.csv')
`);
The read_csv_auto function handles type inference: it samples the values in each column and decides that order_id is an integer, customer_name is a VARCHAR, order_date is a DATE, and so on.
If you're using a tool like QueryVeil, you just drag the file in. Same thing happens under the hood.
Step 2: Basic queries
Once the table exists, it's standard SQL:
See what you're working with:
DESCRIBE orders;
Returns column names and inferred types. Always run this first — type inference isn't perfect, and catching a mistyped column early saves debugging later.
Count rows:
SELECT COUNT(*) FROM orders;
Filter rows:
SELECT * FROM orders
WHERE region = 'West'
ORDER BY order_date;
Aggregations:
SELECT region, SUM(quantity * unit_price) AS total_revenue
FROM orders
GROUP BY region
ORDER BY total_revenue DESC;
Result:
| region | total_revenue |
|---|---|
| West | 749.82 |
| East | 649.87 |
| Central | 339.91 |
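To filter on an aggregate rather than on raw rows, add a HAVING clause. A sketch, keeping only regions above an arbitrary revenue cutoff of 500:

```sql
-- WHERE filters rows before grouping; HAVING filters groups after
SELECT region, SUM(quantity * unit_price) AS total_revenue
FROM orders
GROUP BY region
HAVING SUM(quantity * unit_price) > 500
ORDER BY total_revenue DESC;
```

Against the sample data, only West and East clear that bar.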
This is the bread and butter — the same SQL you'd write against Postgres or BigQuery, running entirely in your browser.
Step 3: Joins across multiple CSVs
DuckDB-WASM can load multiple files into separate tables and join them. Say you have a second file, customers.csv:
customer_name,segment,signup_date
Alice Chen,Enterprise,2024-01-10
Bob Martinez,SMB,2024-03-22
Carol Johnson,Enterprise,2024-02-15
David Kim,Startup,2024-06-01
Eve Patel,SMB,2024-04-30
Frank Liu,Enterprise,2024-01-25
Grace Obi,Startup,2024-07-12
Hiro Tanaka,SMB,2024-05-18
Load it the same way, then join:
SELECT
c.segment,
COUNT(DISTINCT o.order_id) AS order_count,
SUM(o.quantity * o.unit_price) AS total_revenue,
ROUND(AVG(o.quantity * o.unit_price), 2) AS avg_order_value
FROM orders o
JOIN customers c ON o.customer_name = c.customer_name
GROUP BY c.segment
ORDER BY total_revenue DESC;
| segment | order_count | total_revenue | avg_order_value |
|---|---|---|---|
| Enterprise | 3 | 749.82 | 249.94 |
| SMB | 3 | 649.87 | 216.62 |
| Startup | 2 | 339.91 | 169.96 |
Multi-table analysis in the browser. No upload, no server.
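One thing to watch: the inner JOIN above silently drops any order whose customer_name has no match in customers. A quick anti-join check surfaces those rows before they quietly vanish from your totals:

```sql
-- Orders with no matching customer row (these disappear from an inner join)
SELECT o.*
FROM orders o
LEFT JOIN customers c ON o.customer_name = c.customer_name
WHERE c.customer_name IS NULL;
```

Zero rows back means the join is safe; anything else means you should switch to a LEFT JOIN or clean the keys first.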
Step 4: Window functions and CTEs
DuckDB supports the full range of analytical SQL. A common use case: running totals and rankings.
WITH daily_revenue AS (
SELECT
order_date,
SUM(quantity * unit_price) AS revenue
FROM orders
GROUP BY order_date
)
SELECT
order_date,
revenue,
SUM(revenue) OVER (ORDER BY order_date) AS cumulative_revenue,
ROUND(revenue / SUM(revenue) OVER () * 100, 1) AS pct_of_total
FROM daily_revenue
ORDER BY order_date;
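Rankings are just another window function. For example, ranking individual orders by value:

```sql
SELECT
    order_id,
    quantity * unit_price AS order_value,
    RANK() OVER (ORDER BY quantity * unit_price DESC) AS value_rank
FROM orders
ORDER BY value_rank;
```

RANK leaves gaps after ties; DENSE_RANK and ROW_NUMBER are drop-in alternatives if you want different tie behavior.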
CTEs, window functions, OVER clauses — they all work. If you're coming from a data warehouse background, you won't feel limited.
Step 5: Handling messy CSVs
Real-world CSVs are messy. DuckDB handles a lot of edge cases, but here are common issues and fixes:
Wrong type inference:
DuckDB guessed order_id is an integer, but you have IDs like ORD-1001. Force the type:
CREATE TABLE orders AS
SELECT * FROM read_csv('orders.csv', columns={
'order_id': 'VARCHAR',
'quantity': 'INTEGER',
'unit_price': 'DOUBLE'
});
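Explicit column types fix known problems, but misdetection often comes from sampling: by default the reader only inspects an initial sample of rows when inferring types, so a column that looks numeric early and turns textual later can be guessed wrong. One option is to force a full scan during inference:

```sql
-- sample_size=-1 scans the whole file when inferring types (slower, but safer)
SELECT * FROM read_csv('orders.csv', sample_size=-1);
```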
Different delimiters:
Tab-separated or semicolon-separated files:
SELECT * FROM read_csv('data.tsv', delim='\t');
SELECT * FROM read_csv('data.csv', delim=';');
Headers or no headers:
SELECT * FROM read_csv('data.csv', header=false);
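Without a header row, DuckDB auto-generates column names (column0, column1, and so on). You can supply real names with the names parameter — the column list below is illustrative, not tied to the sample file:

```sql
SELECT * FROM read_csv('data.csv', header=false,
    names=['order_id', 'customer_name', 'region']);
```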
Null handling:
Check for nulls before aggregating:
SELECT
COUNT(*) AS total_rows,
COUNT(unit_price) AS rows_with_price,
COUNT(*) - COUNT(unit_price) AS null_prices
FROM orders;
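If nulls do turn up, decide explicitly how to treat them. Aggregate functions skip nulls, which silently shrinks the denominator of AVG; COALESCE lets you choose a substitute instead. A sketch of the difference:

```sql
-- AVG skips null prices; COALESCE counts them as 0 in the average
SELECT
    AVG(unit_price) AS avg_ignoring_nulls,
    AVG(COALESCE(unit_price, 0)) AS avg_nulls_as_zero
FROM orders;
```

Neither answer is automatically right — the point is to make the choice visible in the query.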
Performance: what to expect
DuckDB-WASM is fast for the browser, but it's still running in a browser tab. Rough benchmarks on a typical laptop:
| File size | Row count | Simple query | Aggregation | Notes |
|---|---|---|---|---|
| 1 MB | ~10K rows | <100ms | <100ms | Instant |
| 50 MB | ~500K rows | <500ms | <1s | Comfortable |
| 200 MB | ~2M rows | 1-3s | 2-5s | Noticeable pause |
| 500 MB+ | ~5M+ rows | 5-10s | 10-30s | Pushing browser limits |
For files over 200MB, consider converting to Parquet first (DuckDB CLI can do this: COPY (SELECT * FROM 'file.csv') TO 'file.parquet'). Parquet is columnar, so DuckDB only reads the columns your query needs.
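Once converted, the Parquet file queries exactly like the CSV did — orders.parquet below is simply whatever name you gave the COPY output:

```sql
-- Only the three referenced columns are read from disk
SELECT region, SUM(quantity * unit_price) AS total_revenue
FROM read_parquet('orders.parquet')
GROUP BY region;
```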
When this approach falls short
Be honest about the limits:
- Files over ~1GB push browser memory too hard. Use DuckDB CLI or Python at that point.
- Complex ETL pipelines with dozens of transformation steps are better in Python/dbt.
- Real-time data needs a server. This is for file-based analysis.
- Collaborative analysis where multiple people need to query the same dataset simultaneously — you need a shared database.
This approach is best for: ad-hoc analysis of file exports under a few hundred MB, especially when the data is sensitive and you don't want to upload it.
QueryVeil wraps DuckDB-WASM with a UI: drag in a CSV, write SQL or ask questions in plain English, get auto-profiling and visualizations. Live demo with sample data, no signup needed.