TL;DR: DuckDB WebAssembly lets you run full SQL queries against CSV files directly in your browser tab. No Python, no database server, no uploading files anywhere. This tutorial walks through loading CSVs, writing queries, joins, aggregations, and common gotchas.
Why SQL on CSVs in the browser
You have a CSV. You need to answer a question. The traditional options:
- Excel/Google Sheets — Works for simple stuff. Falls apart with 100K+ rows, complex joins, or anything that needs a GROUP BY.
- Python/pandas — Powerful, but you need a Python environment set up.
- Upload to a database — Works, but now your data is on a server.
- Command-line DuckDB — Great if you're comfortable in a terminal.
There's a fifth option: run DuckDB in the browser via WebAssembly. The CSV loads from your local disk into browser memory. SQL runs against it. Nothing leaves your machine.
This matters if you're dealing with sensitive data, if you don't have Python installed, or if you just want the fastest path from file to answer.
How DuckDB-WASM works (30-second version)
DuckDB is a columnar SQL database designed for analytics. It compiles to WebAssembly (WASM), which means it runs inside your browser's JavaScript engine at near-native speed.
When you load a CSV, here's what happens:
- The browser's File API reads the file from your disk into memory.
- DuckDB-WASM parses the CSV, infers column types (integers, strings, dates, etc.), and creates an in-memory table.
- You write SQL. DuckDB executes it against the in-memory table.
- Results come back as rows and columns, ready to display.
No server involved at any step. The CSV goes from your disk to your browser tab — that's the entire data path.
Step 1: Load a CSV
For this tutorial, imagine we have a file called orders.csv:
order_id,customer_name,region,product,quantity,unit_price,order_date
1001,Alice Chen,West,Widget A,5,29.99,2025-11-15
1002,Bob Martinez,East,Widget B,2,49.99,2025-11-16
1003,Carol Johnson,West,Widget A,10,29.99,2025-11-16
1004,David Kim,Central,Widget C,1,99.99,2025-11-17
1005,Eve Patel,East,Widget B,7,49.99,2025-11-18
1006,Frank Liu,West,Widget C,3,99.99,2025-11-18
1007,Grace Obi,Central,Widget A,8,29.99,2025-11-19
1008,Hiro Tanaka,East,Widget B,4,49.99,2025-11-20
If you're using the DuckDB-WASM JavaScript API directly:
import * as duckdb from '@duckdb/duckdb-wasm';
// `file` is a File object, e.g. from an <input type="file"> or a drop event.
// After initializing the database and connection:
await db.registerFileHandle('orders.csv', file, duckdb.DuckDBDataProtocol.BROWSER_FILEREADER, true);
await conn.query(`
CREATE TABLE orders AS SELECT * FROM read_csv_auto('orders.csv')
`);
The read_csv_auto function handles type inference: it samples the values in each column and decides that order_id is an integer, customer_name is a VARCHAR, order_date is a DATE, and so on.
If you're using a tool like QueryVeil, you just drag the file in. Same thing happens under the hood.
Step 2: Basic queries
Once the table exists, it's standard SQL:
See what you're working with:
DESCRIBE orders;
Returns column names and inferred types. Always run this first — type inference isn't perfect, and catching a mistyped column early saves debugging later.
Count rows:
SELECT COUNT(*) FROM orders;
Filter rows:
SELECT * FROM orders
WHERE region = 'West'
ORDER BY order_date;
Aggregations:
SELECT region, SUM(quantity * unit_price) AS total_revenue
FROM orders
GROUP BY region
ORDER BY total_revenue DESC;
Result:
| region | total_revenue |
|---|---|
| West | 749.82 |
| East | 649.87 |
| Central | 339.91 |
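To filter on an aggregate rather than on raw rows, add a HAVING clause. A sketch, keeping only regions above an arbitrary revenue cutoff of 500:

```sql
-- WHERE filters rows before grouping; HAVING filters groups after
SELECT region, SUM(quantity * unit_price) AS total_revenue
FROM orders
GROUP BY region
HAVING SUM(quantity * unit_price) > 500
ORDER BY total_revenue DESC;
```

Against the sample data, only West and East clear that bar.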
This is the bread and butter — the same SQL you'd write against Postgres or BigQuery, running entirely in your browser.
Step 3: Joins across multiple CSVs
DuckDB-WASM can load multiple files into separate tables and join them. Say you have a second file, customers.csv:
customer_name,segment,signup_date
Alice Chen,Enterprise,2024-01-10
Bob Martinez,SMB,2024-03-22
Carol Johnson,Enterprise,2024-02-15
David Kim,Startup,2024-06-01
Eve Patel,SMB,2024-04-30
Frank Liu,Enterprise,2024-01-25
Grace Obi,Startup,2024-07-12
Hiro Tanaka,SMB,2024-05-18
Load it the same way, then join:
SELECT
c.segment,
COUNT(DISTINCT o.order_id) AS order_count,
SUM(o.quantity * o.unit_price) AS total_revenue,
ROUND(AVG(o.quantity * o.unit_price), 2) AS avg_order_value
FROM orders o
JOIN customers c ON o.customer_name = c.customer_name
GROUP BY c.segment
ORDER BY total_revenue DESC;
| segment | order_count | total_revenue | avg_order_value |
|---|---|---|---|
| Enterprise | 3 | 749.82 | 249.94 |
| SMB | 3 | 649.87 | 216.62 |
| Startup | 2 | 339.91 | 169.96 |
Multi-table analysis in the browser. No upload, no server.
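One thing to watch: the inner JOIN above silently drops any order whose customer_name has no match in customers. A quick anti-join check surfaces those rows before they quietly vanish from your totals:

```sql
-- Orders with no matching customer row (these disappear from an inner join)
SELECT o.*
FROM orders o
LEFT JOIN customers c ON o.customer_name = c.customer_name
WHERE c.customer_name IS NULL;
```

Zero rows back means the join is safe; anything else means you should switch to a LEFT JOIN or clean the keys first.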
Step 4: Window functions and CTEs
DuckDB supports the full range of analytical SQL. A common use case: running totals and rankings.
WITH daily_revenue AS (
SELECT
order_date,
SUM(quantity * unit_price) AS revenue
FROM orders
GROUP BY order_date
)
SELECT
order_date,
revenue,
SUM(revenue) OVER (ORDER BY order_date) AS cumulative_revenue,
ROUND(revenue / SUM(revenue) OVER () * 100, 1) AS pct_of_total
FROM daily_revenue
ORDER BY order_date;
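Rankings are just another window function. For example, ranking individual orders by value:

```sql
SELECT
    order_id,
    quantity * unit_price AS order_value,
    RANK() OVER (ORDER BY quantity * unit_price DESC) AS value_rank
FROM orders
ORDER BY value_rank;
```

RANK leaves gaps after ties; DENSE_RANK and ROW_NUMBER are drop-in alternatives if you want different tie behavior.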
CTEs, window functions, OVER clauses — they all work. If you're coming from a data warehouse background, you won't feel limited.
Step 5: Handling messy CSVs
Real-world CSVs are messy. DuckDB handles a lot of edge cases, but here are common issues and fixes:
Wrong type inference:
DuckDB guessed order_id is an integer, but you have IDs like ORD-1001. Force the type:
CREATE TABLE orders AS
SELECT * FROM read_csv('orders.csv', columns={
'order_id': 'VARCHAR',
'quantity': 'INTEGER',
'unit_price': 'DOUBLE'
});
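Explicit column types fix known problems, but misdetection often comes from sampling: by default the reader only inspects an initial sample of rows when inferring types, so a column that looks numeric early and turns textual later can be guessed wrong. One option is to force a full scan during inference:

```sql
-- sample_size=-1 scans the whole file when inferring types (slower, but safer)
SELECT * FROM read_csv('orders.csv', sample_size=-1);
```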
Different delimiters:
Tab-separated or semicolon-separated files:
SELECT * FROM read_csv('data.tsv', delim='\t');
SELECT * FROM read_csv('data.csv', delim=';');
Headers or no headers:
SELECT * FROM read_csv('data.csv', header=false);
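Without a header row, DuckDB auto-generates column names (column0, column1, and so on). You can supply real names with the names parameter — the column list below is illustrative, not tied to the sample file:

```sql
SELECT * FROM read_csv('data.csv', header=false,
    names=['order_id', 'customer_name', 'region']);
```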
Null handling:
Check for nulls before aggregating:
SELECT
COUNT(*) AS total_rows,
COUNT(unit_price) AS rows_with_price,
COUNT(*) - COUNT(unit_price) AS null_prices
FROM orders;
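If nulls do turn up, decide explicitly how to treat them. Aggregate functions skip nulls, which silently shrinks the denominator of AVG; COALESCE lets you choose a substitute instead. A sketch of the difference:

```sql
-- AVG skips null prices; COALESCE counts them as 0 in the average
SELECT
    AVG(unit_price) AS avg_ignoring_nulls,
    AVG(COALESCE(unit_price, 0)) AS avg_nulls_as_zero
FROM orders;
```

Neither answer is automatically right — the point is to make the choice visible in the query.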
Performance: what to expect
DuckDB-WASM is fast for the browser, but it's still running in a browser tab. Rough benchmarks on a typical laptop:
| File size | Row count | Simple query | Aggregation | Notes |
|---|---|---|---|---|
| 1 MB | ~10K rows | <100ms | <100ms | Instant |
| 50 MB | ~500K rows | <500ms | <1s | Comfortable |
| 200 MB | ~2M rows | 1-3s | 2-5s | Noticeable pause |
| 500 MB+ | ~5M+ rows | 5-10s | 10-30s | Pushing browser limits |
For files over 200MB, consider converting to Parquet first (DuckDB CLI can do this: COPY (SELECT * FROM 'file.csv') TO 'file.parquet'). Parquet is columnar, so DuckDB only reads the columns your query needs.
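Once converted, the Parquet file queries exactly like the CSV did — orders.parquet below is simply whatever name you gave the COPY output:

```sql
-- Only the three referenced columns are read from disk
SELECT region, SUM(quantity * unit_price) AS total_revenue
FROM read_parquet('orders.parquet')
GROUP BY region;
```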
When this approach falls short
Be honest about the limits:
- Files over ~1GB push browser memory too hard. Use DuckDB CLI or Python at that point.
- Complex ETL pipelines with dozens of transformation steps are better in Python/dbt.
- Real-time data needs a server. This is for file-based analysis.
- Collaborative analysis where multiple people need to query the same dataset simultaneously — you need a shared database.
This approach is best for: ad-hoc analysis of file exports under a few hundred MB, especially when the data is sensitive and you don't want to upload it.
QueryVeil wraps DuckDB-WASM with a UI: drag in a CSV, write SQL or ask questions in plain English, get auto-profiling and visualizations. Live demo with sample data, no signup needed.