
From CSV to Insight in 30 Seconds: A Walkthrough

QueryVeil Team · 7 min read
Tags: tutorial, walkthrough, csv, data-analysis, ai, sql

TL;DR: We take a realistic e-commerce sales CSV, load it into a browser-based SQL engine, auto-profile it, ask five questions in plain English, and drill down to a non-obvious insight — all in under 30 seconds of active work. This walkthrough shows the full flow with actual SQL and results.


The dataset

We're working with ecommerce_sales.csv — 10,000 orders from a fictional online store. Columns:

| Column | Type | Example |
| --- | --- | --- |
| order_id | INTEGER | 48291 |
| order_date | DATE | 2025-11-15 |
| customer_id | VARCHAR | CUST-7742 |
| customer_segment | VARCHAR | Enterprise |
| region | VARCHAR | West |
| product_category | VARCHAR | Electronics |
| product_name | VARCHAR | Wireless Headphones |
| quantity | INTEGER | 3 |
| unit_price | DOUBLE | 79.99 |
| discount_pct | DOUBLE | 0.10 |
| shipping_method | VARCHAR | Express |

This is representative of what analysts actually work with: a CRM or Shopify export, a data warehouse extract, or an operational report.

Second 0-5: Load the file

Drag ecommerce_sales.csv into the browser. DuckDB-WASM parses it, infers column types, and creates an in-memory table.

What happens under the hood:

  1. The browser's File API reads the file from disk
  2. DuckDB-WASM samples rows to infer types (order_date becomes DATE, unit_price becomes DOUBLE, etc.)
  3. The full CSV is parsed into a columnar in-memory table

For a 10K-row CSV (~2MB), this takes under a second. No network request, no upload — the file goes from your disk to browser memory.
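DuckDB's real CSV sniffer is considerably more sophisticated, but the idea of sample-based type inference can be sketched in a few lines of Python. Everything here (the `infer_type` and `profile_csv` helpers, the tiny sample) is our illustration, not DuckDB's code:

```python
import csv
import io

def infer_type(values):
    """Pick the narrowest type that fits every sampled value (naive sketch)."""
    def all_match(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False
    if all_match(int):
        return "INTEGER"
    if all_match(float):
        return "DOUBLE"
    # Very rough DATE check: every value has the YYYY-MM-DD shape
    if all(len(v) == 10 and v[4] == "-" and v[7] == "-" for v in values):
        return "DATE"
    return "VARCHAR"

def profile_csv(text, sample_size=100):
    """Sample the first rows and infer one type per column."""
    rows = list(csv.DictReader(io.StringIO(text)))[:sample_size]
    return {col: infer_type([r[col] for r in rows]) for col in rows[0]}

sample = """order_id,order_date,unit_price,region
48291,2025-11-15,79.99,West
48292,2025-11-16,12.50,East"""

print(profile_csv(sample))
# {'order_id': 'INTEGER', 'order_date': 'DATE', 'unit_price': 'DOUBLE', 'region': 'VARCHAR'}
```

Sampling rather than scanning the whole file is why the inference step stays fast even on much larger CSVs.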

Second 5-10: Auto-profile

Before you ask a single question, auto-profiling runs a battery of SQL queries to characterize the dataset:

-- Row count
SELECT COUNT(*) FROM ecommerce_sales;
-- Result: 10,000

-- Column stats
SELECT
    COUNT(DISTINCT customer_segment) as segments,
    COUNT(DISTINCT region) as regions,
    COUNT(DISTINCT product_category) as categories
FROM ecommerce_sales;
-- Result: 4 segments, 5 regions, 8 categories

-- Null rates
SELECT
    ROUND(100.0 * COUNT(*) FILTER (WHERE discount_pct IS NULL) / COUNT(*), 1) as discount_null_pct,
    ROUND(100.0 * COUNT(*) FILTER (WHERE shipping_method IS NULL) / COUNT(*), 1) as shipping_null_pct
FROM ecommerce_sales;
-- Result: discount_null_pct = 12.3%, shipping_null_pct = 0.0%

-- Numeric distributions
SELECT
    MIN(unit_price) AS min_price,
    MAX(unit_price) AS max_price,
    ROUND(AVG(unit_price), 2) AS avg_price,
    ROUND(PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY unit_price), 2) AS median_price
FROM ecommerce_sales;
-- Result: min=4.99, max=499.99, avg=87.42, median=62.99

-- Date range
SELECT MIN(order_date), MAX(order_date)
FROM ecommerce_sales;
-- Result: 2025-01-01 to 2025-12-31

The profile tells you:

  • 10K orders across a full year
  • 4 customer segments, 5 regions, 8 product categories
  • 12.3% of orders have no discount (null) — worth noting for any revenue analysis
  • Prices range from $4.99 to $499.99, median $62.99

This takes 2-3 seconds. You already know more about this dataset than most analysts would after 5 minutes of scrolling through the spreadsheet.

Second 10-15: Question 1 — "What's our revenue by region?"

You type: "What's the total revenue by region?"

The AI sees your schema (column names and types) and generates:

SELECT
    region,
    ROUND(SUM(quantity * unit_price * (1 - COALESCE(discount_pct, 0))), 2) AS net_revenue,
    COUNT(DISTINCT order_id) AS order_count
FROM ecommerce_sales
GROUP BY region
ORDER BY net_revenue DESC;

Note: the AI correctly handles the discount and null values in discount_pct — this is where schema-only AI with profiling metadata earns its keep.

| region | net_revenue | order_count |
| --- | --- | --- |
| West | 284,112.45 | 2,847 |
| East | 241,890.33 | 2,412 |
| Central | 178,445.21 | 1,789 |
| South | 156,220.18 | 1,563 |
| North | 121,887.90 | 1,389 |

West leads by a significant margin — both in revenue and order volume.
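The null handling is not cosmetic: SQL aggregates skip rows where the expression is NULL, so multiplying by discount_pct directly would silently drop the 12.3% of orders with a null discount. The difference can be seen with Python's built-in sqlite3, which supports the same COALESCE idiom (the three-row table here is invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (quantity INTEGER, unit_price REAL, discount_pct REAL)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, 100.0, 0.10), (2, 50.0, None), (1, 200.0, 0.25)],
)

# Naive: a NULL discount makes the whole product NULL, so SUM drops that row
naive = con.execute(
    "SELECT SUM(quantity * unit_price * (1 - discount_pct)) FROM sales"
).fetchone()[0]

# Correct: COALESCE treats a missing discount as zero
correct = con.execute(
    "SELECT SUM(quantity * unit_price * (1 - COALESCE(discount_pct, 0))) FROM sales"
).fetchone()[0]

print(naive)    # 240.0 (the NULL-discount order vanished)
print(correct)  # 340.0 (all three orders counted)
```

On the real dataset, the naive version would understate revenue on every region with null-discount orders.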

Second 15-18: Question 2 — "Which segment drives the most revenue?"

You type: "Break down revenue by customer segment."

SELECT
    customer_segment,
    ROUND(SUM(quantity * unit_price * (1 - COALESCE(discount_pct, 0))), 2) AS net_revenue,
    COUNT(DISTINCT customer_id) AS unique_customers,
    ROUND(SUM(quantity * unit_price * (1 - COALESCE(discount_pct, 0))) /
          COUNT(DISTINCT customer_id), 2) AS revenue_per_customer
FROM ecommerce_sales
GROUP BY customer_segment
ORDER BY net_revenue DESC;

| customer_segment | net_revenue | unique_customers | revenue_per_customer |
| --- | --- | --- | --- |
| Enterprise | 412,340.22 | 312 | 1,321.60 |
| Mid-Market | 234,112.05 | 487 | 480.72 |
| SMB | 198,220.80 | 1,245 | 159.21 |
| Consumer | 137,883.00 | 2,891 | 47.69 |

Enterprise customers are about 6.3% of the customer base (312 of 4,935 unique customers) but generate 42% of revenue. Revenue per customer drops 27x from Enterprise to Consumer.

Second 18-22: Question 3 — "What's the trend over time?"

You type: "Show me monthly revenue trend."

SELECT
    DATE_TRUNC('month', order_date) AS month,
    ROUND(SUM(quantity * unit_price * (1 - COALESCE(discount_pct, 0))), 2) AS net_revenue,
    COUNT(DISTINCT order_id) AS orders
FROM ecommerce_sales
GROUP BY month
ORDER BY month;

| month | net_revenue | orders |
| --- | --- | --- |
| 2025-01 | 72,445.12 | 724 |
| 2025-02 | 68,112.33 | 681 |
| 2025-03 | 74,890.45 | 749 |
| ... | ... | ... |
| 2025-10 | 89,220.18 | 892 |
| 2025-11 | 102,445.67 | 1,024 |
| 2025-12 | 118,890.33 | 1,189 |

Revenue trending up in Q4, with December being the strongest month. Typical e-commerce seasonality. But wait —

Second 22-26: Question 4 — The drill-down

You notice the Q4 spike and ask: "Is the Q4 revenue growth coming from more orders or higher order values?"

SELECT
    CASE
        WHEN order_date >= '2025-10-01' THEN 'Q4'
        ELSE 'Q1-Q3'
    END AS period,
    COUNT(DISTINCT order_id) AS total_orders,
    ROUND(AVG(quantity * unit_price * (1 - COALESCE(discount_pct, 0))), 2) AS avg_order_value,
    ROUND(AVG(discount_pct) FILTER (WHERE discount_pct IS NOT NULL), 3) AS avg_discount
FROM ecommerce_sales
GROUP BY period;

| period | total_orders | avg_order_value | avg_discount |
| --- | --- | --- | --- |
| Q1-Q3 | 6,895 | 91.22 | 0.082 |
| Q4 | 3,105 | 112.45 | 0.142 |

Interesting. Q4 has higher average order values, but also nearly double the discount rate. The revenue growth is driven by volume and higher-value products, but the margins might be thinner.
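The CASE bucketing generalizes to any period comparison, and because ISO date strings sort lexicographically, the same `>= '2025-10-01'` filter works even when dates are stored as text. A self-contained sketch with sqlite3 (the four orders are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (order_date TEXT, order_value REAL)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("2025-03-14", 80.0), ("2025-07-02", 95.0),
     ("2025-10-20", 120.0), ("2025-12-05", 110.0)],
)

# Bucket each order into Q4 vs the rest of the year, then aggregate per bucket
rows = con.execute("""
    SELECT
        CASE WHEN order_date >= '2025-10-01' THEN 'Q4' ELSE 'Q1-Q3' END AS period,
        COUNT(*) AS orders,
        AVG(order_value) AS avg_order_value
    FROM orders
    GROUP BY period
    ORDER BY period
""").fetchall()

print(rows)  # [('Q1-Q3', 2, 87.5), ('Q4', 2, 115.0)]
```

Swapping the CASE expression for a month or week bucket turns the same shape of query into a cohort or seasonality breakdown.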

Second 26-30: Question 5 — The non-obvious insight

This prompts a follow-up: "Which product categories are being discounted most heavily in Q4?"

SELECT
    product_category,
    COUNT(*) AS q4_orders,
    ROUND(AVG(discount_pct) FILTER (WHERE discount_pct IS NOT NULL), 3) AS avg_discount,
    ROUND(SUM(quantity * unit_price * discount_pct) FILTER (WHERE discount_pct IS NOT NULL), 2) AS total_discount_given,
    ROUND(SUM(quantity * unit_price * (1 - COALESCE(discount_pct, 0))), 2) AS net_revenue
FROM ecommerce_sales
WHERE order_date >= '2025-10-01'
GROUP BY product_category
ORDER BY avg_discount DESC;

| product_category | q4_orders | avg_discount | total_discount_given | net_revenue |
| --- | --- | --- | --- | --- |
| Electronics | 487 | 0.198 | 18,445.22 | 74,220.33 |
| Furniture | 312 | 0.172 | 9,112.45 | 43,890.12 |
| Clothing | 534 | 0.156 | 7,890.33 | 42,667.80 |
| Office Supplies | 423 | 0.112 | 4,220.18 | 33,445.67 |
| ... | ... | ... | ... | ... |

The insight: Electronics is driving Q4 revenue with the highest order volume, but it's also being discounted at nearly 20% — double the company average. The $18K in discounts given on Electronics alone is significant. If this is intentional (holiday promotions), fine. If it's sales reps giving discretionary discounts to close deals, it's a margin problem worth investigating.

That's the kind of non-obvious finding that takes 30 minutes of pivot-table wrestling in Excel. Here, it took 5 questions in plain English.

What just happened

Let's recap the 30-second flow:

  1. Load — File goes from disk to browser memory. No upload.
  2. Profile — Auto-generated stats give you a map of the dataset before you ask anything.
  3. Explore — Natural language questions become SQL that runs locally. Each answer prompts the next question.
  4. Drill down — Follow the thread from "what" to "why." The AI chains queries that build on previous context.
  5. Find the insight — A non-obvious margin issue hiding behind strong top-line revenue growth.

The AI generated 5 SQL queries. DuckDB executed them locally. The AI never saw that Electronics had a 19.8% discount rate or that the West region generated $284K. It just wrote the SQL. Your machine answered it.
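That privacy boundary is straightforward to enforce in code: the prompt sent to the model is assembled from schema metadata only, never from rows. A hypothetical sketch of what such prompt construction might look like (the `build_prompt` function and its shape are our invention, not QueryVeil's actual implementation):

```python
def build_prompt(table_name, columns, question):
    """Build an NL-to-SQL prompt from schema metadata only; no row data."""
    schema_lines = "\n".join(f"  {name} {sql_type}" for name, sql_type in columns)
    return (
        f"Given this table:\n"
        f"CREATE TABLE {table_name} (\n{schema_lines}\n);\n\n"
        f"Write one SQL query answering: {question}"
    )

schema = [("order_id", "INTEGER"), ("region", "VARCHAR"), ("unit_price", "DOUBLE")]
prompt = build_prompt("ecommerce_sales", schema, "What's the total revenue by region?")

# The prompt carries column names and types, but no cell values
print("79.99" in prompt)   # False: no data leaves the machine
print("region" in prompt)  # True: the schema does
```

The generated SQL then runs against the local DuckDB table, so results never round-trip through the model either.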

Try it yourself

The dataset above is fictional, but the workflow is real. You can replicate this with any CSV you have:

  1. Open a browser-based DuckDB-WASM tool
  2. Drag in your file
  3. Read the auto-profile
  4. Ask your first question
  5. Follow the thread

The best insights come from the third or fourth follow-up question — the ones you wouldn't have thought to ask before seeing the first results.


QueryVeil runs this exact workflow: DuckDB-WASM in the browser, auto-profiling, natural language to SQL, multi-step drill-downs. The live demo has sample data loaded if you want to try it without your own file. No signup required.

See this in action

Drop a CSV into QueryVeil and run your first query in seconds. No signup, no upload — your data stays in your browser.

Try the live demo
