
From CSV to Insight in 30 Seconds: A Walkthrough

QueryVeil Team · 7 min read
Tags: tutorial, walkthrough, csv, data-analysis, ai, sql

TL;DR: We take a realistic e-commerce sales CSV, load it into a browser-based SQL engine, auto-profile it, ask five questions in plain English, and drill down to a non-obvious insight — all in under 30 seconds of active work. This walkthrough shows the full flow with actual SQL and results.


The dataset

We're working with ecommerce_sales.csv — 10,000 orders from a fictional online store. Columns:

| Column | Type | Example |
| --- | --- | --- |
| order_id | INTEGER | 48291 |
| order_date | DATE | 2025-11-15 |
| customer_id | VARCHAR | CUST-7742 |
| customer_segment | VARCHAR | Enterprise |
| region | VARCHAR | West |
| product_category | VARCHAR | Electronics |
| product_name | VARCHAR | Wireless Headphones |
| quantity | INTEGER | 3 |
| unit_price | DOUBLE | 79.99 |
| discount_pct | DOUBLE | 0.10 |
| shipping_method | VARCHAR | Express |

This is representative of what analysts actually work with: a CRM or Shopify export, a data warehouse extract, or an operational report.

Second 0-5: Load the file

Drag ecommerce_sales.csv into the browser. DuckDB-WASM parses it, infers column types, and creates an in-memory table.

What happens under the hood:

  1. The browser's File API reads the file from disk
  2. DuckDB-WASM samples rows to infer types (order_date becomes DATE, unit_price becomes DOUBLE, etc.)
  3. The full CSV is parsed into a columnar in-memory table

For a 10K-row CSV (~2MB), this takes under a second. No network request, no upload — the file goes from your disk to browser memory.
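DuckDB's real CSV sniffer is considerably more sophisticated, but the idea of sample-based type inference can be sketched in a few lines of Python. Everything here (the `infer_type` and `profile_csv` helpers, the tiny sample) is our illustration, not DuckDB's code:

```python
import csv
import io

def infer_type(values):
    """Pick the narrowest type that fits every sampled value (naive sketch)."""
    def all_match(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False
    if all_match(int):
        return "INTEGER"
    if all_match(float):
        return "DOUBLE"
    # Very rough DATE check: every value has the YYYY-MM-DD shape
    if all(len(v) == 10 and v[4] == "-" and v[7] == "-" for v in values):
        return "DATE"
    return "VARCHAR"

def profile_csv(text, sample_size=100):
    """Sample the first rows and infer one type per column."""
    rows = list(csv.DictReader(io.StringIO(text)))[:sample_size]
    return {col: infer_type([r[col] for r in rows]) for col in rows[0]}

sample = """order_id,order_date,unit_price,region
48291,2025-11-15,79.99,West
48292,2025-11-16,12.50,East"""

print(profile_csv(sample))
# {'order_id': 'INTEGER', 'order_date': 'DATE', 'unit_price': 'DOUBLE', 'region': 'VARCHAR'}
```

Sampling rather than scanning the whole file is why the inference step stays fast even on much larger CSVs.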

Second 5-10: Auto-profile

Before you ask a single question, auto-profiling runs a battery of SQL queries to characterize the dataset:

-- Row count
SELECT COUNT(*) FROM ecommerce_sales;
-- Result: 10,000

-- Column stats
SELECT
    COUNT(DISTINCT customer_segment) as segments,
    COUNT(DISTINCT region) as regions,
    COUNT(DISTINCT product_category) as categories
FROM ecommerce_sales;
-- Result: 4 segments, 5 regions, 8 categories

-- Null rates
SELECT
    ROUND(100.0 * COUNT(*) FILTER (WHERE discount_pct IS NULL) / COUNT(*), 1) as discount_null_pct,
    ROUND(100.0 * COUNT(*) FILTER (WHERE shipping_method IS NULL) / COUNT(*), 1) as shipping_null_pct
FROM ecommerce_sales;
-- Result: discount_null_pct = 12.3%, shipping_null_pct = 0.0%

-- Numeric distributions
SELECT
    MIN(unit_price) AS min_price,
    MAX(unit_price) AS max_price,
    ROUND(AVG(unit_price), 2) AS avg_price,
    ROUND(PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY unit_price), 2) AS median_price
FROM ecommerce_sales;
-- Result: min=4.99, max=499.99, avg=87.42, median=62.99

-- Date range
SELECT MIN(order_date), MAX(order_date)
FROM ecommerce_sales;
-- Result: 2025-01-01 to 2025-12-31

The profile tells you:

  • 10K orders across a full year
  • 4 customer segments, 5 regions, 8 product categories
  • 12.3% of orders have no discount (null) — worth noting for any revenue analysis
  • Prices range from $4.99 to $499.99, median $62.99

This takes 2-3 seconds. You already know more about this dataset than most analysts would after 5 minutes of scrolling through the spreadsheet.

Second 10-15: Question 1 — "What's our revenue by region?"

You type: "What's the total revenue by region?"

The AI sees your schema (column names and types) and generates:

SELECT
    region,
    ROUND(SUM(quantity * unit_price * (1 - COALESCE(discount_pct, 0))), 2) AS net_revenue,
    COUNT(DISTINCT order_id) AS order_count
FROM ecommerce_sales
GROUP BY region
ORDER BY net_revenue DESC;

Note: the AI correctly handles the discount and null values in discount_pct — this is where schema-only AI with profiling metadata earns its keep.

| region | net_revenue | order_count |
| --- | --- | --- |
| West | 284,112.45 | 2,847 |
| East | 241,890.33 | 2,412 |
| Central | 178,445.21 | 1,789 |
| South | 156,220.18 | 1,563 |
| North | 121,887.90 | 1,389 |

West leads by a significant margin — both in revenue and order volume.
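The null handling is not cosmetic: SQL aggregates skip rows where the expression is NULL, so multiplying by discount_pct directly would silently drop the 12.3% of orders with a null discount. The difference can be seen with Python's built-in sqlite3, which supports the same COALESCE idiom (the three-row table here is invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (quantity INTEGER, unit_price REAL, discount_pct REAL)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, 100.0, 0.10), (2, 50.0, None), (1, 200.0, 0.25)],
)

# Naive: a NULL discount makes the whole product NULL, so SUM drops that row
naive = con.execute(
    "SELECT SUM(quantity * unit_price * (1 - discount_pct)) FROM sales"
).fetchone()[0]

# Correct: COALESCE treats a missing discount as zero
correct = con.execute(
    "SELECT SUM(quantity * unit_price * (1 - COALESCE(discount_pct, 0))) FROM sales"
).fetchone()[0]

print(naive)    # 240.0 (the NULL-discount order vanished)
print(correct)  # 340.0 (all three orders counted)
```

On the real dataset, the naive version would understate revenue on every region with null-discount orders.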

Second 15-18: Question 2 — "Which segment drives the most revenue?"

You type: "Break down revenue by customer segment."

SELECT
    customer_segment,
    ROUND(SUM(quantity * unit_price * (1 - COALESCE(discount_pct, 0))), 2) AS net_revenue,
    COUNT(DISTINCT customer_id) AS unique_customers,
    ROUND(SUM(quantity * unit_price * (1 - COALESCE(discount_pct, 0))) /
          COUNT(DISTINCT customer_id), 2) AS revenue_per_customer
FROM ecommerce_sales
GROUP BY customer_segment
ORDER BY net_revenue DESC;

| customer_segment | net_revenue | unique_customers | revenue_per_customer |
| --- | --- | --- | --- |
| Enterprise | 412,340.22 | 312 | 1,321.60 |
| Mid-Market | 234,112.05 | 487 | 480.72 |
| SMB | 198,220.80 | 1,245 | 159.21 |
| Consumer | 137,883.00 | 2,891 | 47.69 |

Enterprise customers are about 6.3% of the customer base (312 of 4,935 unique customers) but generate 42% of revenue. Revenue per customer drops 27x from Enterprise to Consumer.

Second 18-22: Question 3 — "What's the trend over time?"

You type: "Show me monthly revenue trend."

SELECT
    DATE_TRUNC('month', order_date) AS month,
    ROUND(SUM(quantity * unit_price * (1 - COALESCE(discount_pct, 0))), 2) AS net_revenue,
    COUNT(DISTINCT order_id) AS orders
FROM ecommerce_sales
GROUP BY month
ORDER BY month;

| month | net_revenue | orders |
| --- | --- | --- |
| 2025-01 | 72,445.12 | 724 |
| 2025-02 | 68,112.33 | 681 |
| 2025-03 | 74,890.45 | 749 |
| ... | ... | ... |
| 2025-10 | 89,220.18 | 892 |
| 2025-11 | 102,445.67 | 1,024 |
| 2025-12 | 118,890.33 | 1,189 |

Revenue trending up in Q4, with December being the strongest month. Typical e-commerce seasonality. But wait —

Second 22-26: Question 4 — The drill-down

You notice the Q4 spike and ask: "Is the Q4 revenue growth coming from more orders or higher order values?"

SELECT
    CASE
        WHEN order_date >= '2025-10-01' THEN 'Q4'
        ELSE 'Q1-Q3'
    END AS period,
    COUNT(DISTINCT order_id) AS total_orders,
    ROUND(AVG(quantity * unit_price * (1 - COALESCE(discount_pct, 0))), 2) AS avg_order_value,
    ROUND(AVG(discount_pct) FILTER (WHERE discount_pct IS NOT NULL), 3) AS avg_discount
FROM ecommerce_sales
GROUP BY period;

| period | total_orders | avg_order_value | avg_discount |
| --- | --- | --- | --- |
| Q1-Q3 | 6,895 | 91.22 | 0.082 |
| Q4 | 3,105 | 112.45 | 0.142 |

Interesting. Q4 has higher average order values, but also nearly double the discount rate. The revenue growth is driven by volume and higher-value products, but the margins might be thinner.
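The CASE bucketing generalizes to any period comparison, and because ISO date strings sort lexicographically, the same `>= '2025-10-01'` filter works even when dates are stored as text. A self-contained sketch with sqlite3 (the four orders are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (order_date TEXT, order_value REAL)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("2025-03-14", 80.0), ("2025-07-02", 95.0),
     ("2025-10-20", 120.0), ("2025-12-05", 110.0)],
)

# Bucket each order into Q4 vs the rest of the year, then aggregate per bucket
rows = con.execute("""
    SELECT
        CASE WHEN order_date >= '2025-10-01' THEN 'Q4' ELSE 'Q1-Q3' END AS period,
        COUNT(*) AS orders,
        AVG(order_value) AS avg_order_value
    FROM orders
    GROUP BY period
    ORDER BY period
""").fetchall()

print(rows)  # [('Q1-Q3', 2, 87.5), ('Q4', 2, 115.0)]
```

Swapping the CASE expression for a month or week bucket turns the same shape of query into a cohort or seasonality breakdown.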

Second 26-30: Question 5 — The non-obvious insight

This prompts a follow-up: "Which product categories are being discounted most heavily in Q4?"

SELECT
    product_category,
    COUNT(*) AS q4_orders,
    ROUND(AVG(discount_pct) FILTER (WHERE discount_pct IS NOT NULL), 3) AS avg_discount,
    ROUND(SUM(quantity * unit_price * discount_pct) FILTER (WHERE discount_pct IS NOT NULL), 2) AS total_discount_given,
    ROUND(SUM(quantity * unit_price * (1 - COALESCE(discount_pct, 0))), 2) AS net_revenue
FROM ecommerce_sales
WHERE order_date >= '2025-10-01'
GROUP BY product_category
ORDER BY avg_discount DESC;

| product_category | q4_orders | avg_discount | total_discount_given | net_revenue |
| --- | --- | --- | --- | --- |
| Electronics | 487 | 0.198 | 18,445.22 | 74,220.33 |
| Furniture | 312 | 0.172 | 9,112.45 | 43,890.12 |
| Clothing | 534 | 0.156 | 7,890.33 | 42,667.80 |
| Office Supplies | 423 | 0.112 | 4,220.18 | 33,445.67 |
| ... | ... | ... | ... | ... |

The insight: Electronics is driving Q4 revenue with the highest order volume, but it's also being discounted at nearly 20% — double the company average. The $18K in discounts given on Electronics alone is significant. If this is intentional (holiday promotions), fine. If it's sales reps giving discretionary discounts to close deals, it's a margin problem worth investigating.

That's the kind of non-obvious finding that takes 30 minutes of pivot-table wrestling in Excel. Here, it took 5 questions in plain English.

What just happened

Let's recap the 30-second flow:

  1. Load — File goes from disk to browser memory. No upload.
  2. Profile — Auto-generated stats give you a map of the dataset before you ask anything.
  3. Explore — Natural language questions become SQL that runs locally. Each answer prompts the next question.
  4. Drill down — Follow the thread from "what" to "why." The AI chains queries that build on previous context.
  5. Find the insight — A non-obvious margin issue hiding behind strong top-line revenue growth.

The AI generated 5 SQL queries. DuckDB executed them locally. The AI never saw that Electronics had a 19.8% discount rate or that the West region generated $284K. It just wrote the SQL. Your machine answered it.
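That privacy boundary is straightforward to enforce in code: the prompt sent to the model is assembled from schema metadata only, never from rows. A hypothetical sketch of what such prompt construction might look like (the `build_prompt` function and its shape are our invention, not QueryVeil's actual implementation):

```python
def build_prompt(table_name, columns, question):
    """Build an NL-to-SQL prompt from schema metadata only; no row data."""
    schema_lines = "\n".join(f"  {name} {sql_type}" for name, sql_type in columns)
    return (
        f"Given this table:\n"
        f"CREATE TABLE {table_name} (\n{schema_lines}\n);\n\n"
        f"Write one SQL query answering: {question}"
    )

schema = [("order_id", "INTEGER"), ("region", "VARCHAR"), ("unit_price", "DOUBLE")]
prompt = build_prompt("ecommerce_sales", schema, "What's the total revenue by region?")

# The prompt carries column names and types, but no cell values
print("79.99" in prompt)   # False: no data leaves the machine
print("region" in prompt)  # True: the schema does
```

The generated SQL then runs against the local DuckDB table, so results never round-trip through the model either.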

Try it yourself

The dataset above is fictional, but the workflow is real. You can replicate this with any CSV you have:

  1. Open a browser-based DuckDB-WASM tool
  2. Drag in your file
  3. Read the auto-profile
  4. Ask your first question
  5. Follow the thread

The best insights come from the third or fourth follow-up question — the ones you wouldn't have thought to ask before seeing the first results.


QueryVeil runs this exact workflow: DuckDB-WASM in the browser, auto-profiling, natural language to SQL, multi-step drill-downs. The live demo has sample data loaded if you want to try it without your own file. No signup required.

See this in action

Drop a CSV into QueryVeil and run your first query in seconds. No signup, no upload — your data stays in your browser.

Try the live demo
