TL;DR: We take a realistic e-commerce sales CSV, load it into a browser-based SQL engine, auto-profile it, ask five questions in plain English, and drill down to a non-obvious insight — all in under 30 seconds of active work. This walkthrough shows the full flow with actual SQL and results.
## The dataset
We're working with ecommerce_sales.csv — 10,000 orders from a fictional online store. Columns:
| Column | Type | Example |
|---|---|---|
| order_id | INTEGER | 48291 |
| order_date | DATE | 2025-11-15 |
| customer_id | VARCHAR | CUST-7742 |
| customer_segment | VARCHAR | Enterprise |
| region | VARCHAR | West |
| product_category | VARCHAR | Electronics |
| product_name | VARCHAR | Wireless Headphones |
| quantity | INTEGER | 3 |
| unit_price | DOUBLE | 79.99 |
| discount_pct | DOUBLE | 0.10 |
| shipping_method | VARCHAR | Express |
This is representative of what analysts actually work with: a CRM or Shopify export, a data warehouse extract, or an operational report.
## Seconds 0-5: Load the file
Drag ecommerce_sales.csv into the browser. DuckDB-WASM parses it, infers column types, and creates an in-memory table.
What happens under the hood:
- The browser's File API reads the file from disk
- DuckDB-WASM samples rows to infer types (order_date becomes DATE, unit_price becomes DOUBLE, etc.)
- The full CSV is parsed into a columnar in-memory table
For a 10K-row CSV (~2MB), this takes under a second. No network request, no upload — the file goes from your disk to browser memory.
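The type-inference step works roughly like this: sample some values, try the narrowest type first, and fall back to wider ones. A toy Python sketch of the idea (an illustration only, not DuckDB's actual CSV sniffer):

```python
from datetime import date

def infer_type(samples: list) -> str:
    """Return the narrowest SQL type that fits every sampled string (illustrative)."""
    def all_match(parse) -> bool:
        try:
            for s in samples:
                parse(s)
            return True
        except ValueError:
            return False

    # Try narrow to wide: INTEGER, then DOUBLE, then DATE, else fall back to text
    if all_match(int):
        return "INTEGER"
    if all_match(float):
        return "DOUBLE"
    if all_match(date.fromisoformat):
        return "DATE"
    return "VARCHAR"

print(infer_type(["2025-11-15", "2025-01-01"]))  # DATE
print(infer_type(["79.99", "4.99"]))             # DOUBLE
print(infer_type(["CUST-7742"]))                 # VARCHAR
```

The real sniffer also handles quoting, delimiters, and header detection, but the fall-through ordering is the core trick.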
## Seconds 5-10: Auto-profile
Before you ask a single question, auto-profiling runs a battery of SQL queries to characterize the dataset:
```sql
-- Row count
SELECT COUNT(*) FROM ecommerce_sales;
-- Result: 10,000

-- Column stats
SELECT
  COUNT(DISTINCT customer_segment) AS segments,
  COUNT(DISTINCT region) AS regions,
  COUNT(DISTINCT product_category) AS categories
FROM ecommerce_sales;
-- Result: 4 segments, 5 regions, 8 categories

-- Null rates
SELECT
  ROUND(100.0 * COUNT(*) FILTER (WHERE discount_pct IS NULL) / COUNT(*), 1) AS discount_null_pct,
  ROUND(100.0 * COUNT(*) FILTER (WHERE shipping_method IS NULL) / COUNT(*), 1) AS shipping_null_pct
FROM ecommerce_sales;
-- Result: discount_null_pct = 12.3%, shipping_null_pct = 0.0%

-- Numeric distributions
SELECT
  MIN(unit_price), MAX(unit_price), ROUND(AVG(unit_price), 2),
  ROUND(PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY unit_price), 2) AS median_price
FROM ecommerce_sales;
-- Result: min=4.99, max=499.99, avg=87.42, median=62.99

-- Date range
SELECT MIN(order_date), MAX(order_date)
FROM ecommerce_sales;
-- Result: 2025-01-01 to 2025-12-31
```
The profile tells you:
- 10K orders across a full year
- 4 customer segments, 5 regions, 8 product categories
- 12.3% of orders have a null discount_pct (no discount recorded) — worth flagging before any revenue analysis
- Prices range from $4.99 to $499.99, median $62.99
This takes 2-3 seconds. You already know more about this dataset than most analysts would after 5 minutes of scrolling through the spreadsheet.
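If you want to poke at the null-rate logic outside the browser, the same query pattern runs against Python's built-in sqlite3 (a stand-in for DuckDB here; the five sample rows are invented, and SUM(CASE ...) is used for maximum SQLite compatibility):

```python
import sqlite3

# sqlite3 standing in for DuckDB-WASM; the five rows below are made up.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ecommerce_sales (order_id INTEGER, discount_pct REAL)")
con.executemany(
    "INSERT INTO ecommerce_sales VALUES (?, ?)",
    [(1, 0.10), (2, None), (3, 0.05), (4, None), (5, 0.0)],
)

# Null rate: count NULL discount rows as a percentage of all rows
null_pct, = con.execute("""
    SELECT ROUND(100.0 * SUM(CASE WHEN discount_pct IS NULL THEN 1 ELSE 0 END)
                 / COUNT(*), 1)
    FROM ecommerce_sales
""").fetchone()
print(null_pct)  # 40.0
```

Note that 0.0 is a recorded "no discount" while NULL is a missing value; the profiler treats those differently, and so should your analysis.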
## Seconds 10-15: Question 1 — "What's our revenue by region?"
You type: "What's the total revenue by region?"
The AI sees your schema (column names and types) and generates:
```sql
SELECT
  region,
  ROUND(SUM(quantity * unit_price * (1 - COALESCE(discount_pct, 0))), 2) AS net_revenue,
  COUNT(DISTINCT order_id) AS order_count
FROM ecommerce_sales
GROUP BY region
ORDER BY net_revenue DESC;
```
Note: the AI correctly handles the discount and null values in discount_pct — this is where schema-only AI with profiling metadata earns its keep.
| region | net_revenue | order_count |
|---|---|---|
| West | 284,112.45 | 2,847 |
| East | 241,890.33 | 2,412 |
| Central | 178,445.21 | 1,789 |
| South | 156,220.18 | 1,563 |
| North | 121,887.90 | 1,389 |
West leads by a significant margin — both in revenue and order volume.
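The COALESCE in that query is doing real work: without it, any row with a null discount_pct would evaluate to NULL and silently drop out of the SUM. The per-row arithmetic, as a plain-Python sketch (the function name is mine, not part of any tool):

```python
from typing import Optional

def net_revenue(quantity: int, unit_price: float, discount_pct: Optional[float]) -> float:
    # Mirror COALESCE(discount_pct, 0): a null discount means "no discount",
    # rather than turning the whole product into NULL
    effective_discount = discount_pct if discount_pct is not None else 0.0
    return quantity * unit_price * (1 - effective_discount)

print(round(net_revenue(3, 79.99, 0.10), 2))  # 215.97
print(round(net_revenue(3, 79.99, None), 2))  # 239.97
```

With 12.3% of rows null in discount_pct, skipping the COALESCE would understate revenue by roughly that share of orders.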
## Seconds 15-18: Question 2 — "Which segment drives the most revenue?"
You type: "Break down revenue by customer segment."
```sql
SELECT
  customer_segment,
  ROUND(SUM(quantity * unit_price * (1 - COALESCE(discount_pct, 0))), 2) AS net_revenue,
  COUNT(DISTINCT customer_id) AS unique_customers,
  ROUND(SUM(quantity * unit_price * (1 - COALESCE(discount_pct, 0))) /
        COUNT(DISTINCT customer_id), 2) AS revenue_per_customer
FROM ecommerce_sales
GROUP BY customer_segment
ORDER BY net_revenue DESC;
```
| customer_segment | net_revenue | unique_customers | revenue_per_customer |
|---|---|---|---|
| Enterprise | 412,340.22 | 312 | 1,321.60 |
| Mid-Market | 234,112.05 | 487 | 480.72 |
| SMB | 198,220.80 | 1,245 | 159.21 |
| Consumer | 137,883.00 | 2,891 | 47.69 |
Enterprise customers are 6.3% of the customer base but generate 42% of revenue. Revenue per customer is nearly 28x higher for Enterprise than for Consumer.
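Those concentration figures fall straight out of the table above; no new query needed:

```python
# Figures copied from the segment table above; pure arithmetic, no new data
revenue = {"Enterprise": 412340.22, "Mid-Market": 234112.05,
           "SMB": 198220.80, "Consumer": 137883.00}
customers = {"Enterprise": 312, "Mid-Market": 487,
             "SMB": 1245, "Consumer": 2891}

total_rev, total_cust = sum(revenue.values()), sum(customers.values())
cust_share = round(100 * customers["Enterprise"] / total_cust, 1)  # % of customers
rev_share = round(100 * revenue["Enterprise"] / total_rev, 1)      # % of revenue
print(cust_share, rev_share)  # 6.3 42.0
```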
## Seconds 18-22: Question 3 — "What's the trend over time?"
You type: "Show me monthly revenue trend."
```sql
SELECT
  DATE_TRUNC('month', order_date) AS month,
  ROUND(SUM(quantity * unit_price * (1 - COALESCE(discount_pct, 0))), 2) AS net_revenue,
  COUNT(DISTINCT order_id) AS orders
FROM ecommerce_sales
GROUP BY month
ORDER BY month;
```
| month | net_revenue | orders |
|---|---|---|
| 2025-01 | 72,445.12 | 724 |
| 2025-02 | 68,112.33 | 681 |
| 2025-03 | 74,890.45 | 749 |
| ... | ... | ... |
| 2025-10 | 89,220.18 | 892 |
| 2025-11 | 102,445.67 | 1,024 |
| 2025-12 | 118,890.33 | 1,189 |
Revenue trending up in Q4, with December being the strongest month. Typical e-commerce seasonality. But wait —
## Seconds 22-26: Question 4 — The drill-down
You notice the Q4 spike and ask: "Is the Q4 revenue growth coming from more orders or higher order values?"
```sql
SELECT
  CASE
    WHEN order_date >= '2025-10-01' THEN 'Q4'
    ELSE 'Q1-Q3'
  END AS period,
  COUNT(DISTINCT order_id) AS total_orders,
  ROUND(AVG(quantity * unit_price * (1 - COALESCE(discount_pct, 0))), 2) AS avg_order_value,
  ROUND(AVG(discount_pct) FILTER (WHERE discount_pct IS NOT NULL), 3) AS avg_discount
FROM ecommerce_sales
GROUP BY period;
```
| period | total_orders | avg_order_value | avg_discount |
|---|---|---|---|
| Q1-Q3 | 6,895 | 91.22 | 0.082 |
| Q4 | 3,105 | 112.45 | 0.142 |
Interesting. Q4 has higher average order values, but also nearly double the discount rate. The revenue growth is driven by volume and higher-value products, but the margins might be thinner.
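A quick back-of-envelope check on those two numbers: undoing the average discount gives a rough gross AOV for each period (an estimate only — an average of per-row discounts can't be exactly undone at the aggregate level, and the average excludes null-discount rows):

```python
# Figures from the table above; treat the results as estimates only
q13_net, q13_disc = 91.22, 0.082
q4_net, q4_disc = 112.45, 0.142

q13_gross = q13_net / (1 - q13_disc)   # approx gross AOV, Q1-Q3
q4_gross = q4_net / (1 - q4_disc)      # approx gross AOV, Q4
print(round(q13_gross, 2), round(q4_gross, 2))  # 99.37 131.06
```

So roughly $18-19 of the average Q4 order is being given away in discounts, versus about $8 earlier in the year.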
## Seconds 26-30: Question 5 — The non-obvious insight
This prompts a follow-up: "Which product categories are being discounted most heavily in Q4?"
```sql
SELECT
  product_category,
  COUNT(*) AS q4_orders,
  ROUND(AVG(discount_pct) FILTER (WHERE discount_pct IS NOT NULL), 3) AS avg_discount,
  ROUND(SUM(quantity * unit_price * discount_pct) FILTER (WHERE discount_pct IS NOT NULL), 2) AS total_discount_given,
  ROUND(SUM(quantity * unit_price * (1 - COALESCE(discount_pct, 0))), 2) AS net_revenue
FROM ecommerce_sales
WHERE order_date >= '2025-10-01'
GROUP BY product_category
ORDER BY avg_discount DESC;
```
| product_category | q4_orders | avg_discount | total_discount_given | net_revenue |
|---|---|---|---|---|
| Electronics | 487 | 0.198 | 18,445.22 | 74,220.33 |
| Furniture | 312 | 0.172 | 9,112.45 | 43,890.12 |
| Clothing | 534 | 0.156 | 7,890.33 | 42,667.80 |
| Office Supplies | 423 | 0.112 | 4,220.18 | 33,445.67 |
| ... | ... | ... | ... | ... |
The insight: Electronics is driving Q4 revenue with the highest order volume, but it's also being discounted at nearly 20% — double the company average. The $18K in discounts given on Electronics alone is significant. If this is intentional (holiday promotions), fine. If it's sales reps giving discretionary discounts to close deals, it's a margin problem worth investigating.
That's the kind of non-obvious finding that takes 30 minutes of pivot-table wrestling in Excel. Here, it took 5 questions in plain English.
## What just happened
Let's recap the 30-second flow:
- Load — File goes from disk to browser memory. No upload.
- Profile — Auto-generated stats give you a map of the dataset before you ask anything.
- Explore — Natural language questions become SQL that runs locally. Each answer prompts the next question.
- Drill down — Follow the thread from "what" to "why." The AI chains queries that build on previous context.
- Find the insight — A non-obvious margin issue hiding behind strong top-line revenue growth.
The AI generated five SQL queries. DuckDB executed them locally. The AI never saw that Electronics had a 19.8% discount rate or that the West region generated $284K; it just wrote the SQL. Your machine produced the answers.
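That privacy boundary can be made concrete: the only thing that leaves the browser is a schema-plus-profile payload shaped roughly like this (the field names are illustrative for the sketch, not any tool's actual wire format):

```python
import json

# Illustrative payload: schema and profile stats go to the model; rows never do.
payload = {
    "question": "What's the total revenue by region?",
    "schema": {
        "order_id": "INTEGER",
        "order_date": "DATE",
        "region": "VARCHAR",
        "unit_price": "DOUBLE",
        "discount_pct": "DOUBLE",
    },
    "profile": {"row_count": 10000, "discount_null_pct": 12.3},
}
print(json.dumps(payload, indent=2))
```

Everything the model needs to write correct SQL (names, types, null rates) fits in a few hundred bytes; the 10,000 rows stay in browser memory.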
## Try it yourself
The dataset above is fictional, but the workflow is real. You can replicate this with any CSV you have:
- Open a browser-based DuckDB-WASM tool
- Drag in your file
- Read the auto-profile
- Ask your first question
- Follow the thread
The best insights come from the third or fourth follow-up question — the ones you wouldn't have thought to ask before seeing the first results.
QueryVeil runs this exact workflow: DuckDB-WASM in the browser, auto-profiling, natural language to SQL, multi-step drill-downs. The live demo has sample data loaded if you want to try it without your own file. No signup required.