Scenario: Brightlane's data analyst ran EXPLAIN on a status report and saw the planner had assumed roughly even distribution across orders.status values, which led it to choose a sequential scan for a specific-status restriction. The analyst suspects one status dominates the order base and that the even-distribution assumption is what the statistics are missing.
Task: Write a query to return each status value and the number of orders recorded with that status, so the analyst can see how skewed the distribution actually is.
Assumptions:
- The
orderstable holds one row per placed order, with the order's outcome stored instatus. - One row in the result covers every order sharing the same
statusvalue.
Output:
- One row per
statusvalue present in the data. - Columns in this order:
status,order_count.
Schema · ecommerce 5 tables
Run previews · Check grades
Write a query, then run it to see results here.
Worked solution Try it yourself first
SELECT
status,
COUNT(*) AS order_count
FROM
orders
GROUP BY
status The shape
The planner assumed roughly even distribution across the four status values, which is exactly what GROUP BY status plus COUNT(*) will refute when one status dominates. The grouped count shows delivered at 161 against cancelled and pending at 11 each — the kind of skew that breaks the uniform-distribution assumption in the planner's statistics.
Clause by clause
SELECT status, COUNT(*) AS order_countreturns the status value and the row count behind it, so each group's contribution to the total is visible alongside its label.FROM ordersreads the order records.GROUP BY statuspartitions the rows by status value, soCOUNT(*)runs once per group instead of once over the whole table. Each output row is one status bucket.
Why this and not a WHERE per status
The analyst could run WHERE status = 'delivered' and then WHERE status = 'cancelled' and so on, four queries, four counts. The grouped version returns the same information in one pass — and crucially, in the same shape as the planner's per-group estimate. When EXPLAIN ANALYZE annotates a group-aggregate node with actual rows=4 (four groups out) and an estimate of rows=4, the matching shape is what makes the gap at the per-group row counts visible. The planner publishes one estimate per group; this query publishes one actual per group; the comparison is direct.
The trap
GROUP BY produces one row per distinct group present in the data. If a status value exists in the schema but has zero matching orders, it won't appear in the result — and won't appear in the planner's estimate either, because the planner is working from the same statistics. A "missing" status in the output is not a bug in the count; it's a real signal that the status has no rows. Don't read absence as zero; read it as absent.
You practiced computing the actual distribution behind a planner assumption — a per-status count reveals the skew that uniform-distribution statistics would hide.