Order Counts Grouped by Status — Reading EXPLAIN Output in SQL

The problem

Scenario: Brightlane's data analyst ran EXPLAIN on a status report and saw the planner had assumed roughly even distribution across orders.status values, which led it to choose a sequential scan for a specific-status restriction. The analyst suspects one status dominates the order base and that the even-distribution assumption is what the statistics are missing.

Task: Write a query to return each status value and the number of orders recorded with that status, so the analyst can see how skewed the distribution actually is.

Assumptions:

The orders table holds one row per placed order, with the order's outcome stored in status.
One row in the result covers every order sharing the same status value.

Output:

One row per status value present in the data.
Columns in this order: status, order_count.

Schema · ecommerce 5 tables

The shape

The planner assumed roughly even distribution across the four status values, which is exactly what GROUP BY status plus COUNT(*) will refute when one status dominates. The grouped count shows delivered at 161 against cancelled and pending at 11 each — the kind of skew that breaks the uniform-distribution assumption in the planner's statistics.

Clause by clause

SELECT status, COUNT(*) AS order_count returns the status value and the row count behind it, so each group's contribution to the total is visible alongside its label.
FROM orders reads the order records.
GROUP BY status partitions the rows by status value, so COUNT(*) runs once per group instead of once over the whole table. Each output row is one status bucket.

Why this and not a `WHERE` per status

The analyst could run WHERE status = 'delivered' and then WHERE status = 'cancelled' and so on, four queries, four counts. The grouped version returns the same information in one pass — and crucially, in the same shape as the planner's per-group estimate. When EXPLAIN ANALYZE annotates a group-aggregate node with actual rows=4 (four groups out) and an estimate of rows=4, the matching shape is what makes the gap at the per-group row counts visible. The planner publishes one estimate per group; this query publishes one actual per group; the comparison is direct.

The trap

GROUP BY produces one row per distinct group present in the data. If a status value exists in the schema but has zero matching orders, it won't appear in the result — and won't appear in the planner's estimate either, because the planner is working from the same statistics. A "missing" status in the output is not a bug in the count; it's a real signal that the status has no rows. Don't read absence as zero; read it as absent.

You practiced computing the actual distribution behind a planner assumption — a per-status count reveals the skew that uniform-distribution statistics would hide.

Return each `status` value and the number of `orders` recorded with that status, so the analyst can see how skewed the distribution actually is

The shape

Clause by clause

Why this and not a `WHERE` per status

The trap

Reading explains SQL. Writing it, over and over with instant feedback, is what makes you fluent.

The shape

Clause by clause

Why this and not a WHERE per status

The trap

Reading explains SQL. Writing it, over and over with instant feedback, is what makes you fluent.

Why this and not a `WHERE` per status