Item Count per Category — Reading EXPLAIN Output in SQL

The problem

Scenario: Brightlane's data analyst ran EXPLAIN on a query that pulls order line items together with their products to compute per-category item counts, and saw the planner estimating 20 total rows at the nested-loop step. Actual execution processed over 100 rows, driving cost much higher than expected. The analyst suspects certain product categories contribute disproportionately.

Task: Write a query to return each category_id and the total number of line items associated with products in that category, so the analyst can see the actual per-category contribution.

Assumptions:

One row in the result covers every line item whose product shares the same category_id.

Output:

One row per category_id present in the data with at least one line item.
Columns in this order: category_id, item_count.

Schema · ecommerce 5 tables

The shape

The planner estimated 20 rows out of the nested-loop step; actual execution processed over 100, and the per-category breakdown shows why — category 1 alone holds 25 line items, category 5 holds 19, category 6 holds 13. The skew across categories is what the planner was missing. The query joins order_items to products, groups by category_id, and counts.

Clause by clause

SELECT p.category_id, COUNT(oi.id) AS item_count returns the category and the number of line items associated with its products. COUNT(oi.id) counts each joined line-item row once.
FROM order_items oi reads the line-item records — the larger of the two tables, and the side the nested-loop's outer plan would scan if the planner believes it produces few rows.
JOIN products p ON oi.product_id = p.id pairs each line item with its product, bringing category_id into reach. The inner join discards any line item whose product has been removed from the catalog.
GROUP BY p.category_id partitions the joined rows by category, so the count runs once per category.

Why this and not just counting `order_items` directly

A bare COUNT(*) FROM order_items returns the total line-item count without telling the analyst which categories drive it. The planner's 20-row estimate at the nested-loop step is shaped by its model of the join's cardinality, and that model is built from the joint distribution of order_items.product_id and products.category_id. Breaking the count down per category exposes which categories the planner is mismodeling — category 1 with 25 items is doing the disproportionate work. Without the per-category split, the analyst only knows the total is wrong; with it, they know which category's statistic to fix.

The trap

A nested-loop join's cost scales linearly with the row count of its outer side. When the planner estimates 20 rows on a node that actually produces 100+, it picked nested-loop because it expected the inner-side lookups to be cheap and few. At 100+ outer rows the same plan becomes a hash join's natural territory, but the planner is locked into its choice once execution starts. The fix isn't visible in the query — it's in the statistics that produced the underestimate, which is what running ANALYZE on the underlying tables addresses. Reading the actual per-category counts is what tells the analyst whether the issue is one fat category (a histogram problem) or many small ones the planner under-counted across the board (a row-count problem).

You practiced verifying an estimated nested-loop output size against actual row counts — the gap reveals which join cardinality the planner is mismodeling.

Return each `category_id` and the total number of line items associated with `products` in that category, so the analyst can see the actual per-category contribution

The shape

Clause by clause

Why this and not just counting `order_items` directly

The trap

Reading explains SQL. Writing it, over and over with instant feedback, is what makes you fluent.

The shape

Clause by clause

Why this and not just counting order_items directly

The trap

Reading explains SQL. Writing it, over and over with instant feedback, is what makes you fluent.

Why this and not just counting `order_items` directly