Average and Max Items per Order — Debugging Queries in SQL

The problem

Scenario: Brightlane's analyst ran EXPLAIN ANALYZE on a per-order revenue query and saw the planner estimating 200 rows for the pairing of orders with order_items, while the run produced 637 actual rows. The large gap suggests stale statistics. To quantify the fanout precisely and inform whether to refresh statistics, the analyst needs the average and maximum line-item count per order.

Task: Write a query to return two metrics: avg_items_per_order (the average number of order_items per order) and max_items_per_order (the largest number of order_items attached to any single order).

Assumptions:

A per-order item count is the count of order_items recorded against an order.
The avg_items_per_order is the average per-order item count across every order with at least one line item.
The max_items_per_order is the largest per-order item count across the same set.

Output:

One row, holding the two metrics.
Columns in this order: avg_items_per_order, max_items_per_order.

Schema · ecommerce 5 tables

The shape

A CTE that counts line items per order, then AVG and MAX over that one-row-per-order summary. The two metrics together describe the fanout's distribution — the average reports the typical multiplier, the max reports the longest tail. A planner row-estimate gap of 200 estimated vs. 637 actual is the symptom; this query produces the two numbers that interpret it.

Clause by clause

WITH order_item_counts AS (SELECT order_id, COUNT(*) AS item_count FROM order_items GROUP BY order_id) collapses order_items to one row per order with its line-item count. The CTE is the per-parent-fanout distribution in tabular form — every row is one order, every value is that order's multiplier.
SELECT AVG(item_count) AS avg_items_per_order, MAX(item_count) AS max_items_per_order aggregates across that distribution without a GROUP BY, collapsing it to one row. AVG returns the typical per-order multiplier; MAX returns the worst-case multiplier. The reference values are 1.098 and 3 — a typical fanout barely above 1, with the longest tail at three items.
FROM order_item_counts reads the CTE. There's no WHERE, no further filtering — the diagnostic wants the shape of the full distribution, not a sliced subset.

Why this and not `SUM(item_count) / COUNT(*)`

The two forms return the same number on this data, but AVG names the intent. A reader scanning the query sees "average items per order" and the meaning lands without parsing the arithmetic. More importantly, AVG skips NULL rows, which matters if the CTE were later changed to include orders with no line items via a left join. SUM / COUNT(*) would silently divide by the wrong denominator under that change; AVG would adjust correctly. Reach for the named aggregate when one exists.

The trap

The 200-vs-637 gap can be interpreted two ways, and the two metrics together are what disambiguate them. If the average were close to 3.2 (matching 637 / 200), the gap would mean the planner has consistently stale statistics on a uniformly-fanning relationship — ANALYZE on the table fixes it. But the actual average is 1.098, which means most orders barely fan out at all; the gap comes from a small number of high-item-count outliers (the orders with item_count = 3) dominating the actual row count. That's a long-tail distribution, not a uniform one, and the structural fix is different — pre-aggregating before the join is more likely to help than refreshing statistics. The average alone wouldn't have surfaced this; the max alone wouldn't have either. The diagnostic is the pair.

You practiced characterizing fanout shape — average and peak per-parent child counts — so a planner's row-estimate gap can be interpreted as either typical fanout or a long-tail outlier driving the cost.

Return two metrics: `avg_items_per_order` (the average number of `order_items` per order) and `max_items_per_order` (the largest number of `order_items` attached to any single order)

The shape

Clause by clause

Why this and not `SUM(item_count) / COUNT(*)`

The trap

Reading explains SQL. Writing it, over and over with instant feedback, is what makes you fluent.

The shape

Clause by clause

Why this and not SUM(item_count) / COUNT(*)

The trap

Reading explains SQL. Writing it, over and over with instant feedback, is what makes you fluent.

Why this and not `SUM(item_count) / COUNT(*)`