Scenario: Brightlane's analyst ran EXPLAIN ANALYZE on a per-order revenue query and saw the planner estimating 200 rows for the pairing of orders with order_items, while the run produced 637 actual rows. The large gap suggests stale statistics. To quantify the fanout precisely and inform whether to refresh statistics, the analyst needs the average and maximum line-item count per order.
Task: Write a query to return two metrics: avg_items_per_order (the average number of order_items per order) and max_items_per_order (the largest number of order_items attached to any single order).
Assumptions:
- A per-order item count is the count of
order_itemsrecorded against an order. - The
avg_items_per_orderis the average per-order item count across every order with at least one line item. - The
max_items_per_orderis the largest per-order item count across the same set.
Output:
- One row, holding the two metrics.
- Columns in this order:
avg_items_per_order,max_items_per_order.
Schema · ecommerce 5 tables
Run previews · Check grades
Write a query, then run it to see results here.
Worked solution Try it yourself first
WITH
order_item_counts AS (
SELECT
order_id,
COUNT(*) AS item_count
FROM
order_items
GROUP BY
order_id
)
SELECT
AVG(item_count) AS avg_items_per_order,
MAX(item_count) AS max_items_per_order
FROM
order_item_counts The shape
A CTE that counts line items per order, then AVG and MAX over that one-row-per-order summary. The two metrics together describe the fanout's distribution — the average reports the typical multiplier, the max reports the longest tail. A planner row-estimate gap of 200 estimated vs. 637 actual is the symptom; this query produces the two numbers that interpret it.
Clause by clause
WITH order_item_counts AS (SELECT order_id, COUNT(*) AS item_count FROM order_items GROUP BY order_id)collapsesorder_itemsto one row per order with its line-item count. The CTE is the per-parent-fanout distribution in tabular form — every row is one order, every value is that order's multiplier.SELECT AVG(item_count) AS avg_items_per_order, MAX(item_count) AS max_items_per_orderaggregates across that distribution without aGROUP BY, collapsing it to one row.AVGreturns the typical per-order multiplier;MAXreturns the worst-case multiplier. The reference values are 1.098 and 3 — a typical fanout barely above 1, with the longest tail at three items.FROM order_item_countsreads the CTE. There's noWHERE, no further filtering — the diagnostic wants the shape of the full distribution, not a sliced subset.
Why this and not SUM(item_count) / COUNT(*)
The two forms return the same number on this data, but AVG names the intent. A reader scanning the query sees "average items per order" and the meaning lands without parsing the arithmetic. More importantly, AVG skips NULL rows, which matters if the CTE were later changed to include orders with no line items via a left join. SUM / COUNT(*) would silently divide by the wrong denominator under that change; AVG would adjust correctly. Reach for the named aggregate when one exists.
The trap
The 200-vs-637 gap can be interpreted two ways, and the two metrics together are what disambiguate them. If the average were close to 3.2 (matching 637 / 200), the gap would mean the planner has consistently stale statistics on a uniformly-fanning relationship — ANALYZE on the table fixes it. But the actual average is 1.098, which means most orders barely fan out at all; the gap comes from a small number of high-item-count outliers (the orders with item_count = 3) dominating the actual row count. That's a long-tail distribution, not a uniform one, and the structural fix is different — pre-aggregating before the join is more likely to help than refreshing statistics. The average alone wouldn't have surfaced this; the max alone wouldn't have either. The diagnostic is the pair.
You practiced characterizing fanout shape — average and peak per-parent child counts — so a planner's row-estimate gap can be interpreted as either typical fanout or a long-tail outlier driving the cost.