N066-H3 Tier 5 · Expert · hard ecommerce · Brightlane

Return two metrics: `avg_items_per_order` (the average number of `order_items` per order) and `max_items_per_order` (the largest number of `order_items` attached to any single order)

Part of Analyst Debugging Patterns in SQL

The problem

Scenario: Brightlane's analyst ran EXPLAIN ANALYZE on a per-order revenue query and saw the planner estimating 200 rows for the pairing of orders with order_items, while the run produced 637 actual rows. The large gap suggests stale statistics. To quantify the fanout precisely and inform whether to refresh statistics, the analyst needs the average and maximum line-item count per order.

Task: Write a query to return two metrics: avg_items_per_order (the average number of order_items per order) and max_items_per_order (the largest number of order_items attached to any single order).

Assumptions:

  • A per-order item count is the count of order_items recorded against an order.
  • The avg_items_per_order is the average per-order item count across every order with at least one line item.
  • The max_items_per_order is the largest per-order item count across the same set.

Output:

  • One row, holding the two metrics.
  • Columns in this order: avg_items_per_order, max_items_per_order.
Schema · ecommerce 5 tables
categories
id integer
name text
parent_id? integer
products
id integer
name text
category_id integer
price numeric
stock_qty integer
attributes? jsonb
order_items
id integer
order_id integer
product_id integer
quantity integer
unit_price numeric
customers
id integer
name text
email text
city? text
country text
created_at timestamptz
is_active boolean
orders
id integer
customer_id integer
ordered_at timestamptz
status text
total_amount numeric

Run previews · Check grades

Write a query, then run it to see results here.

Worked solution Try it yourself first
Solution query
WITH
  order_item_counts AS (
    SELECT
      order_id,
      COUNT(*) AS item_count
    FROM
      order_items
    GROUP BY
      order_id
  )
SELECT
  AVG(item_count) AS avg_items_per_order,
  MAX(item_count) AS max_items_per_order
FROM
  order_item_counts

The shape

A CTE that counts line items per order, then AVG and MAX over that one-row-per-order summary. The two metrics together describe the fanout's distribution — the average reports the typical multiplier, the max reports the longest tail. A planner row-estimate gap of 200 estimated vs. 637 actual is the symptom; this query produces the two numbers that interpret it.

Clause by clause

  • WITH order_item_counts AS (SELECT order_id, COUNT(*) AS item_count FROM order_items GROUP BY order_id) collapses order_items to one row per order with its line-item count. The CTE is the per-parent-fanout distribution in tabular form — every row is one order, every value is that order's multiplier.
  • SELECT AVG(item_count) AS avg_items_per_order, MAX(item_count) AS max_items_per_order aggregates across that distribution without a GROUP BY, collapsing it to one row. AVG returns the typical per-order multiplier; MAX returns the worst-case multiplier. The reference values are 1.098 and 3 — a typical fanout barely above 1, with the longest tail at three items.
  • FROM order_item_counts reads the CTE. There's no WHERE, no further filtering — the diagnostic wants the shape of the full distribution, not a sliced subset.

Why this and not SUM(item_count) / COUNT(*)

The two forms return the same number on this data, but AVG names the intent. A reader scanning the query sees "average items per order" and the meaning lands without parsing the arithmetic. More importantly, AVG skips NULL rows, which matters if the CTE were later changed to include orders with no line items via a left join. SUM / COUNT(*) would silently divide by the wrong denominator under that change; AVG would adjust correctly. Reach for the named aggregate when one exists.

The trap

The 200-vs-637 gap can be interpreted two ways, and the two metrics together are what disambiguate them. If the average were close to 3.2 (matching 637 / 200), the gap would mean the planner has consistently stale statistics on a uniformly-fanning relationship — ANALYZE on the table fixes it. But the actual average is 1.098, which means most orders barely fan out at all; the gap comes from a small number of high-item-count outliers (the orders with item_count = 3) dominating the actual row count. That's a long-tail distribution, not a uniform one, and the structural fix is different — pre-aggregating before the join is more likely to help than refreshing statistics. The average alone wouldn't have surfaced this; the max alone wouldn't have either. The diagnostic is the pair.

You practiced characterizing fanout shape — average and peak per-parent child counts — so a planner's row-estimate gap can be interpreted as either typical fanout or a long-tail outlier driving the cost.

How you actually get good at SQL

Reading explains SQL. Writing it, over and over with instant feedback, is what makes you fluent.

That's the whole SQLMaxx loop: 600+ real problems, instant AI feedback, mastery you can actually see, and spaced review that won't let you forget.

A stack of SQL practice problem cards, the top card showing an employees table.
615 problems · 66 concepts

Real problems. Not toy examples.

615 hand-built problems spanning all 66 concepts, from basic SELECTs to window functions, built on real schemas and real business questions, the kind you'll actually get asked on the job. Enough reps to make SQL automatic.

A retro computer showing a SQL query marked correct with a green checkmark.
Instant AI feedback

Write a query. Know if it's right in one second.

No copying an answer and hoping it clicked. The AI grader checks your real query against real data, catches exactly what's wrong, and explains the fix in plain English, like a senior analyst reading over your shoulder on every problem.

A circular mastery progress dial filling from blue to green, the SQLMaxx diamond at its center.
Mastery tracking

Stop guessing whether you actually know it.

SQLMaxx tracks every concept and shows you what you've mastered and what's still shaky. Your skills fill in one concept at a time, so 'I think I get joins' becomes something you can prove.

A SQL query editor circled by a blue return arrow with a clock, scheduled to come back for review.
Spaced review

Learn it once. Keep it for good.

Most of what you learn this week fades by next week. So when a concept comes due for review, SQLMaxx hands you a fresh problem to solve from a blank editor, not a flashcard to re-read. A research-backed spaced-repetition algorithm (FSRS) times each return for right before you'd forget, so your SQL is still there months later, when the interview or the job actually needs it.

Practice, feedback, mastery, review. That's the loop that turns reading into real skill.

Start free

No account, no credit card. Start solving in under a minute.