Brightlane's analytics team wants to identify customers whose personal average order value exceeds the mean of all customers' individual averages — outperformers compared to the typical customer, not just the overall order-amount average.
Write a query to return the customer ID and personal average order value for every qualifying customer.
Assumptions:
- A customer's personal average is the
AVG(total_amount)over their own orders. - The threshold is the average of customers' personal averages — not
AVG(total_amount)over the wholeorderstable. (A customer with one $1,000 order and a customer with twenty $500 orders contribute equally to this mean — one personal average each — even though their order counts differ wildly.) - A qualifying customer's personal average exceeds that across-customers mean.
Output:
- One row per qualifying customer, with columns
customer_idandavg_order_value.
Schema · ecommerce 5 tables
Run previews · Check grades
Write a query, then run it to see results here.
Worked solution Try it yourself first
SELECT
customer_id,
avg_order_value
FROM
(
SELECT
customer_id,
AVG(total_amount) AS avg_order_value
FROM
orders
GROUP BY
customer_id
) AS customer_avgs
WHERE
avg_order_value > (
SELECT
AVG(cust_avg)
FROM
(
SELECT
AVG(total_amount) AS cust_avg
FROM
orders
GROUP BY
customer_id
) AS all_cust_avgs
) The shape
The outer derived table holds each customer's personal average. The threshold on the right side of WHERE is the average of those per-customer averages — which requires its own derived table, nested inside a scalar subquery. Two passes for the answer set, two passes for the threshold, and the layering keeps the two definitions of average from collapsing into one another.
Clause by clause
- The outer derived table computes one row per customer with their personal average:
SELECT customer_id, AVG(total_amount) AS avg_order_value
FROM orders
GROUP BY customer_idMaterialised as customer_avgs — the row set the outer SELECT reads from.
- The threshold is computed in a scalar subquery on the right of WHERE:
SELECT AVG(cust_avg) FROM (
SELECT AVG(total_amount) AS cust_avg
FROM orders
GROUP BY customer_id
) AS all_cust_avgsThe inner block produces the same per-customer averages again, this time as all_cust_avgs, and the surrounding AVG(cust_avg) collapses them to a single number — the mean of the personal averages.
- WHERE avg_order_value > (...) then compares each customer's personal average against that single threshold value. Twenty-nine customers clear the bar.
- SELECT customer_id, avg_order_value returns each qualifying customer with their personal average. Customer 24 tops the list at 1274, customer 10 next at 999.
Why this and not AVG(total_amount) over the whole orders table
A customer with one $1,000 order contributes one personal average of 1000 to the across-customers mean. A customer with twenty $500 orders contributes one personal average of 500. The across-customers mean weights each customer equally regardless of volume. AVG(total_amount) over the whole orders table weights each order equally, so the high-volume customer dominates. The prompt's language about outperforming the typical customer locks in the first definition.
The trap
The trap is using (SELECT AVG(total_amount) FROM orders) as the threshold and assuming it's the same number. It isn't. That scalar subquery computes a different statistic — the mean order amount across all orders, with high-volume customers carrying disproportionate weight. A customer's personal average can sit above one threshold and below the other, so swapping the two definitions silently changes who qualifies.
The rule: when the threshold is the average of a per-group statistic, the threshold needs its own group-and-aggregate pass first, and a derived table is the scaffold that makes that pass nameable.
You practiced nested derived tables: a derived table for the per-customer averages, with a scalar-subquery threshold whose own subquery is another derived table for the average-of-averages. The recurring shape any time the answer requires aggregating an aggregate — the layering makes each pass over the data explicit.