Scenario: Brightlane's inventory analytics team is identifying products whose total units sold exceed the average total units sold across the catalog.
Task: Write a query to return each qualifying product_id and its total units sold.
Assumptions:
- The order_items table holds one row per line item on an order, with product_id linking back to products.id and quantity recording the unit count.
- A product's total_units is the combined quantity across all of its line items.
- The result covers only products whose total_units is strictly greater than the average total_units across every product that has at least one line item on record.
Output:
- One row per qualifying product.
- Columns in this order: product_id, total_units.
Worked solution
WITH product_sales AS (
    SELECT
        product_id,
        SUM(quantity) AS total_units
    FROM order_items
    GROUP BY product_id
)
SELECT
    ps.product_id,
    ps.total_units
FROM product_sales ps
WHERE ps.total_units > (
    SELECT AVG(total_units)
    FROM product_sales
)

The shape
The same per-product total has to appear twice — once as the value on each output row, and once aggregated again into the catalog-wide average that the threshold compares against. A CTE that computes SUM(quantity) per product once and gets referenced twice in the main statement is the shape that expresses this cleanly without recomputing the totals.
Clause by clause
- The CTE product_sales reads order_items, groups by product_id, and computes SUM(quantity) AS total_units. The result is one row per product that has at least one line item, with the per-product total already in place.
- SELECT ps.product_id, ps.total_units FROM product_sales ps reads the CTE as the outer driving set. Every row already carries its per-product total.
- WHERE ps.total_units > (SELECT AVG(total_units) FROM product_sales) uses a scalar subquery that reads the same CTE a second time, averages the per-product totals across every product, and returns a single number. The outer WHERE compares each row's total_units against that single average and keeps only the rows that exceed it.
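To see each clause's contribution, the three stages can be run separately. The sketch below assumes Python's built-in sqlite3 module as a stand-in for the exercise's database, and the sample rows are hypothetical, not taken from the exercise's data.

```python
import sqlite3

# Hypothetical line items as (product_id, quantity); not the exercise's real data.
rows = [(1, 10)] + [(2, 1)] * 10 + [(3, 5), (3, 25)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_items (product_id INTEGER, quantity INTEGER)")
conn.executemany("INSERT INTO order_items VALUES (?, ?)", rows)

# The CTE from the worked solution, reused as a prefix for each stage.
CTE = """
WITH product_sales AS (
    SELECT product_id, SUM(quantity) AS total_units
    FROM order_items
    GROUP BY product_id
)
"""

# Stage 1: the CTE alone yields one row per product with its total.
per_product = conn.execute(
    CTE + "SELECT product_id, total_units FROM product_sales ORDER BY product_id"
).fetchall()

# Stage 2: the scalar subquery collapses those totals into a single average.
average = conn.execute(
    CTE + "SELECT AVG(total_units) FROM product_sales"
).fetchone()[0]

# Stage 3: the full statement keeps only rows strictly above that average.
qualifying = conn.execute(
    CTE
    + """
    SELECT ps.product_id, ps.total_units
    FROM product_sales ps
    WHERE ps.total_units > (SELECT AVG(total_units) FROM product_sales)
    """
).fetchall()

print(per_product)
print(average)
print(qualifying)
```

With these sample rows the per-product totals are 10, 10, and 30, so only product 3 clears the average of the three totals.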
Why a CTE and not two derived tables
Without the CTE, the per-product aggregation has to be written twice: once for the outer row source and once inside the scalar subquery that computes the average. Two copies of the same aggregation can drift apart as the query is edited, and the duplication reads as noise. A CTE names the aggregation once, and both references read from the same named set, which is the pattern this problem is meant to practice.
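For comparison, here is a sketch of the two-derived-table version next to the CTE version, run against a small in-memory SQLite database. The sqlite3 setup, the sample rows, and the alias t are assumptions made for illustration. Both statements return the same rows, but the derived-table form repeats the GROUP BY block verbatim.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_items (product_id INTEGER, quantity INTEGER)")
# Hypothetical sample rows: totals are 10, 3, and 20 for products 1, 2, and 3.
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?)",
    [(1, 4), (1, 6), (2, 3), (3, 20)],
)

# CTE version: the aggregation is written once and referenced twice.
with_cte = """
WITH product_sales AS (
    SELECT product_id, SUM(quantity) AS total_units
    FROM order_items
    GROUP BY product_id
)
SELECT ps.product_id, ps.total_units
FROM product_sales ps
WHERE ps.total_units > (SELECT AVG(total_units) FROM product_sales)
ORDER BY ps.product_id
"""

# Derived-table version: the same aggregation appears twice in the text.
without_cte = """
SELECT ps.product_id, ps.total_units
FROM (
    SELECT product_id, SUM(quantity) AS total_units
    FROM order_items
    GROUP BY product_id
) AS ps
WHERE ps.total_units > (
    SELECT AVG(t.total_units)
    FROM (
        SELECT product_id, SUM(quantity) AS total_units
        FROM order_items
        GROUP BY product_id
    ) AS t
)
ORDER BY ps.product_id
"""

cte_rows = conn.execute(with_cte).fetchall()
derived_rows = conn.execute(without_cte).fetchall()
print(cte_rows == derived_rows)
```

If a later edit changed only one of the two inner SELECTs (adding a filter, for example), the threshold and the output rows would silently describe different sets.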
The trap
The threshold here is the average across products, not the average across line items. Writing WHERE total_units > (SELECT AVG(quantity) FROM order_items) produces a different number entirely: it averages every line item's quantity, ignoring how line items are distributed across products. A product with one line item of quantity 10 and a product with ten line items of quantity 1 each collapse to a single row with total_units of 10, so the per-product average is 10. In the per-line-item average they contribute eleven rows (one 10 and ten 1s), which averages to 20/11, or about 1.8. The prompt asks for the per-product average; the CTE makes that choice explicit by aggregating to one row per product first, and the scalar subquery averages over that already-collapsed set.
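The gap between the two averages can be checked directly. This is a minimal sketch using Python's built-in sqlite3 module with the two hypothetical products from the paragraph above; the table setup is an assumption, not the exercise's actual schema definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_items (product_id INTEGER, quantity INTEGER)")
# Product 1: one line item of quantity 10. Product 2: ten line items of quantity 1.
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?)",
    [(1, 10)] + [(2, 1)] * 10,
)

CTE = """
WITH product_sales AS (
    SELECT product_id, SUM(quantity) AS total_units
    FROM order_items
    GROUP BY product_id
)
"""

# Average over per-product totals (what the prompt asks for).
per_product_avg = conn.execute(
    CTE + "SELECT AVG(total_units) FROM product_sales"
).fetchone()[0]

# Average over raw line items (the trap).
per_line_avg = conn.execute("SELECT AVG(quantity) FROM order_items").fetchone()[0]

# Rows kept with the intended threshold.
correct = conn.execute(
    CTE
    + """
    SELECT product_id, total_units FROM product_sales
    WHERE total_units > (SELECT AVG(total_units) FROM product_sales)
    ORDER BY product_id
    """
).fetchall()

# Rows kept with the mistaken per-line-item threshold.
wrong = conn.execute(
    CTE
    + """
    SELECT product_id, total_units FROM product_sales
    WHERE total_units > (SELECT AVG(quantity) FROM order_items)
    ORDER BY product_id
    """
).fetchall()

print(per_product_avg, per_line_avg)
print(correct, wrong)
```

Here both products sit exactly at the per-product average, so the intended query returns no rows, while the per-line-item threshold is low enough that both products pass.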
You practiced computing per-product totals once in a CTE, then referencing that same set twice — for the per-product value and for the all-product average — a shape only a CTE expresses cleanly.