Products Selling Above the Average — Query Performance in SQL

The problem

Scenario: Brightlane's inventory analytics team is identifying products whose total units sold exceed the average total units sold across the catalog.

Task: Write a query to return each qualifying product_id and its total units sold.

Assumptions:

The order_items table holds one row per line item on an order, with product_id linking back to products.id and quantity recording the unit count.
A product's total_units is the combined quantity across all of its line items.
The result covers only products whose total_units is strictly greater than the average total_units across every product that has at least one line item on record.

Output:

One row per qualifying product.
Columns in this order: product_id, total_units.

Schema · ecommerce (5 tables)

categories
  id integer
  name text
  parent_id? integer

products
  id integer
  name text
  category_id integer
  price numeric
  stock_qty integer
  attributes? jsonb

order_items
  id integer
  order_id integer
  product_id integer
  quantity integer
  unit_price numeric

customers
  id integer
  name text
  email text
  city? text
  country text
  created_at timestamptz
  is_active boolean

orders
  id integer
  customer_id integer
  ordered_at timestamptz
  status text
  total_amount numeric


Worked solution

Solution query

WITH
  product_sales AS (
    SELECT
      product_id,
      SUM(quantity) AS total_units
    FROM
      order_items
    GROUP BY
      product_id
  )
SELECT
  ps.product_id,
  ps.total_units
FROM
  product_sales ps
WHERE
  ps.total_units > (
    SELECT
      AVG(total_units)
    FROM
      product_sales
  )

The shape

The same per-product total has to appear twice: once as the value on each output row, and once aggregated again into the average across all products with sales, which the threshold compares against. A CTE that computes SUM(quantity) per product once and is referenced twice in the main statement expresses this cleanly without writing the aggregation twice.

Clause by clause

The CTE product_sales reads order_items, groups by product_id, and computes SUM(quantity) AS total_units. One row per product that has at least one line item, with the per-product total already in place.
SELECT ps.product_id, ps.total_units FROM product_sales ps reads the CTE as the outer driving set. Every row already carries its per-product total.
WHERE ps.total_units > (SELECT AVG(total_units) FROM product_sales) filters with a scalar subquery. The subquery reads the same CTE a second time, averages the per-product totals across every product, and returns a single number. The outer WHERE compares each row's total_units against that average and keeps only the rows that exceed it.
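
To make the clause-by-clause walkthrough concrete, here is a minimal sketch that runs the same CTE against a handful of made-up rows and prints the threshold next to every per-product total. The table name demo_order_items and its five rows are hypothetical, and the dialect is assumed to be PostgreSQL, which the jsonb and timestamptz types in the schema suggest.

```sql
-- Hypothetical scratch table: only the columns the query needs.
CREATE TEMP TABLE demo_order_items (
  id         integer PRIMARY KEY,
  product_id integer NOT NULL,
  quantity   integer NOT NULL
);

-- Made-up rows: product 101 sells 15 units, products 102 and 103 sell 3 each.
INSERT INTO demo_order_items (id, product_id, quantity) VALUES
  (1, 101, 10),
  (2, 102, 1),
  (3, 102, 2),
  (4, 103, 3),
  (5, 101, 5);

-- Same CTE as the solution, but every product is kept so the
-- total, the threshold, and the comparison result are all visible.
WITH product_sales AS (
  SELECT product_id, SUM(quantity) AS total_units
  FROM demo_order_items
  GROUP BY product_id
)
SELECT
  ps.product_id,
  ps.total_units,
  (SELECT AVG(total_units) FROM product_sales) AS avg_total_units,
  (ps.total_units > (SELECT AVG(total_units) FROM product_sales)) AS qualifies
FROM product_sales ps
ORDER BY ps.product_id;
```

With these rows the per-product totals are 15, 3, and 3, so the average is 7 and only product 101 qualifies. Moving the qualifies expression into a WHERE clause gives the solution query.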

Why a CTE and not two derived tables

Without the CTE, the per-product aggregation has to be written twice: once for the outer row source and once inside the scalar subquery that computes the average. Two copies of the same aggregation tend to drift apart as the query is edited, and the repetition obscures the intent. A CTE names the aggregation once, and both references read from the same named set; that reuse is what this problem is practicing.
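
For contrast, the two-derived-table form described above might look like the sketch below. It is shown only to illustrate the duplication, not as a recommended alternative. The scratch table demo_items_v2 and its rows are hypothetical, and PostgreSQL is assumed.

```sql
-- Hypothetical scratch table with made-up rows.
CREATE TEMP TABLE demo_items_v2 (
  product_id integer NOT NULL,
  quantity   integer NOT NULL
);

INSERT INTO demo_items_v2 (product_id, quantity) VALUES
  (101, 10),
  (102, 1),
  (102, 2),
  (103, 3),
  (101, 5);

-- Same logic as the CTE form, but the GROUP BY aggregation is spelled out twice.
SELECT
  ps.product_id,
  ps.total_units
FROM (
  -- Copy 1: the outer row source.
  SELECT product_id, SUM(quantity) AS total_units
  FROM demo_items_v2
  GROUP BY product_id
) AS ps
WHERE ps.total_units > (
  SELECT AVG(t.total_units)
  FROM (
    -- Copy 2: must stay identical to copy 1 or the threshold silently changes.
    SELECT product_id, SUM(quantity) AS total_units
    FROM demo_items_v2
    GROUP BY product_id
  ) AS t
);
```

On this data it returns the single row (101, 15). If a filter were later added to one copy and not the other, the query would still run but compare totals against an average computed over a different set.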

The trap

The threshold here is the average across products, not the average across line items. Writing WHERE total_units > (SELECT AVG(quantity) FROM order_items) compares against a different number entirely: it averages every line item's quantity, ignoring how line items are distributed across products. A product with one line item of quantity 10 and a product with ten line items of quantity 1 contribute equally to the per-product average, one row of value 10 each. In the per-line-item average they contribute eleven rows, one of value 10 and ten of value 1, for an average of 20/11, or about 1.8. The prompt asks for the per-product average; the CTE makes that choice explicit by aggregating to one row per product first, and the scalar subquery averages over that already-collapsed set.
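
The two-product example can be checked directly. The sketch below loads exactly those eleven line items into a hypothetical scratch table, demo_trap_items, and computes both averages along with the number of rows each threshold would let through. PostgreSQL is assumed.

```sql
-- Hypothetical scratch table for the two products in the example.
CREATE TEMP TABLE demo_trap_items (
  product_id integer NOT NULL,
  quantity   integer NOT NULL
);

-- Product 201: one line item of quantity 10.
-- Product 202: ten line items of quantity 1.
INSERT INTO demo_trap_items (product_id, quantity) VALUES
  (201, 10),
  (202, 1), (202, 1), (202, 1), (202, 1), (202, 1),
  (202, 1), (202, 1), (202, 1), (202, 1), (202, 1);

WITH product_sales AS (
  SELECT product_id, SUM(quantity) AS total_units
  FROM demo_trap_items
  GROUP BY product_id
)
SELECT
  -- Threshold the prompt asks for: average of the per-product totals.
  (SELECT AVG(total_units) FROM product_sales) AS per_product_avg,
  -- The trap: average of raw line-item quantities.
  (SELECT AVG(quantity) FROM demo_trap_items) AS per_line_item_avg,
  -- How many products pass under each threshold.
  (SELECT COUNT(*) FROM product_sales
    WHERE total_units > (SELECT AVG(total_units) FROM product_sales)) AS rows_correct,
  (SELECT COUNT(*) FROM product_sales
    WHERE total_units > (SELECT AVG(quantity) FROM demo_trap_items)) AS rows_trap;
```

Both products total 10 units, so the per-product average is 10 and neither product is strictly above it. The per-line-item average is 20/11, so the trap version would report both products as above average.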

You practiced computing per-product totals once in a CTE, then referencing that same set twice, for the per-product value and for the all-product average, a shape a CTE expresses cleanly.

