Brightlane's product team wants to find categories whose average product price is above the cross-category average — that is, whose category-level average sits above the average of every category's average price.
Write a query to return the category ID and average price for every category whose average exceeds the cross-category average.
Assumptions:
- A category's average price is the average of every
pricevalue linked to thatcategory_id. - The cross-category average is the average of those category-level averages, computed across every category.
- Only categories whose average price exceeds the cross-category average should appear.
Output:
- One row per qualifying category, with columns
category_idandavg_price.
Schema · ecommerce 5 tables
Run previews · Check grades
Write a query, then run it to see results here.
Worked solution Try it yourself first
WITH
category_stats AS (
SELECT
category_id,
AVG(price) AS avg_price
FROM
products
GROUP BY
category_id
),
above_avg AS (
SELECT
category_id,
avg_price
FROM
category_stats
WHERE
avg_price > (
SELECT
AVG(avg_price)
FROM
category_stats
)
)
SELECT
category_id,
avg_price
FROM
above_avg The shape
The second CTE reads category_stats twice in the same query. Once row-by-row to source each category's average, and once through a scalar subquery (SELECT AVG(avg_price) FROM category_stats) that reduces the layer to a single number for comparison. Naming the per-category averages in a CTE is what makes the same set available on both sides of the threshold check.
Clause by clause
The first CTE produces one row per category with its average price:
WITH category_stats AS (
SELECT category_id, AVG(price) AS avg_price
FROM products
GROUP BY category_id
)GROUP BY category_id produces one row per category, and AVG(price) is computed inside each group.
The second CTE compares each row's avg_price to the cross-category average:
above_avg AS (
SELECT category_id, avg_price
FROM category_stats
WHERE avg_price > (SELECT AVG(avg_price) FROM category_stats)
)FROM category_stats reads the per-category rows. The scalar subquery (SELECT AVG(avg_price) FROM category_stats) reads the same CTE again and aggregates it down to a single number, then WHERE avg_price > ... compares each row's value to that single number. The subquery is evaluated once for the whole query and that one result is reused on every row.
SELECT category_id, avg_price FROM above_avgreturns the three categories whose average price exceeds the cross-category average.
Why a CTE referenced twice and not the raw products table
The comparison value is the average of the per-category averages, not the average of every product price. Writing WHERE avg_price > (SELECT AVG(price) FROM products) would compute a different number, weighted by how many products each category has. Computing the per-category averages first and then averaging those averages is what the prompt actually asks for, and putting them in a named CTE means the two references to that intermediate set agree by definition. The same set is on both sides of the comparison.
You practiced layering two WITH stages where the second references the first twice — once row-by-row, and once through a scalar subquery that aggregates the layer to a single comparison value.