Brightlane's catalog team wants to find premium, well-stocked categories — categories that carry 3 or more products and whose average product price exceeds the average across every well-stocked category.
Write a query to return the category ID and average price for every premium, well-stocked category.
Assumptions:
- A category's product count is the number of
productsrecords linked to thatcategory_id. A category's average price is the average ofpriceacross those records. - A well-stocked category has
3or more products. The well-stocked group consists of every well-stocked category. - The well-stocked-group average is the average of the per-category averages across every well-stocked category.
- Only well-stocked categories whose average price exceeds the well-stocked-group average should appear.
Output:
- One row per qualifying category, with columns
category_idandavg_price.
Schema · ecommerce 5 tables
Run previews · Check grades
Write a query, then run it to see results here.
Worked solution Try it yourself first
WITH
cat_stats AS (
SELECT
category_id,
COUNT(*) AS product_count,
AVG(price) AS avg_price
FROM
products
GROUP BY
category_id
),
large_cats AS (
SELECT
category_id,
product_count,
avg_price
FROM
cat_stats
WHERE
product_count >= 3
),
premium_large AS (
SELECT
category_id,
avg_price
FROM
large_cats
WHERE
avg_price > (
SELECT
AVG(avg_price)
FROM
large_cats
)
)
SELECT
category_id,
avg_price
FROM
premium_large The shape
Three cascading layers. The first computes both per-category statistics, the second restricts to well-stocked categories, and the third compares each well-stocked category's average price against the average of those averages, computed over the restricted set. The third layer reads its source twice — row-by-row and through a scalar subquery — so the comparison value is the well-stocked-group average, not the cross-catalog average.
Clause by clause
The first CTE produces one statistics row per category:
WITH cat_stats AS (
SELECT category_id, COUNT(*) AS product_count, AVG(price) AS avg_price
FROM products
GROUP BY category_id
)GROUP BY category_id produces one row per category. COUNT(*) and AVG(price) are both computed in the same pass, since both are needed downstream: the count qualifies the category and the average is what the final comparison uses.
The second CTE keeps the well-stocked categories:
large_cats AS (
SELECT category_id, product_count, avg_price
FROM cat_stats
WHERE product_count >= 3
)WHERE product_count >= 3 drops the sparse categories. The SELECT list carries avg_price forward unchanged because the next layer needs it on both sides of the comparison.
The third CTE compares each remaining row's avg_price to the well-stocked-group average:
premium_large AS (
SELECT category_id, avg_price
FROM large_cats
WHERE avg_price > (SELECT AVG(avg_price) FROM large_cats)
)The scalar subquery (SELECT AVG(avg_price) FROM large_cats) reads the restricted layer and reduces it to a single number. WHERE avg_price > ... then compares each well-stocked category's average to that number. Same set on both sides of the comparison; both references see only well-stocked categories.
SELECT category_id, avg_price FROM premium_largereturns the three premium well-stocked categories:5,6, and7.
Why the scalar subquery reads large_cats and not cat_stats
The well-stocked-group average is the average of the well-stocked categories' averages, not the average of every category's average. Writing (SELECT AVG(avg_price) FROM cat_stats) would pull the sparse categories back into the comparison value, and a high-priced sparse category would shift the threshold. Reading from large_cats on both sides keeps the comparison anchored to the restricted set the prompt actually describes.
The trap
The restriction "well-stocked" lives in the second CTE, and it has to be respected by both references in the third. The row-by-row read FROM large_cats makes that automatic on the left side of the comparison, but the scalar subquery has to spell it out on the right by also reading from large_cats. Letting the scalar subquery slip back to cat_stats silently changes the threshold the surviving rows are measured against, and the change is invisible in the output until a borderline category flips one way or the other.
You practiced cascading three WITH stages where the second applies a count threshold and the third compares each remaining row's value against an aggregate of that same restricted set.