Scenario: Brightlane's pricing team is auditing products that are priced above the average for their own category.
Task: Write a query to return each qualifying product_id and its price.
Assumptions:
- The
productstable holds one row per product, withididentifying it,category_idrecording its category, andpricestoring its current price. - The category average price for a product is the average
priceacross every product sharing the samecategory_id. - The result covers only products whose
priceis strictly greater than the category average price for their own category.
Output:
- One row per qualifying product.
- Columns in this order:
product_id,price.
Schema · ecommerce 5 tables
Run previews · Check grades
Write a query, then run it to see results here.
Worked solution Try it yourself first
SELECT
p.id AS product_id,
p.price
FROM
products p
JOIN (
SELECT
category_id,
AVG(price) AS avg_price
FROM
products
GROUP BY
category_id
) AS cat_avg ON cat_avg.category_id = p.category_id
WHERE
p.price > cat_avg.avg_price The shape
Every product has to be compared against a value that itself depends on the whole category. A derived table computes the per-category average once per category, the main query joins each product to its category's row, and the WHERE then compares the product's price against the category benchmark sitting on the same row.
Clause by clause
- The derived table:
SELECT category_id, AVG(price) AS avg_price
FROM products
GROUP BY category_idOne row per category, with the category's average price already computed across every product in that category.
- SELECT p.id AS product_id, p.price returns the product's identifier and its own price — the category average is used for the comparison but is not in the output.
- FROM products p JOIN (...) AS cat_avg ON cat_avg.category_id = p.category_id attaches each product to its category's benchmark row. Every product belongs to exactly one category, so the join produces exactly one row per product.
- WHERE p.price > cat_avg.avg_price keeps only the products whose price strictly exceeds their category's average.
Why this and not a correlated subquery in WHERE
The same logic can be written as WHERE p.price > (SELECT AVG(price) FROM products p2 WHERE p2.category_id = p.category_id). The result is identical. The cost is not: the correlated form re-runs the subquery once per outer row, recomputing the same per-category average for every product in that category. The derived-table form computes each category's average exactly once and the join broadcasts it. On a large catalog that is the difference between scanning the table once per category and scanning it once per product.
The trap
The category average includes the product being compared against it. A product priced at the category mean is not above its category average and is correctly excluded by the strict >. But the average itself is pulled toward the product's own price by including it — on a category with only two products, each product's price contributes half the average, so the cheaper product can only be "below" by exactly the gap between the two prices. The benchmark is not "the average of the other products in the category"; it is "the average of every product in the category, including this one." The prompt asks for the latter, and that is what this query computes. If the analyst wanted the former, the derived table would need to exclude each product from its own group, which is a different shape entirely.
You practiced precomputing per-category averages in a derived table, then reattaching each product to its category benchmark for comparison — a shape that runs the average computation once per category rather than once per product.