Scenario: Brightlane's merchandising team is benchmarking each product against its category's average price.
Task: Write a query to return each product_id and the average price of all products in that same category.
Assumptions:
- The
productstable holds one row per product, withididentifying it,category_idrecording its category, andpricestoring its current price. - The
category_avg_pricefor a product is the averagepriceacross every product sharing the samecategory_id. - The result covers every product in the catalog.
Output:
- One row per product.
- Columns in this order:
product_id,category_avg_price.
Schema · ecommerce 5 tables
Run previews · Check grades
Write a query, then run it to see results here.
Worked solution Try it yourself first
SELECT
p.id AS product_id,
cat_avg.avg_price AS category_avg_price
FROM
products p
JOIN (
SELECT
category_id,
AVG(price) AS avg_price
FROM
products
GROUP BY
category_id
) AS cat_avg ON cat_avg.category_id = p.category_id The shape
Every product's row needs a value computed across the whole category that product belongs to. A derived table computes the per-category average once per category, then the main query joins each product to its category's row to attach the benchmark.
Clause by clause
SELECT p.id AS product_id, cat_avg.avg_price AS category_avg_pricereturns the product's identifier and the per-category average pulled from the derived table.FROM products pis the parent set — every catalog product appears once.- The derived table:
SELECT category_id, AVG(price) AS avg_price
FROM products
GROUP BY category_idOne row per category, with the category's average price already computed.
- AS cat_avg ON cat_avg.category_id = p.category_id joins each product to its category's row. Every product has a category, and every category that contains products appears in the derived table, so the inner join loses nothing.
Why this and not a correlated subquery per row
The same numbers come out if the average is written as a correlated subquery in the SELECT list: SELECT p.id, (SELECT AVG(price) FROM products p2 WHERE p2.category_id = p.category_id) AS category_avg_price FROM products p. The result is identical. The cost is not: the correlated form re-runs the subquery once per outer row, computing the same category average over and over. The derived-table form computes each category's average exactly once and broadcasts it to every product in that category through the join. On a large catalog this is the structural difference between a quadratic and a linear pass over the data.
The trap
AVG(price) averages over the rows actually in the group. Products with a null price would be excluded from the average automatically — AVG ignores nulls. That is the standard behavior, and it is the right one here. The trap is assuming AVG divides by every row including nulls; it does not, so the average reflects only the priced rows.
You practiced precomputing per-category averages in a derived table before reattaching to each product — separating the per-category calculation from the per-product output.