Brightlane's category-management team wants to see each product alongside its price and the average price across all products in the same category (not the catalogue-wide average).
Write a query to return the product name, price, and that product's own-category average price for every product.
Assumptions:
- The
productstable contains every product in the catalogue. - The third column is the average for the category that this row's product belongs to — different rows from different categories will see different averages.
- A product's own-category average includes that product itself in the calculation.
Output:
- One row per product, with columns
name,price, andcategory_avg.
Schema · ecommerce 5 tables
Run previews · Check grades
Write a query, then run it to see results here.
Worked solution Try it yourself first
SELECT
p.name,
p.price,
(
SELECT
AVG(p2.price)
FROM
products p2
WHERE
p2.category_id = p.category_id
) AS category_avg
FROM
products p The shape
The inner query references the outer row's category_id. PostgreSQL re-runs the subquery once per outer row, substituting that row's category_id each time, so the third column is a different average for each category. Apex Titan 15 sees its electronics category average (782.33); Crest Pro 14" sees its computers category average (1459). One query, sixty-three different category-specific averages.
Clause by clause
SELECT p.name, p.pricereads the current product's name and price. The aliaspis what lets the subquery refer back to this row.- The scalar subquery in the
SELECTlist:
(SELECT AVG(p2.price)
FROM products p2
WHERE p2.category_id = p.category_id) AS category_avgThis is a correlated subquery — p.category_id is a value from the outer query, used inside the inner one. The inner FROM products p2 aliases the same table separately so the two references don't collide. For each outer row, PostgreSQL plugs in that row's category_id, computes the average price across all products sharing it, and returns the single resulting number.
- FROM products p is the outer source — every product, processed one row at a time. For every product in this table, the subquery runs once with that product's category_id filled in.
The trap
Three rows in the result have category_avg as NULL: the two gift cards and the mystery bundle. These are products with no category_id on file — the column is NULL for those rows. The subquery's filter becomes WHERE p2.category_id = NULL, which never matches anything in SQL: NULL = NULL is not true, it's unknown. The inner query returns zero rows, and a scalar subquery that returns zero rows resolves to NULL.
No error fires. The query runs cleanly and silently delivers a NULL in the third column for any outer row whose correlated key was itself NULL. The rule: a correlated subquery on a nullable key produces NULL for any outer row missing that key, not a zero or an error. If those rows need a different default — 0, the catalogue-wide average, an explicit "uncategorised" bucket — the query has to handle the missing key directly.
You practiced a correlated scalar subquery — one whose result depends on the outer row. The recurring shape: when each row needs a per-group statistic of its own, the inner query references the outer row and PostgreSQL re-evaluates it once per outer row.