Products with Category Average Price — Scalar Subqueries in SQL

The problem

Brightlane's category-management team wants to see each product alongside its price and the average price across all products in the same category (not the catalogue-wide average).

Write a query to return the product name, price, and that product's own-category average price for every product.

Assumptions:

The products table contains every product in the catalogue.
The third column is the average for the category that this row's product belongs to — different rows from different categories will see different averages.
A product's own-category average includes that product itself in the calculation.

Output:

One row per product, with columns name, price, and category_avg.

Schema · ecommerce 5 tables

The shape

The inner query references the outer row's category_id. PostgreSQL re-runs the subquery once per outer row, substituting that row's category_id each time, so the third column is a different average for each category. Apex Titan 15 sees its electronics category average (782.33); Crest Pro 14" sees its computers category average (1459). One query, sixty-three different category-specific averages.

Clause by clause

SELECT p.name, p.price reads the current product's name and price. The alias p is what lets the subquery refer back to this row.
The scalar subquery in the SELECT list:

(SELECT AVG(p2.price)
 FROM products p2
 WHERE p2.category_id = p.category_id) AS category_avg

This is a correlated subquery — p.category_id is a value from the outer query, used inside the inner one. The inner FROM products p2 aliases the same table separately so the two references don't collide. For each outer row, PostgreSQL plugs in that row's category_id, computes the average price across all products sharing it, and returns the single resulting number. - FROM products p is the outer source — every product, processed one row at a time. For every product in this table, the subquery runs once with that product's category_id filled in.

The trap

Three rows in the result have category_avg as NULL: the two gift cards and the mystery bundle. These are products with no category_id on file — the column is NULL for those rows. The subquery's filter becomes WHERE p2.category_id = NULL, which never matches anything in SQL: NULL = NULL is not true, it's unknown. The inner query returns zero rows, and a scalar subquery that returns zero rows resolves to NULL.

No error fires. The query runs cleanly and silently delivers a NULL in the third column for any outer row whose correlated key was itself NULL. The rule: a correlated subquery on a nullable key produces NULL for any outer row missing that key, not a zero or an error. If those rows need a different default — 0, the catalogue-wide average, an explicit "uncategorised" bucket — the query has to handle the missing key directly.

You practiced a correlated scalar subquery — one whose result depends on the outer row. The recurring shape: when each row needs a per-group statistic of its own, the inner query references the outer row and PostgreSQL re-evaluates it once per outer row.

Return the product name, price, and that product's own-category average price for every product

The shape

Clause by clause

The trap

Reading explains SQL. Writing it, over and over with instant feedback, is what makes you fluent.