Cheapest Product per Category — DISTINCT ON in SQL

The problem

Brightlane's catalog team needs the lowest-priced product for each category. Products with no category on record form their own group — the lowest-priced uncategorized product should also appear in the result.

Write a query to return one row per category group, showing the category ID, the ID of the lowest-priced product in that group, the product name, and the price. Sort the final result by category_id ascending.

Assumptions:

The lowest-priced product in a category is the product with the smallest price for that category_id. When two products in the same category share the same price, the product with the smaller id wins.
Products with a missing category_id form their own group; the lowest-priced product among that group appears in the result with a missing category_id.
The final result is sorted by category_id ascending; the missing-category_id row appears at the end.

Output:

One row per category group (each category_id value plus the missing-category_id group), with columns category_id, product_id, name, and price. Sorted by category_id ascending; missing-category_id row last.

The shape

A missing category_id is treated like any other distinct value by DISTINCT ON. DISTINCT ON (category_id) keeps one row per distinct category_id, and the missing value is one of those distinct values, so the missing-category products form their own group and get their own row. ORDER BY category_id, price, id picks the cheapest product in each group, with the smaller id winning ties.
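Put together as one statement, the clauses named in this walkthrough might look like the sketch below. It assumes a products table with id, category_id, name, and price columns, which is what the column list implies; the full schema is not reproduced here.

```sql
-- One row per distinct category_id (NULL counts as one distinct value).
-- Assumes products(id, category_id, name, price); PostgreSQL-specific syntax.
SELECT DISTINCT ON (category_id)
       category_id,
       id AS product_id,
       name,
       price
FROM products
ORDER BY category_id,  -- must lead the ORDER BY to match DISTINCT ON
         price,        -- cheapest product first within each category
         id;           -- smaller id wins when prices tie
```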

Clause by clause

SELECT DISTINCT ON (category_id) category_id, id AS product_id, name, price returns the four columns the catalog review needs. DISTINCT ON (category_id) declares one row per distinct category_id value, including the missing value as its own distinct group.
FROM products reads the product records.
ORDER BY category_id, price, id sorts the products by three ascending keys. The leading category_id ascending satisfies the DISTINCT ON requirement and gives the final result its category-ordered shape; PostgreSQL puts missing values last in an ascending sort by default, which is why the missing-category row appears at the end. The second key, price ascending, makes the cheapest product sit first in each category's group. The third key, id ascending, breaks price ties: when two products in the same category share a price, the one with the smaller id sorts first and wins the per-category pick.
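To see the three sort keys at work, the query can be run against a handful of made-up rows. The product names, ids, and prices below are hypothetical sample data supplied through a VALUES list, not rows from the exercise's database.

```sql
-- Hypothetical sample rows standing in for the real products table.
WITH products (id, category_id, name, price) AS (
  VALUES
    (1, 10,   'Desk lamp',   24.00),  -- ties with id 2 on price
    (2, 10,   'Floor lamp',  24.00),
    (3, 10,   'Clip light',  31.50),
    (4, 20,   'Notebook',     4.25),
    (5, 20,   'Pen set',      3.75),
    (6, NULL, 'Gift card',   15.00),  -- uncategorized
    (7, NULL, 'Sample pack',  9.00)   -- uncategorized, cheaper
)
SELECT DISTINCT ON (category_id)
       category_id, id AS product_id, name, price
FROM products
ORDER BY category_id, price, id;

-- Rows returned, in order:
--   10   | 1 | Desk lamp   | 24.00   (price tie with id 2; smaller id wins)
--   20   | 5 | Pen set     |  3.75
--   NULL | 7 | Sample pack |  9.00   (NULL group sorts last in ascending order)
```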

Why this and not ROW_NUMBER

The window-function form behaves the same way on missing values:

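SELECT category_id, product_id, name, price
FROM (
  -- Rank products within each category: cheapest first, smaller id on ties.
  SELECT category_id, id AS product_id, name, price,
    ROW_NUMBER() OVER (PARTITION BY category_id ORDER BY price, id) AS rn
  FROM products
) ranked
WHERE rn = 1           -- keep only the top-ranked product per category
ORDER BY category_id;  -- ascending; PostgreSQL sorts NULLs last by default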

PARTITION BY category_id puts all missing-category rows into the same partition, exactly the way DISTINCT ON (category_id) puts them in the same group. Both forms return the missing-category row as part of the result.
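One way to check that the two forms agree on a given table is to subtract one result from the other with EXCEPT, which treats two NULLs as equal when comparing rows. This is only a sketch of a sanity check, assuming the same products table; an empty result means every DISTINCT ON row also appears in the ROW_NUMBER result, and swapping the two operands checks the other direction.

```sql
-- Rows produced by the DISTINCT ON form...
(
  SELECT DISTINCT ON (category_id)
         category_id, id AS product_id, name, price
  FROM products
  ORDER BY category_id, price, id
)
EXCEPT
-- ...minus rows produced by the ROW_NUMBER form.
(
  SELECT category_id, product_id, name, price
  FROM (
    SELECT category_id, id AS product_id, name, price,
           ROW_NUMBER() OVER (PARTITION BY category_id ORDER BY price, id) AS rn
    FROM products
  ) ranked
  WHERE rn = 1
);
-- Expected to return zero rows when the two forms agree.
```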

The trap

The instinct with missing values is that they get filtered out or swept aside. With DISTINCT ON they do not: the missing value is its own distinct group, with its own kept row, sorted wherever the ORDER BY puts it. If the catalog team had wanted to exclude uncategorized products, the query would have needed an explicit WHERE category_id IS NOT NULL, which runs before the per-category pick. Without that filter, the missing-category bucket is in the result by default. The corollary on the sort side: in an ascending ORDER BY, PostgreSQL sorts missing values last by default, which is why the uncategorized row appears at the end of the result rather than at the top.
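For contrast, here are two hedged variants of the query, again assuming the products table used throughout. The first applies the explicit filter described above, so the uncategorized group never reaches the per-category pick. The second keeps the group but uses PostgreSQL's NULLS FIRST modifier to move it to the top; that placement is not part of the exercise and is shown only to illustrate that the position of the missing-category row is a sorting choice.

```sql
-- Variant 1: exclude uncategorized products before picking per category.
SELECT DISTINCT ON (category_id)
       category_id, id AS product_id, name, price
FROM products
WHERE category_id IS NOT NULL
ORDER BY category_id, price, id;

-- Variant 2: keep the uncategorized group but list it first.
-- NULLS FIRST overrides the default NULLS LAST of an ascending sort.
SELECT DISTINCT ON (category_id)
       category_id, id AS product_id, name, price
FROM products
ORDER BY category_id NULLS FIRST, price, id;
```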

You practiced DISTINCT ON (category_id) over a column that contains missing values — PostgreSQL treats every distinct value (including the missing value itself) as a separate group, so the missing-category_id records form their own bucket.
