N046 Tier 4 · Advanced

DISTINCT ON in SQL

DISTINCT ON returns one row per distinct value of a specified expression, choosing which row to keep based on ORDER BY. It is PostgreSQL-specific syntax that solves the "one row per group, with full row data" problem that standard SQL requires a subquery or window function to handle.

DISTINCT ON returns exactly one row per distinct value of a specified expression, and you control which row it keeps using ORDER BY. It's PostgreSQL-specific syntax for the "latest record per group" type of problem — and it solves it in one clause instead of a subquery.

The pattern comes up constantly in real analytical work. Your events table has multiple rows per user and you want each user's most recent event — with all the event data, not just the user ID. Your orders table has multiple orders per customer and you want the latest order per customer. Your prices table has historical entries and you want the current price per product. All of these share the same structure: multiple rows per key, keep the one that sorts first.

The classic use case: your orders table has multiple orders per customer, and you want each customer's most recent order — with all the order data attached.

DISTINCT ON (customer_id) says: deduplicate on customer_id. ORDER BY customer_id, ordered_at DESC says: sort each customer's rows by most recent first, then keep only the first row. The result is one row per customer — their most recent order with the full order data attached.

This is different from plain DISTINCT. Plain DISTINCT removes rows that are identical across all selected columns. DISTINCT ON removes duplicates based only on the expressions you list in parentheses, regardless of what the other columns contain.

The one thing that trips people up

The ORDER BY clause must begin with the same expressions you listed in DISTINCT ON, in the same order. PostgreSQL requires this — it uses the sort to group duplicate keys together before picking the first row in each group.

This works:

ORDER BY customer_id, ordered_at DESC

This raises an error:

ORDER BY ordered_at DESC  -- customer_id must come first

The ROW_NUMBER equivalent

The same result can be written using ROW_NUMBER:

SELECT user_id, event_type, event_time
FROM (
    SELECT
        user_id,
        event_type,
        event_time,
        ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_time DESC) AS rn
    FROM events
) ranked
WHERE rn = 1

Both return the same rows. DISTINCT ON is shorter and typically faster in PostgreSQL for this pattern. ROW_NUMBER is more portable to other databases and more explicit about its logic. For teams working in PostgreSQL, DISTINCT ON is idiomatic for "one row per group, keep the top one."

One edge case: if two rows tie on the ORDER BY columns within a group — two events with the exact same timestamp for the same user — PostgreSQL picks one but doesn't guarantee which. Add a tiebreaker to ORDER BY (like a primary key) to make the result deterministic.

When DISTINCT ON isn't enough

DISTINCT ON always keeps the row that sorts first in each group. If you need the second-most-recent event, or the nth row, or need to filter on which row to keep based on a condition rather than sort order, DISTINCT ON can't help. For those cases, use ROW_NUMBER with a PARTITION BY. DISTINCT ON is the shortcut for the common "keep the top one" case.

Practice

9 DISTINCT ON practice problems

Write a query to return one row per customer with at least one order, showing that customer's ID, the ID of their most recent order, when it was placed, and the order amount. Sort the final result by `customer_id` ascending.

easy ecommerce

Write a query to return one row per status, showing the status, the ID of the most recent order in that status, when it was placed, and the order amount. Sort the final result by `status` ascending.

easy ecommerce

Write a query to return one row per user with at least one session, showing the user's ID, the ID of their most recent session, when it started, and the event count. Sort the final result by `user_id` ascending.

easy analytics

Write a query to return one row per customer with at least one order, showing that customer's ID, the ID of their earliest order, when it was placed, and the order amount. Sort the final result by `customer_id` ascending.

medium ecommerce

Write a query to return one row per customer with at least one delivered order, showing that customer's ID, the ID of their most recent delivered order, when it was placed, and the order amount. Sort the final result by `customer_id` ascending.

medium ecommerce

Write a query to return one row per customer-status pair on record, showing the customer ID, status, ID of the earliest order in that pair, when it was placed, and the order amount. Sort the final result by `customer_id` ascending, then `status` ascending.

medium ecommerce

Write a query to return one row per session with at least one event, showing the session ID, ID of the most recent event in that session, the event type, and when the event occurred. Sort the final result by `session_id` ascending.

medium analytics

Write a query to return one row per customer with at least one order, showing that customer's ID, their most recent order's ID, the order date, and the order amount. Sort the final result by `customer_id` ascending.

hard ecommerce

Write a query to return one row per category group, showing the category ID, ID of the lowest-priced product in that group, the product name, and the price. Sort the final result by `category_id` ascending.

hard ecommerce

These problems are part of the DISTINCT ON lesson in SQLMaxx, with instant grading and a worked solution on each.

How you actually get good at SQL

Reading explains SQL. Writing it, over and over with instant feedback, is what makes you fluent.

That's the whole SQLMaxx loop: 600+ real problems, instant AI feedback, mastery you can actually see, and spaced review that won't let you forget.

A stack of SQL practice problem cards, the top card showing an employees table.
615 problems · 66 concepts

Real problems. Not toy examples.

615 hand-built problems spanning all 66 concepts, from basic SELECTs to window functions, built on real schemas and real business questions, the kind you'll actually get asked on the job. Enough reps to make SQL automatic.

A retro computer showing a SQL query marked correct with a green checkmark.
Instant AI feedback

Write a query. Know if it's right in one second.

No copying an answer and hoping it clicked. The AI grader checks your real query against real data, catches exactly what's wrong, and explains the fix in plain English, like a senior analyst reading over your shoulder on every problem.

A circular mastery progress dial filling from blue to green, the SQLMaxx diamond at its center.
Mastery tracking

Stop guessing whether you actually know it.

SQLMaxx tracks every concept and shows you what you've mastered and what's still shaky. Your skills fill in one concept at a time, so 'I think I get joins' becomes something you can prove.

A SQL query editor circled by a blue return arrow with a clock, scheduled to come back for review.
Spaced review

Learn it once. Keep it for good.

Most of what you learn this week fades by next week. So when a concept comes due for review, SQLMaxx hands you a fresh problem to solve from a blank editor, not a flashcard to re-read. A research-backed spaced-repetition algorithm (FSRS) times each return for right before you'd forget, so your SQL is still there months later, when the interview or the job actually needs it.

Practice, feedback, mastery, review. That's the loop that turns reading into real skill.

Start free

No account, no credit card. Start solving in under a minute.