Earliest Order per Customer and Status

The problem

Brightlane's operations team wants to know when each customer first reached each order status: the first time a customer placed a delivered order, the first time they placed a cancelled order, and so on.

Write a query to return one row per customer-status pair on record, showing the customer ID, status, ID of the earliest order in that pair, when it was placed, and the order amount. Sort the final result by customer_id ascending, then status ascending.

Assumptions:

For each customer-status pair, the earliest order is the order with the smallest ordered_at among that customer's orders carrying that status.
Each customer-status pair on record should appear once. Pairs with no orders on record do not appear.
The final result is sorted by customer_id ascending, then status ascending.

Output:

One row per customer-status pair on record, with columns customer_id, status, order_id, ordered_at, and total_amount. Sorted by customer_id, then status.

Schema: ecommerce (5 tables)

The shape

The deduplication key can be a tuple. DISTINCT ON (customer_id, status) keeps one row per unique combination of those two values, and ORDER BY customer_id, status, ordered_at (all ascending) picks the earliest order in each combination. A customer with five delivered and three cancelled orders contributes two rows: their earliest delivered order and their earliest cancelled order.
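To make the tuple key concrete, here is a minimal Python sketch of the same per-group pick. It runs on hypothetical in-memory rows, not the real orders table, and it mirrors the result of DISTINCT ON rather than PostgreSQL's execution plan: sort by customer_id, status, ordered_at, then keep the first row each time the (customer_id, status) pair changes. Dates are ISO-8601 strings, which sort chronologically.

```python
# Hypothetical rows: (order_id, customer_id, status, ordered_at, total_amount).
# Customer 1 has five delivered and three cancelled orders; customer 2 has one pending order.
ORDERS = [
    (101, 1, "delivered", "2024-03-05", 40.00),
    (102, 1, "delivered", "2024-01-09", 25.50),
    (103, 1, "cancelled", "2024-02-01", 12.00),
    (104, 1, "delivered", "2024-04-02", 60.00),
    (105, 1, "cancelled", "2024-01-20", 18.75),
    (106, 1, "delivered", "2024-02-14", 33.10),
    (107, 1, "cancelled", "2024-05-30", 9.99),
    (108, 1, "delivered", "2024-06-11", 71.20),
    (109, 2, "pending", "2024-03-03", 15.00),
]


def distinct_on_earliest(orders):
    """Mimic DISTINCT ON (customer_id, status) ... ORDER BY customer_id, status, ordered_at."""
    # Sort like the ORDER BY: the two key columns first, then ordered_at.
    ordered = sorted(orders, key=lambda o: (o[1], o[2], o[3]))
    result = []
    last_key = None
    for order_id, customer_id, status, ordered_at, total in ordered:
        key = (customer_id, status)  # the DISTINCT ON tuple
        if key != last_key:  # first row of a new group: keep it, skip the rest
            result.append((customer_id, status, order_id, ordered_at, total))
            last_key = key
    return result


for row in distinct_on_earliest(ORDERS):
    print(row)
```

Customer 1 contributes two rows (its earliest cancelled order, 105, and its earliest delivered order, 102); customer 2 contributes one.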

Clause by clause

SELECT DISTINCT ON (customer_id, status) customer_id, status, id AS order_id, ordered_at, total_amount returns the five columns the operations review needs. DISTINCT ON (customer_id, status) declares one row per distinct (customer_id, status) pair — a customer can appear multiple times in the result, once for each status they ever reached.
FROM orders reads the order records. Customers with no orders never enter this row source.
ORDER BY customer_id, status, ordered_at sorts the orders. The leading ORDER BY columns must be the DISTINCT ON expressions (PostgreSQL rejects the query otherwise), because DISTINCT ON keeps the first row of each (customer_id, status) group in that sort order. The third sort key, ordered_at ascending, orders the rows inside each group, so the row kept is the earliest order.
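One edge case is worth checking: if two orders in the same group share an ordered_at, the three-key sort cannot tell them apart, and PostgreSQL does not promise which one DISTINCT ON keeps. The Python sketch below (hypothetical data and helper names, not the real table) feeds the same tied rows in two different input orders. Appending the order id as a fourth key, which in SQL would be ORDER BY customer_id, status, ordered_at, id, makes the pick deterministic.

```python
# Hypothetical rows: (order_id, customer_id, status, ordered_at, total_amount).
# Two delivered orders for customer 7 share the same ordered_at.
TIED = [
    (201, 7, "delivered", "2024-02-01T09:00:00", 10.00),
    (202, 7, "delivered", "2024-02-01T09:00:00", 99.00),
]


def by_time(order):
    """Sort key matching ORDER BY customer_id, status, ordered_at."""
    return (order[1], order[2], order[3])


def by_time_then_id(order):
    """Sort key matching ORDER BY customer_id, status, ordered_at, id."""
    return (order[1], order[2], order[3], order[0])


def first_per_group(orders, sort_key):
    """Return {(customer_id, status): order_id} for the first row of each group."""
    picked = {}
    for order in sorted(orders, key=sort_key):
        # setdefault stores only the first order_id seen for each pair
        picked.setdefault((order[1], order[2]), order[0])
    return picked


# Feed the same rows in two physical orders, as two different scans might.
# Python's sort is stable, so with tied keys the input order decides the winner.
for rows in (TIED, list(reversed(TIED))):
    print(first_per_group(rows, by_time), first_per_group(rows, by_time_then_id))
```

With the three-key sort the kept order flips from 201 to 202 when the input order flips; with the id appended, order 201 wins both times.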

Why this and not `ROW_NUMBER`

The multi-key version is just PARTITION BY with two columns:

```sql
SELECT customer_id, status, order_id, ordered_at, total_amount
FROM (
  -- Number the orders in each (customer_id, status) partition, oldest first
  SELECT customer_id, status, id AS order_id, ordered_at, total_amount,
    ROW_NUMBER() OVER (PARTITION BY customer_id, status ORDER BY ordered_at) AS rn
  FROM orders
) ranked
WHERE rn = 1  -- keep only the earliest order per partition
ORDER BY customer_id, status
```

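Both forms return one row per customer-status pair. The DISTINCT ON version is shorter: there is no subquery and no rn column to compute, filter on, and leave out of the final select list. Adding another key is just as direct: list it in DISTINCT ON, list it in the leading positions of ORDER BY, done. The trade-off is portability: DISTINCT ON is a PostgreSQL extension, while ROW_NUMBER is standard SQL and works in most other databases.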
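The ROW_NUMBER form can be checked without a PostgreSQL server. A minimal sketch, assuming Python's bundled sqlite3 module links SQLite 3.25 or later (the first version with window functions; SQLite has no DISTINCT ON): it loads a few hypothetical rows into an in-memory orders table, runs the query unchanged, and cross-checks the result against a plain "earliest per pair" loop.

```python
import sqlite3

# Hypothetical rows: (id, customer_id, status, ordered_at, total_amount).
ROWS = [
    (1, 10, "delivered", "2024-03-01", 50.0),
    (2, 10, "delivered", "2024-01-15", 20.0),
    (3, 10, "cancelled", "2024-02-10", 35.0),
    (4, 11, "delivered", "2024-04-05", 80.0),
    (5, 10, "cancelled", "2024-01-30", 12.5),
]

# The ROW_NUMBER query, unchanged.
QUERY = """
SELECT customer_id, status, order_id, ordered_at, total_amount
FROM (
  SELECT customer_id, status, id AS order_id, ordered_at, total_amount,
    ROW_NUMBER() OVER (PARTITION BY customer_id, status ORDER BY ordered_at) AS rn
  FROM orders
) ranked
WHERE rn = 1
ORDER BY customer_id, status
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "status TEXT, ordered_at TEXT, total_amount REAL)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?)", ROWS)
result = conn.execute(QUERY).fetchall()
conn.close()

# Cross-check: compute the earliest order per (customer_id, status) directly.
expected = {}
for oid, cust, status, ts, amount in ROWS:
    key = (cust, status)
    if key not in expected or ts < expected[key][3]:
        expected[key] = (cust, status, oid, ts, amount)
assert result == [expected[k] for k in sorted(expected)]

for row in result:
    print(row)
```

Customer 10 gets two rows (earliest cancelled order 5, earliest delivered order 2) and customer 11 gets one.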
