Brightlane's operations team wants to know when each customer first reached each order status: the customer's first delivered order, their first cancelled order, and so on.
Write a query to return one row per customer-status pair on record, showing the customer ID, status, ID of the earliest order in that pair, when it was placed, and the order amount. Sort the final result by customer_id ascending, then status ascending.
Assumptions:
- For each customer-status pair, the earliest order is the order with the smallest ordered_at among that customer's orders carrying that status.
- Each customer-status pair on record should appear once. Pairs with no orders on record do not appear.
- The final result is sorted by customer_id ascending, then status ascending.
Output:
- One row per customer-status pair on record, with columns customer_id, status, order_id, ordered_at, and total_amount. Sorted by customer_id, then status.
Worked solution
SELECT DISTINCT ON (customer_id, status)
  customer_id,
  status,
  id AS order_id,
  ordered_at,
  total_amount
FROM
  orders
ORDER BY
  customer_id,
  status,
  ordered_at
The shape
The deduplication key can be a tuple. DISTINCT ON (customer_id, status) keeps one row per unique combination of those two values, and ORDER BY customer_id, status, ordered_at (all ascending) picks the earliest order in each combination. A customer with five delivered and three cancelled orders contributes two rows: their earliest delivered order and their earliest cancelled order.
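To make the "two rows for one customer" behavior concrete, here is a minimal sketch on made-up data. The orders_demo rows are assumptions for illustration only and are not Brightlane's actual orders table; only the column names are taken from the solution.

```sql
-- Hypothetical sample rows; only the column names come from the solution.
WITH orders_demo (id, customer_id, status, ordered_at, total_amount) AS (
  VALUES
    (1, 7, 'delivered', TIMESTAMP '2024-01-05 10:00', 40.00),
    (2, 7, 'cancelled', TIMESTAMP '2024-01-09 12:30', 15.50),
    (3, 7, 'delivered', TIMESTAMP '2024-02-01 09:15', 22.00),
    (4, 7, 'cancelled', TIMESTAMP '2024-01-02 08:00', 60.00),
    (5, 9, 'delivered', TIMESTAMP '2024-03-11 17:45', 18.25)
)
-- Keep the earliest order for each (customer_id, status) pair.
SELECT DISTINCT ON (customer_id, status)
  customer_id,
  status,
  id AS order_id,
  ordered_at,
  total_amount
FROM orders_demo
ORDER BY customer_id, status, ordered_at;
```

With these rows, customer 7 contributes two rows (order 4 for cancelled, order 1 for delivered) and customer 9 contributes one (order 5 for delivered).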
Clause by clause
- SELECT DISTINCT ON (customer_id, status) customer_id, status, id AS order_id, ordered_at, total_amount returns the five columns the operations review needs.
- DISTINCT ON (customer_id, status) declares one row per distinct (customer_id, status) pair. A customer can appear multiple times in the result, once for each status they ever reached.
- FROM orders reads the order records. Customers with no orders never enter this row source.
- ORDER BY customer_id, status, ordered_at sorts the orders so that within each (customer_id, status) group, the oldest order sits first. The first two sort keys must be the DISTINCT ON expressions (PostgreSQL rejects the query otherwise); listing them in the same order as in DISTINCT ON also produces the required customer_id, status output order. The third sort key, ordered_at ascending, decides which order comes first inside each group, so the earliest one is the row that is kept.
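The exercise does not say whether two orders in the same pair can share an ordered_at value. If they can, the query leaves the choice between them unspecified. One possible refinement, sketched here on made-up rows and assuming id is unique per order (an assumption, not something the exercise states), is to append id as a final sort key:

```sql
-- Hypothetical rows: two delivered orders for customer 3 with identical ordered_at.
WITH orders_demo (id, customer_id, status, ordered_at, total_amount) AS (
  VALUES
    (11, 3, 'delivered', TIMESTAMP '2024-04-01 09:00', 30.00),
    (10, 3, 'delivered', TIMESTAMP '2024-04-01 09:00', 45.00)
)
SELECT DISTINCT ON (customer_id, status)
  customer_id,
  status,
  id AS order_id,
  ordered_at,
  total_amount
FROM orders_demo
-- Assumption: id is unique, so ties on ordered_at resolve to the lowest id.
ORDER BY customer_id, status, ordered_at, id;
```

On these rows the extra key makes the pick repeatable: order 10 is kept rather than whichever tied row the sort happens to place first.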
Why this and not ROW_NUMBER
The multi-key version is just PARTITION BY with two columns:
SELECT customer_id, status, order_id, ordered_at, total_amount
FROM (
  SELECT customer_id, status, id AS order_id, ordered_at, total_amount,
    ROW_NUMBER() OVER (PARTITION BY customer_id, status ORDER BY ordered_at) AS rn
  FROM orders
) ranked
WHERE rn = 1
ORDER BY customer_id, status
Both forms return one row per customer-status pair. The DISTINCT ON version scales naturally as more keys are added: list them in DISTINCT ON, list them in the leading positions of ORDER BY, done.
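As a sketch of that scaling, suppose orders also carried a channel column. That column, and the rows below, are hypothetical and used only for illustration. Adding a third key then means one more entry in DISTINCT ON and one more leading entry in ORDER BY:

```sql
-- Hypothetical data: "channel" is an illustrative column, not part of the exercise schema.
WITH orders_demo (id, customer_id, status, channel, ordered_at) AS (
  VALUES
    (1, 7, 'delivered', 'web', TIMESTAMP '2024-01-05 10:00'),
    (2, 7, 'delivered', 'app', TIMESTAMP '2024-01-07 11:00'),
    (3, 7, 'delivered', 'web', TIMESTAMP '2024-01-03 16:20'),
    (4, 7, 'cancelled', 'web', TIMESTAMP '2024-01-09 12:30')
)
-- One row per (customer_id, status, channel): the earliest order in each triple.
SELECT DISTINCT ON (customer_id, status, channel)
  customer_id,
  status,
  channel,
  id AS order_id,
  ordered_at
FROM orders_demo
ORDER BY customer_id, status, channel, ordered_at;
```

On these rows the query keeps order 4 (cancelled, web), order 2 (delivered, app), and order 3 (delivered, web).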
You practiced multi-key DISTINCT ON: list two keys, and the per-group pick happens once for each unique combination of those two values.