Brightlane's CRM team is assessing how broadly each order status touches the customer base.
Write a query to return each status and the number of unique customers who have placed at least one order in that status.
Assumptions:
- The
orderstable contains every order Brightlane has processed. - A customer with multiple orders in the same status should be counted once for that status (not once per order).
Output:
- One row per
statusvalue, with columnsstatusandunique_customers.
Schema · ecommerce 5 tables
Run previews · Check grades
Write a query, then run it to see results here.
Worked solution Try it yourself first
SELECT
status,
COUNT(DISTINCT customer_id) AS unique_customers
FROM
orders
GROUP BY
status The shape
COUNT(DISTINCT customer_id) deduplicates customers inside each status bucket before counting them. A buyer with three delivered orders contributes one to the delivered count, not three. The delivered row shows 59 unique customers across 161 total delivered orders, which is exactly the spread between an order count and a customer count.
Clause by clause
SELECT status, COUNT(DISTINCT customer_id) AS unique_customersreturns the status label and the deduplicated customer count for that status.statusis inGROUP BY; the count is an aggregate.FROM ordersis the input population.GROUP BY statuspartitions the orders by status before the count runs.COUNT(DISTINCT customer_id)then runs inside each partition, andDISTINCTonly deduplicates within the partition it is running in.
Why this and not COUNT(*)
COUNT(*) per status answers "how many orders are in each status," which double-counts a buyer whose three orders all landed in the same status. The CRM team wants reach, not volume. COUNT(DISTINCT customer_id) strips the repeats inside each bucket, so the result is the number of unique buyers touched by each status. The two answers differ by exactly the amount of repeat business inside each pipeline stage.
The trap
The same customer can appear in the count for multiple statuses, and that is correct, not a bug. DISTINCT deduplicates within a group, not across groups. A buyer with one delivered order and one cancelled order adds one to the delivered count and one to the cancelled count. Summing the four unique_customers values across statuses therefore does not give you the total number of distinct customers in the orders table. For that, you would need COUNT(DISTINCT customer_id) over the whole table with no GROUP BY.
You practiced combining COUNT(DISTINCT col) with GROUP BY. The aggregate's per-group behavior compounds: DISTINCT deduplicates within each group independently, so the same customer can appear in the count of two different statuses without contradiction.