Brightlane's CRM team is identifying the order statuses with the widest reach: those touching the broadest customer base.
Write a query to return each status and its unique-customer count for statuses that have been placed by more than ten different customers.
Assumptions:
- The orders table contains every order Brightlane has processed.
- A customer with multiple orders in the same status counts once for that status (not once per order).
- The threshold (> 10) applies to the per-status unique-customer count.
Output:
- One row per qualifying status, with columns status and unique_customers.
Worked solution
SELECT
  status,
  COUNT(DISTINCT customer_id) AS unique_customers
FROM
  orders
GROUP BY
  status
HAVING
  COUNT(DISTINCT customer_id) > 10;
The shape
The statuses are the groups; the per-group metric is the count of distinct customers, not the count of orders. COUNT(DISTINCT customer_id) collapses repeat customers to one per status, and HAVING COUNT(DISTINCT customer_id) > 10 keeps only the statuses that span a broad customer base. delivered reaches 59 unique customers, shipped reaches 17, pending reaches 11. Other statuses fall below the threshold and drop out.
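One way to see what "collapses repeat customers to one per status" means is to do the collapsing by hand. The sketch below is an alternative formulation, not the worked solution: it first reduces orders to distinct (status, customer_id) pairs, then counts the pairs per status. The derived-table alias status_customers is a name chosen for illustration, and the NULL filter is an assumption that mirrors how COUNT(DISTINCT ...) skips NULL values.

```sql
-- Step 1 (inner query): one row per (status, customer) pair,
-- no matter how many orders that customer has in that status.
-- Step 2 (outer query): count those pairs per status.
SELECT
  status,
  COUNT(*) AS unique_customers
FROM (
  SELECT DISTINCT
    status,
    customer_id
  FROM
    orders
  WHERE
    customer_id IS NOT NULL  -- COUNT(DISTINCT ...) ignores NULLs, so drop them here too
) AS status_customers
GROUP BY
  status
HAVING
  COUNT(*) > 10;
```

Under those assumptions this should return the same statuses as the worked solution. COUNT(DISTINCT customer_id) is the more compact way to express the same idea in a single pass.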
Clause by clause
- SELECT status, COUNT(DISTINCT customer_id) AS unique_customers returns each status with its unique-customer count. The DISTINCT inside COUNT is what makes a customer with three delivered orders contribute 1 to delivered's count rather than 3.
- FROM orders is the source set.
- GROUP BY status partitions the orders by their status value. After this clause, each row in the working set represents one status with its underlying order rows aggregated behind it.
- HAVING COUNT(DISTINCT customer_id) > 10 filters those status rows by the unique-customer metric. Statuses placed by ten or fewer distinct customers drop out; eleven or more survive.
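The "three delivered orders contribute 1" behaviour can be checked on a handful of rows. The sketch below uses a hypothetical toy_orders table with made-up data (it is not Brightlane's orders table) and lowers the threshold to > 1 so the filter has something to do on five rows.

```sql
-- Hypothetical toy table and data, for illustration only.
CREATE TABLE toy_orders (
  order_id    INTEGER PRIMARY KEY,
  customer_id INTEGER NOT NULL,
  status      VARCHAR(20) NOT NULL
);

INSERT INTO toy_orders (order_id, customer_id, status) VALUES
  (1, 101, 'delivered'),  -- customer 101 has three delivered orders
  (2, 101, 'delivered'),
  (3, 101, 'delivered'),
  (4, 102, 'delivered'),
  (5, 103, 'shipped');

SELECT
  status,
  COUNT(*) AS order_count,
  COUNT(DISTINCT customer_id) AS unique_customers
FROM
  toy_orders
GROUP BY
  status
HAVING
  COUNT(DISTINCT customer_id) > 1;  -- toy threshold; the exercise uses > 10
-- delivered: 4 orders from 2 distinct customers, so it is kept
-- shipped:   1 order from 1 distinct customer, so it is filtered out
```

Customer 101 accounts for three of the four delivered rows but adds only one to unique_customers.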
Why this and not COUNT(*)
COUNT(*) and COUNT(DISTINCT customer_id) answer different questions on the same data. COUNT(*) would return the number of orders in each status — delivered would land somewhere above 100 because most orders are delivered and many customers ordered multiple times. COUNT(DISTINCT customer_id) returns the number of customers behind those orders, which is what "placed by more than ten different customers" actually asks. A status with a thousand orders from three customers would clear a COUNT(*) > 10 bar but fail the breadth test the CRM team is running.
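To compare the two metrics directly, both aggregates can sit in the same SELECT. This is a small exploratory sketch against the same orders table; the order_count alias and the ORDER BY are additions for readability, not part of the exercise's expected output.

```sql
-- Orders per status next to distinct customers per status.
SELECT
  status,
  COUNT(*) AS order_count,                         -- one per order row
  COUNT(DISTINCT customer_id) AS unique_customers  -- one per customer
FROM
  orders
GROUP BY
  status
ORDER BY
  unique_customers DESC;
```

The gap between the two columns shows how much of a status's volume comes from repeat customers.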
The shape generalises. Once an aggregate is computing a per-group number, HAVING can compare it to anything: a literal threshold, another aggregate, even an arithmetic combination of aggregates. The only constraint, in standard SQL, is that every column the condition references must either sit inside an aggregate or be one of the GROUP BY columns; a raw, non-grouped column reference is not allowed.
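As an illustration of comparing one aggregate to another, the sketch below keeps only statuses where the order count is more than twice the distinct-customer count, that is, where customers average more than two orders each. The factor of 2 is a hypothetical threshold chosen for the example, not something the exercise asks for.

```sql
-- HAVING with an arithmetic combination of two aggregates.
SELECT
  status,
  COUNT(*) AS order_count,
  COUNT(DISTINCT customer_id) AS unique_customers
FROM
  orders
GROUP BY
  status
HAVING
  COUNT(*) > 2 * COUNT(DISTINCT customer_id);  -- hypothetical "repeat-heavy" test
```

Writing the condition as a multiplication rather than a ratio sidesteps integer division and division by zero.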
You practiced filtering on a COUNT(DISTINCT col) aggregate. The recurring shape is the composability of HAVING with any aggregate: whatever per-group number the aggregate produces, HAVING can filter on it.