Brightlane's analyst is building an intermediate dataset of high-activity customers — those with more than 3 orders on record — for a multi-step retention analysis. The dataset is materialized into a temp table once and then read by every downstream stage.
Write a query to return each high-activity customer's ID and order count.
Assumptions:
- The
orderstable has one row per order with acustomer_id. - A customer's order count is the number of orders linked to that
customer_id. - Only customers whose order count is greater than
3should appear.
Output:
- One row per qualifying customer, with columns
customer_idandorder_count.
Schema · ecommerce 5 tables
Run previews · Check grades
Write a query, then run it to see results here.
Worked solution Try it yourself first
SELECT
customer_id,
order_count
FROM
(
SELECT
customer_id,
COUNT(*) AS order_count
FROM
orders
GROUP BY
customer_id
) customer_stats
WHERE
order_count > 3 The shape
Compute the per-customer order count in an inner step, then filter that result to keep only customers whose count exceeds 3. The inner aggregation has to finish before the threshold can be applied, because the threshold is a condition on order_count, a column that does not exist until the aggregation has run. A derived table is what gives the outer query something with that column to filter against.
Clause by clause
(SELECT customer_id, COUNT(*) AS order_count FROM orders GROUP BY customer_id) customer_statsis the inner derived table. It groupsordersbycustomer_id, computes the per-customer order count, and exposes a two-column result that the outer query can treat exactly like a real table. The aliascustomer_statsnames that result.SELECT customer_id, order_count FROM customer_statsreads the derived table and returns both columns to the final result.WHERE order_count > 3filters the derived table to high-activity customers only. The filter runs after the aggregation, soorder_countis a real value by the time the comparison runs. Customers whose count is 3 or below are dropped before the result is materialized.
Why this and not a WHERE on the raw orders table
WHERE runs before GROUP BY, on individual rows, before any aggregate has been computed. A condition like WHERE order_count > 3 written directly on orders would refer to a column that does not exist on the raw row. The aggregation has to happen first, the derived table exposes its result as a row source, and only then can WHERE apply a condition that compares against the aggregate.
You practiced computing a per-customer count and applying a threshold against it — the kind of restricted aggregate worth caching in a temp table when downstream reports reference it repeatedly.