Scenario: Brightlane's data analyst ran EXPLAIN on a per-country orders rollup and saw the per-country grouping step estimating only 3 country groups — a count the analyst doubts, given recent customer acquisitions outside the planner's stale statistics window.
Task: Write a query to return each customer country and the number of orders placed by customers from that country, so the analyst can see the actual group count.
Assumptions:
- One row in the result covers every order placed by
customerssharing the same country.
Output:
- One row per customer country present in the data.
- Columns in this order:
country,order_count.
Schema · ecommerce 5 tables
Run previews · Check grades
Write a query, then run it to see results here.
Worked solution Try it yourself first
SELECT
c.country,
COUNT(o.id) AS order_count
FROM
orders o
JOIN customers c ON o.customer_id = c.id
GROUP BY
c.country The shape
The planner estimated 3 country groups, but the real grouping step produces 18 — one for every country that has at least one order. Joining orders to customers brings the country column into reach, and GROUP BY c.country is what turns the joined row stream into one row per country with its order count.
Clause by clause
SELECT c.country, COUNT(o.id) AS order_countreturns the country label and the count of orders attributed to it.COUNT(o.id)counts each joined order row once, so multiple orders from the same country accumulate within the same group.FROM orders oreads the order records, aliased so each reference stays short.JOIN customers c ON o.customer_id = c.idmatches each order to the customer who placed it. The inner join discards any order without a matching customer record, which is the right filter here — an orderless customer or a customer-less order shouldn't appear in a per-country order tally.GROUP BY c.countrypartitions the joined rows by country, so the count runs once per country.
The trap
The planner's group-count estimate (3) is far below the actual (18). On the post-join grouping step, the planner depends on statistics for the column being grouped on — customers.country here — to estimate the number of distinct values. Stale statistics that predate recent customer acquisitions report old distinct-value counts, the planner picks a hash-aggregate memory allocation sized for 3 groups, and the actual 18 spill out into a much larger structure than expected. Running ANALYZE customers refreshes the distinct-value statistic and the next plan estimate lines up with reality.
You practiced computing the real per-group count to compare against EXPLAIN's grouping-step estimate — large gaps at that step are a primary failure mode for stale statistics.