Scenario: Helix Systems' data analyst ran EXPLAIN on a per-department employee count and saw the per-department grouping step reporting 3 output groups — well below the number of departments shown on the current organization chart. The analyst suspects table statistics were never refreshed after the most recent restructuring.
Task: Write a query to return each department name and the number of employees assigned to it, so the analyst can compare the real group count against the planner's estimate.
Assumptions:
- One row in the result covers every employee assigned to the same department.
Output:
- One row per department with at least one employee assigned to it.
- Columns in this order:
department_name,employee_count.
Schema · hr 4 tables
Run previews · Check grades
Write a query, then run it to see results here.
Worked solution Try it yourself first
SELECT
d.name AS department_name,
COUNT(e.id) AS employee_count
FROM
employees e
JOIN departments d ON e.department_id = d.id
GROUP BY
d.name The shape
The planner reported 3 department groups at the grouping step; the real count is 8 — the gap is exactly the kind of distinct-value misestimate that happens when ANALYZE hasn't run since new departments were added. The query joins employees to departments, groups by department name, and counts. The shape mirrors what EXPLAIN was estimating, so the actual group count lines up against the plan's number directly.
Clause by clause
SELECT d.name AS department_name, COUNT(e.id) AS employee_countreturns the department name and the count of employees in it.COUNT(e.id)counts each joined employee row, so each department's total comes from its own group.FROM employees ereads the employee records.JOIN departments d ON e.department_id = d.idresolves each employee's department to its name. The inner join drops employees whosedepartment_iddoesn't resolve to a department row — which shouldn't happen in a clean schema, but is the right default if it does.GROUP BY d.namepartitions the joined rows by department name. Each output row is one department's tally.
The trap
The planner's per-group estimate at a hash-aggregate node depends on the distinct-value statistic for the grouped column. Restructuring that adds new departments without a subsequent ANALYZE departments leaves that statistic frozen at the pre-restructure value — the planner keeps reporting 3 groups even though the table now has 8. The fix isn't a query rewrite; it's running ANALYZE on the table whose distinct-value count went stale. Until then, the planner sizes its hash table for 3 buckets and rehashes when the 4th, 5th, and 6th group show up at runtime — visible as a higher actual cost on the plan node even though the row counts are right.
You practiced cross-checking a planner's per-group estimate against reality — stale statistics often hide newly-added partitions at the grouping step.