Employee Count per Department — Reading EXPLAIN Output in SQL

The problem

Scenario: Helix Systems' data analyst ran EXPLAIN on a per-department employee count and saw the per-department grouping step reporting 3 output groups — well below the number of departments shown on the current organization chart. The analyst suspects table statistics were never refreshed after the most recent restructuring.

Task: Write a query to return each department name and the number of employees assigned to it, so the analyst can compare the real group count against the planner's estimate.

Assumptions:

One row in the result covers every employee assigned to the same department.

Output:

One row per department with at least one employee assigned to it.
Columns in this order: department_name, employee_count.

Schema · hr 4 tables

departments

id integer

name text

location text

budget numeric

salaries

id integer

employee_id integer

amount numeric

effective_date date

end_date? date

employees

id integer

name text

email text

department_id integer

manager_id? integer

hire_date date

title text

is_active boolean

job_history

id integer

employee_id integer

title text

department_id integer

start_date date

end_date? date

Check answerShift Ctrl ↵

Run previews · Check grades

Write a query, then run it to see results here.

Worked solution Try it yourself first

Solution query

SELECT
  d.name AS department_name,
  COUNT(e.id) AS employee_count
FROM
  employees e
  JOIN departments d ON e.department_id = d.id
GROUP BY
  d.name

The shape

The planner reported 3 department groups at the grouping step; the real count is 8 — the gap is exactly the kind of distinct-value misestimate that happens when ANALYZE hasn't run since new departments were added. The query joins employees to departments, groups by department name, and counts. The shape mirrors what EXPLAIN was estimating, so the actual group count lines up against the plan's number directly.

Clause by clause

SELECT d.name AS department_name, COUNT(e.id) AS employee_count returns the department name and the count of employees in it. COUNT(e.id) counts each joined employee row, so each department's total comes from its own group.
FROM employees e reads the employee records.
JOIN departments d ON e.department_id = d.id resolves each employee's department to its name. The inner join drops employees whose department_id doesn't resolve to a department row — which shouldn't happen in a clean schema, but is the right default if it does.
GROUP BY d.name partitions the joined rows by department name. Each output row is one department's tally.

The trap

The planner's per-group estimate at a hash-aggregate node depends on the distinct-value statistic for the grouped column. Restructuring that adds new departments without a subsequent ANALYZE departments leaves that statistic frozen at the pre-restructure value — the planner keeps reporting 3 groups even though the table now has 8. The fix isn't a query rewrite; it's running ANALYZE on the table whose distinct-value count went stale. Until then, the planner sizes its hash table for 3 buckets and rehashes when the 4th, 5th, and 6th group show up at runtime — visible as a higher actual cost on the plan node even though the row counts are right.

You practiced cross-checking a planner's per-group estimate against reality — stale statistics often hide newly-added partitions at the grouping step.

Return each department name and the number of `employees` assigned to it, so the analyst can compare the real group count against the planner's estimate

The shape

Clause by clause

The trap

Reading explains SQL. Writing it, over and over with instant feedback, is what makes you fluent.