Headcount and Average Salary by Department

The problem

Scenario: Helix Systems' HR salary report is showing a missing-value average salary for two departments. To determine whether the cause is departments with no employees or employees with no current salary records, the diagnostic places both metrics side by side per department.

Task: Write a query to return each department's id, name, total employee_count, and avg_salary — the average current salary across employees in that department.

Assumptions:

An active salary record has end_date recorded as a missing value.
A department's employee_count is the count of employees assigned to it; departments with no employees appear with employee_count of 0.
A department's avg_salary is the average of every active salary across its employees, reported as a missing value when no qualifying salary record is on file.
The result covers every department.

Output:

One row per department.
Columns in this order: department_id, department_name, employee_count, avg_salary.
Sorted by department_name ascending.

Schema · hr 4 tables

departments

id integer

name text

location text

budget numeric

salaries

id integer

employee_id integer

amount numeric

effective_date date

end_date? date

employees

id integer

name text

email text

department_id integer

manager_id? integer

hire_date date

title text

is_active boolean

job_history

id integer

employee_id integer

title text

department_id integer

start_date date

end_date? date

Check answerShift Ctrl ↵

Run previews · Check grades

Write a query, then run it to see results here.

Worked solution Try it yourself first

Solution query

WITH
  dept_headcount AS (
    SELECT
      d.id AS department_id,
      d.name AS department_name,
      COUNT(e.id) AS employee_count
    FROM
      departments d
      LEFT JOIN employees e ON e.department_id = d.id
    GROUP BY
      d.id,
      d.name
  ),
  dept_avg_salary AS (
    SELECT
      e.department_id,
      AVG(s.amount) AS avg_salary
    FROM
      employees e
      LEFT JOIN salaries s ON s.employee_id = e.id
      AND s.end_date IS NULL
    GROUP BY
      e.department_id
  )
SELECT
  dh.department_id,
  dh.department_name,
  dh.employee_count,
  das.avg_salary
FROM
  dept_headcount dh
  LEFT JOIN dept_avg_salary das ON das.department_id = dh.department_id
ORDER BY
  dh.department_name

The shape

Two CTEs, each measuring one thing about a department, joined back to each other on department_id. dept_headcount answers "does this department have employees?"; dept_avg_salary answers "do those employees have active salary records?" Separating them is what makes a missing-value avg_salary legible — the row's employee_count tells you which side of the chain is empty.

Clause by clause

WITH dept_headcount AS (SELECT d.id AS department_id, d.name AS department_name, COUNT(e.id) AS employee_count FROM departments d LEFT JOIN employees e ON e.department_id = d.id GROUP BY d.id, d.name) counts employees per department. The left join keeps every department, and COUNT(e.id) returns 0 for departments with no employees because the unmatched e.id is NULL and COUNT(col) ignores NULLs.
dept_avg_salary AS (SELECT e.department_id, AVG(s.amount) AS avg_salary FROM employees e LEFT JOIN salaries s ON s.employee_id = e.id AND s.end_date IS NULL GROUP BY e.department_id) averages active salaries per department. The end_date IS NULL predicate is on the join condition so unmatched employees still contribute a row to the per-department group, and AVG returns NULL when every salary in the group is NULL.
SELECT dh.department_id, dh.department_name, dh.employee_count, das.avg_salary pulls the headcount column from one CTE and the salary column from the other onto a single row per department.
FROM dept_headcount dh LEFT JOIN dept_avg_salary das ON das.department_id = dh.department_id keeps every department from the headcount side. A department with no employees has no row in dept_avg_salary at all, so the left join produces NULL for avg_salary — the visible signal of an empty-headcount cause.
ORDER BY dh.department_name sorts alphabetically. The reference shows Data through Support, each with a non-zero headcount and a real average — meaning on this particular dataset both sides are populated everywhere, but the structure is ready to surface either failure mode the moment one appears.

Why this and not one combined aggregate

A single query joining departments to employees to salaries and aggregating both COUNT and AVG in one SELECT collapses the two measurements into one layer. When the average comes back missing, the analyst has no way to tell whether the chain broke at departments → employees or at employees → salaries. Splitting them into independent CTEs gives each measurement its own provenance, which is the entire point of the diagnostic.

You practiced separating two diagnostic measurements into independent layers — headcount on one side, salary average on the other — so each can be inspected on its own and the source of a missing-value average is obvious from the row.

Return each department's `id`, name, total `employee_count`, and `avg_salary` — the average current salary across employees in that department

The shape

Clause by clause

Why this and not one combined aggregate

Reading explains SQL. Writing it, over and over with instant feedback, is what makes you fluent.