Separate Aggregates Joined by Department — Debugging Queries in SQL

The problem

Scenario: Helix Systems' HR pipeline computing per-department average salary is producing a missing-value average for several departments. To determine whether the cause is departments with no employees or employees with no current salary records, the diagnostic surfaces both metrics alongside each department row.

Task: Write a query to return each department's id, name, employee_count, and avg_salary — the average current salary across that department's employees — for every department.

Assumptions:

An active salary record has end_date recorded as a missing value.
A department's employee_count is the count of employees assigned to it, including departments with no employees (which appear with employee_count of 0).
A department's avg_salary is the average of every active salary across its employees, reported as a missing value when no qualifying salary record is on file.
The result covers every department.

Output:

One row per department.
Columns in this order: department_id, department_name, employee_count, avg_salary.
Sorted by department_name ascending.

Schema · hr 4 tables

departments

id integer

name text

location text

budget numeric

salaries

id integer

employee_id integer

amount numeric

effective_date date

end_date? date

employees

id integer

name text

email text

department_id integer

manager_id? integer

hire_date date

title text

is_active boolean

job_history

id integer

employee_id integer

title text

department_id integer

start_date date

end_date? date

Check answerShift Ctrl ↵

Run previews · Check grades

Write a query, then run it to see results here.

Worked solution Try it yourself first

Solution query

WITH
  dept_headcount AS (
    SELECT
      d.id AS department_id,
      d.name AS department_name,
      COUNT(e.id) AS employee_count
    FROM
      departments d
      LEFT JOIN employees e ON e.department_id = d.id
    GROUP BY
      d.id,
      d.name
  ),
  dept_salary AS (
    SELECT
      e.department_id,
      AVG(s.amount) AS avg_salary
    FROM
      employees e
      LEFT JOIN salaries s ON s.employee_id = e.id
      AND s.end_date IS NULL
    GROUP BY
      e.department_id
  )
SELECT
  dh.department_id,
  dh.department_name,
  dh.employee_count,
  ds.avg_salary
FROM
  dept_headcount dh
  LEFT JOIN dept_salary ds ON ds.department_id = dh.department_id
ORDER BY
  dh.department_name

The shape

Two independent diagnostic measurements in separate CTEs, each preserving the parent layer on its left side, joined together on department_id so the row carries both readings. dept_headcount reports "does this department have employees on staff?" and dept_salary reports "do those employees have active salary records?" When avg_salary comes back missing, the headcount column on the same row identifies which side of the chain produced the NULL.

Clause by clause

WITH dept_headcount AS (SELECT d.id AS department_id, d.name AS department_name, COUNT(e.id) AS employee_count FROM departments d LEFT JOIN employees e ON e.department_id = d.id GROUP BY d.id, d.name) counts employees per department, keeping every department through the left join. COUNT(e.id) returns 0 (not NULL) for departments with no employees because the unmatched right side carries a NULL e.id and COUNT(col) skips NULLs while still producing a row for the group.
dept_salary AS (SELECT e.department_id, AVG(s.amount) AS avg_salary FROM employees e LEFT JOIN salaries s ON s.employee_id = e.id AND s.end_date IS NULL GROUP BY e.department_id) averages active salaries per department. Pinning s.end_date IS NULL to the join condition rather than a WHERE is load-bearing — it keeps employees with no active salary record in the join (with NULL on the salary side) instead of filtering them out. AVG skips NULLs, so a department where every employee lacks an active record produces NULL.
SELECT dh.department_id, dh.department_name, dh.employee_count, das.avg_salary puts the two measurements on a single row.
FROM dept_headcount dh LEFT JOIN dept_salary das ON das.department_id = dh.department_id preserves every department from the headcount layer. A department absent from dept_salary (no employees at all, so no employees row to feed the salary CTE) shows up here with avg_salary of NULL.
ORDER BY dh.department_name sorts alphabetically. The reference has Data through Support all returning real averages and non-zero headcounts on this particular dataset, but the structure remains the correct diagnostic shape for the cases where it doesn't.

The trap

Move the end_date IS NULL predicate from the join condition to a WHERE clause and the query silently changes shape. WHERE runs after the join, so employees whose only salary row has a non-null end_date get filtered out entirely — and the per-department AVG is now computed over the wrong set. The result still looks like a valid per-department average. It is not. The rule is operational: when filtering the right side of a left join, the filter belongs on the ON clause; placing it in WHERE collapses the left join into an inner join in everything but name, and that collapse is invisible until someone reconciles the count.

You practiced separating two diagnostic measurements into independent layers, so a missing-value average tells you exactly which side of the chain is empty — the empty-departments side or the no-current-salary side.

Return each department's `id`, name, `employee_count`, and `avg_salary` — the average current salary across that department's employees — for every department

The shape

Clause by clause

The trap

Reading explains SQL. Writing it, over and over with instant feedback, is what makes you fluent.