Scenario: Helix Systems' HR pipeline computing per-department average salary is producing a missing-value average for several departments. To determine whether the cause is departments with no employees or employees with no current salary records, the diagnostic surfaces both metrics alongside each department row.
Task: Write a query to return each department's id, name, employee_count, and avg_salary — the average current salary across that department's employees — for every department.
Assumptions:
- An active salary record has
end_daterecorded as a missing value. - A department's
employee_countis the count of employees assigned to it, including departments with no employees (which appear withemployee_countof0). - A department's
avg_salaryis the average of every active salary across its employees, reported as a missing value when no qualifying salary record is on file. - The result covers every department.
Output:
- One row per department.
- Columns in this order:
department_id,department_name,employee_count,avg_salary. - Sorted by
department_nameascending.
Schema · hr 4 tables
Run previews · Check grades
Write a query, then run it to see results here.
Worked solution Try it yourself first
WITH
dept_headcount AS (
SELECT
d.id AS department_id,
d.name AS department_name,
COUNT(e.id) AS employee_count
FROM
departments d
LEFT JOIN employees e ON e.department_id = d.id
GROUP BY
d.id,
d.name
),
dept_salary AS (
SELECT
e.department_id,
AVG(s.amount) AS avg_salary
FROM
employees e
LEFT JOIN salaries s ON s.employee_id = e.id
AND s.end_date IS NULL
GROUP BY
e.department_id
)
SELECT
dh.department_id,
dh.department_name,
dh.employee_count,
ds.avg_salary
FROM
dept_headcount dh
LEFT JOIN dept_salary ds ON ds.department_id = dh.department_id
ORDER BY
dh.department_name The shape
Two independent diagnostic measurements in separate CTEs, each preserving the parent layer on its left side, joined together on department_id so the row carries both readings. dept_headcount reports "does this department have employees on staff?" and dept_salary reports "do those employees have active salary records?" When avg_salary comes back missing, the headcount column on the same row identifies which side of the chain produced the NULL.
Clause by clause
WITH dept_headcount AS (SELECT d.id AS department_id, d.name AS department_name, COUNT(e.id) AS employee_count FROM departments d LEFT JOIN employees e ON e.department_id = d.id GROUP BY d.id, d.name)counts employees per department, keeping every department through the left join.COUNT(e.id)returns 0 (not NULL) for departments with no employees because the unmatched right side carries a NULLe.idandCOUNT(col)skips NULLs while still producing a row for the group.dept_salary AS (SELECT e.department_id, AVG(s.amount) AS avg_salary FROM employees e LEFT JOIN salaries s ON s.employee_id = e.id AND s.end_date IS NULL GROUP BY e.department_id)averages active salaries per department. Pinnings.end_date IS NULLto the join condition rather than aWHEREis load-bearing — it keeps employees with no active salary record in the join (with NULL on the salary side) instead of filtering them out.AVGskips NULLs, so a department where every employee lacks an active record produces NULL.SELECT dh.department_id, dh.department_name, dh.employee_count, das.avg_salaryputs the two measurements on a single row.FROM dept_headcount dh LEFT JOIN dept_salary das ON das.department_id = dh.department_idpreserves every department from the headcount layer. A department absent fromdept_salary(no employees at all, so noemployeesrow to feed the salary CTE) shows up here withavg_salaryof NULL.ORDER BY dh.department_namesorts alphabetically. The reference has Data through Support all returning real averages and non-zero headcounts on this particular dataset, but the structure remains the correct diagnostic shape for the cases where it doesn't.
The trap
Move the end_date IS NULL predicate from the join condition to a WHERE clause and the query silently changes shape. WHERE runs after the join, so employees whose only salary row has a non-null end_date get filtered out entirely — and the per-department AVG is now computed over the wrong set. The result still looks like a valid per-department average. It is not. The rule is operational: when filtering the right side of a left join, the filter belongs on the ON clause; placing it in WHERE collapses the left join into an inner join in everything but name, and that collapse is invisible until someone reconciles the count.
You practiced separating two diagnostic measurements into independent layers, so a missing-value average tells you exactly which side of the chain is empty — the empty-departments side or the no-current-salary side.