Scenario: Helix Systems' HR salary report is showing a missing-value average salary for two departments. To determine whether the cause is departments with no employees or employees with no current salary records, the diagnostic places both metrics side by side per department.
Task: Write a query to return each department's id, name, total employee_count, and avg_salary — the average current salary across employees in that department.
Assumptions:
- An active salary record has
end_daterecorded as a missing value. - A department's
employee_countis the count of employees assigned to it; departments with no employees appear withemployee_countof0. - A department's
avg_salaryis the average of every active salary across its employees, reported as a missing value when no qualifying salary record is on file. - The result covers every department.
Output:
- One row per department.
- Columns in this order:
department_id,department_name,employee_count,avg_salary. - Sorted by
department_nameascending.
Schema · hr 4 tables
Run previews · Check grades
Write a query, then run it to see results here.
Worked solution Try it yourself first
WITH
dept_headcount AS (
SELECT
d.id AS department_id,
d.name AS department_name,
COUNT(e.id) AS employee_count
FROM
departments d
LEFT JOIN employees e ON e.department_id = d.id
GROUP BY
d.id,
d.name
),
dept_avg_salary AS (
SELECT
e.department_id,
AVG(s.amount) AS avg_salary
FROM
employees e
LEFT JOIN salaries s ON s.employee_id = e.id
AND s.end_date IS NULL
GROUP BY
e.department_id
)
SELECT
dh.department_id,
dh.department_name,
dh.employee_count,
das.avg_salary
FROM
dept_headcount dh
LEFT JOIN dept_avg_salary das ON das.department_id = dh.department_id
ORDER BY
dh.department_name The shape
Two CTEs, each measuring one thing about a department, joined back to each other on department_id. dept_headcount answers "does this department have employees?"; dept_avg_salary answers "do those employees have active salary records?" Separating them is what makes a missing-value avg_salary legible — the row's employee_count tells you which side of the chain is empty.
Clause by clause
WITH dept_headcount AS (SELECT d.id AS department_id, d.name AS department_name, COUNT(e.id) AS employee_count FROM departments d LEFT JOIN employees e ON e.department_id = d.id GROUP BY d.id, d.name)counts employees per department. The left join keeps every department, andCOUNT(e.id)returns 0 for departments with no employees because the unmatchede.idis NULL andCOUNT(col)ignores NULLs.dept_avg_salary AS (SELECT e.department_id, AVG(s.amount) AS avg_salary FROM employees e LEFT JOIN salaries s ON s.employee_id = e.id AND s.end_date IS NULL GROUP BY e.department_id)averages active salaries per department. Theend_date IS NULLpredicate is on the join condition so unmatched employees still contribute a row to the per-department group, andAVGreturns NULL when every salary in the group is NULL.SELECT dh.department_id, dh.department_name, dh.employee_count, das.avg_salarypulls the headcount column from one CTE and the salary column from the other onto a single row per department.FROM dept_headcount dh LEFT JOIN dept_avg_salary das ON das.department_id = dh.department_idkeeps every department from the headcount side. A department with no employees has no row indept_avg_salaryat all, so the left join produces NULL foravg_salary— the visible signal of an empty-headcount cause.ORDER BY dh.department_namesorts alphabetically. The reference shows Data through Support, each with a non-zero headcount and a real average — meaning on this particular dataset both sides are populated everywhere, but the structure is ready to surface either failure mode the moment one appears.
Why this and not one combined aggregate
A single query joining departments to employees to salaries and aggregating both COUNT and AVG in one SELECT collapses the two measurements into one layer. When the average comes back missing, the analyst has no way to tell whether the chain broke at departments → employees or at employees → salaries. Splitting them into independent CTEs gives each measurement its own provenance, which is the entire point of the diagnostic.
You practiced separating two diagnostic measurements into independent layers — headcount on one side, salary average on the other — so each can be inspected on its own and the source of a missing-value average is obvious from the row.