N066-H2 Tier 5 · Expert · hard hr · Helix Systems

Return each department's `id`, name, `employee_count`, and `avg_salary` — the average current salary across that department's employees — for every department

Part of Analyst Debugging Patterns in SQL

The problem

Scenario: Helix Systems' HR pipeline computing per-department average salary is producing a missing-value average for several departments. To determine whether the cause is departments with no employees or employees with no current salary records, the diagnostic surfaces both metrics alongside each department row.

Task: Write a query to return each department's id, name, employee_count, and avg_salary — the average current salary across that department's employees — for every department.

Assumptions:

  • An active salary record has end_date recorded as a missing value.
  • A department's employee_count is the count of employees assigned to it, including departments with no employees (which appear with employee_count of 0).
  • A department's avg_salary is the average of every active salary across its employees, reported as a missing value when no qualifying salary record is on file.
  • The result covers every department.

Output:

  • One row per department.
  • Columns in this order: department_id, department_name, employee_count, avg_salary.
  • Sorted by department_name ascending.
Schema · hr 4 tables
departments
id integer
name text
location text
budget numeric
salaries
id integer
employee_id integer
amount numeric
effective_date date
end_date? date
employees
id integer
name text
email text
department_id integer
manager_id? integer
hire_date date
title text
is_active boolean
job_history
id integer
employee_id integer
title text
department_id integer
start_date date
end_date? date

Run previews · Check grades

Write a query, then run it to see results here.

Worked solution Try it yourself first
Solution query
WITH
  dept_headcount AS (
    SELECT
      d.id AS department_id,
      d.name AS department_name,
      COUNT(e.id) AS employee_count
    FROM
      departments d
      LEFT JOIN employees e ON e.department_id = d.id
    GROUP BY
      d.id,
      d.name
  ),
  dept_salary AS (
    SELECT
      e.department_id,
      AVG(s.amount) AS avg_salary
    FROM
      employees e
      LEFT JOIN salaries s ON s.employee_id = e.id
      AND s.end_date IS NULL
    GROUP BY
      e.department_id
  )
SELECT
  dh.department_id,
  dh.department_name,
  dh.employee_count,
  ds.avg_salary
FROM
  dept_headcount dh
  LEFT JOIN dept_salary ds ON ds.department_id = dh.department_id
ORDER BY
  dh.department_name

The shape

Two independent diagnostic measurements in separate CTEs, each preserving the parent layer on its left side, joined together on department_id so the row carries both readings. dept_headcount reports "does this department have employees on staff?" and dept_salary reports "do those employees have active salary records?" When avg_salary comes back missing, the headcount column on the same row identifies which side of the chain produced the NULL.

Clause by clause

  • WITH dept_headcount AS (SELECT d.id AS department_id, d.name AS department_name, COUNT(e.id) AS employee_count FROM departments d LEFT JOIN employees e ON e.department_id = d.id GROUP BY d.id, d.name) counts employees per department, keeping every department through the left join. COUNT(e.id) returns 0 (not NULL) for departments with no employees because the unmatched right side carries a NULL e.id and COUNT(col) skips NULLs while still producing a row for the group.
  • dept_salary AS (SELECT e.department_id, AVG(s.amount) AS avg_salary FROM employees e LEFT JOIN salaries s ON s.employee_id = e.id AND s.end_date IS NULL GROUP BY e.department_id) averages active salaries per department. Pinning s.end_date IS NULL to the join condition rather than a WHERE is load-bearing — it keeps employees with no active salary record in the join (with NULL on the salary side) instead of filtering them out. AVG skips NULLs, so a department where every employee lacks an active record produces NULL.
  • SELECT dh.department_id, dh.department_name, dh.employee_count, das.avg_salary puts the two measurements on a single row.
  • FROM dept_headcount dh LEFT JOIN dept_salary das ON das.department_id = dh.department_id preserves every department from the headcount layer. A department absent from dept_salary (no employees at all, so no employees row to feed the salary CTE) shows up here with avg_salary of NULL.
  • ORDER BY dh.department_name sorts alphabetically. The reference has Data through Support all returning real averages and non-zero headcounts on this particular dataset, but the structure remains the correct diagnostic shape for the cases where it doesn't.

The trap

Move the end_date IS NULL predicate from the join condition to a WHERE clause and the query silently changes shape. WHERE runs after the join, so employees whose only salary row has a non-null end_date get filtered out entirely — and the per-department AVG is now computed over the wrong set. The result still looks like a valid per-department average. It is not. The rule is operational: when filtering the right side of a left join, the filter belongs on the ON clause; placing it in WHERE collapses the left join into an inner join in everything but name, and that collapse is invisible until someone reconciles the count.

You practiced separating two diagnostic measurements into independent layers, so a missing-value average tells you exactly which side of the chain is empty — the empty-departments side or the no-current-salary side.

How you actually get good at SQL

Reading explains SQL. Writing it, over and over with instant feedback, is what makes you fluent.

That's the whole SQLMaxx loop: 600+ real problems, instant AI feedback, mastery you can actually see, and spaced review that won't let you forget.

A stack of SQL practice problem cards, the top card showing an employees table.
615 problems · 66 concepts

Real problems. Not toy examples.

615 hand-built problems spanning all 66 concepts, from basic SELECTs to window functions, built on real schemas and real business questions, the kind you'll actually get asked on the job. Enough reps to make SQL automatic.

A retro computer showing a SQL query marked correct with a green checkmark.
Instant AI feedback

Write a query. Know if it's right in one second.

No copying an answer and hoping it clicked. The AI grader checks your real query against real data, catches exactly what's wrong, and explains the fix in plain English, like a senior analyst reading over your shoulder on every problem.

A circular mastery progress dial filling from blue to green, the SQLMaxx diamond at its center.
Mastery tracking

Stop guessing whether you actually know it.

SQLMaxx tracks every concept and shows you what you've mastered and what's still shaky. Your skills fill in one concept at a time, so 'I think I get joins' becomes something you can prove.

A SQL query editor circled by a blue return arrow with a clock, scheduled to come back for review.
Spaced review

Learn it once. Keep it for good.

Most of what you learn this week fades by next week. So when a concept comes due for review, SQLMaxx hands you a fresh problem to solve from a blank editor, not a flashcard to re-read. A research-backed spaced-repetition algorithm (FSRS) times each return for right before you'd forget, so your SQL is still there months later, when the interview or the job actually needs it.

Practice, feedback, mastery, review. That's the loop that turns reading into real skill.

Start free

No account, no credit card. Start solving in under a minute.