Departments with at Least Three — CTEs in SQL

The problem

The HR team at Helix Systems is auditing team size across the organization.

Write a query to return the department ID and employee count for every department that has 3 or more employees on record.

Assumptions:

The employees table has one row per employee with a department_id.
A department's employee count is the number of employee records linked to that department_id.
Only departments with 3 or more employees should appear.

Output:

One row per qualifying department, with columns department_id and headcount.

Schema: hr (4 tables)

departments
    id integer
    name text
    location text
    budget numeric

salaries
    id integer
    employee_id integer
    amount numeric
    effective_date date
    end_date? date

employees
    id integer
    name text
    email text
    department_id integer
    manager_id? integer
    hire_date date
    title text
    is_active boolean

job_history
    id integer
    employee_id integer
    title text
    department_id integer
    start_date date
    end_date? date

Worked solution

Solution query

WITH
  dept_headcount AS (
    SELECT
      department_id,
      COUNT(*) AS headcount
    FROM
      employees
    GROUP BY
      department_id
  )
SELECT
  department_id,
  headcount
FROM
  dept_headcount
WHERE
  headcount >= 3

The shape

The WITH layer computes a headcount per department; the main query then filters that named result with WHERE headcount >= 3, a plain comparison against the already-computed aggregate column. The threshold check happens after the grouping, in a separate named layer.

Clause by clause

The WITH clause defines dept_headcount:

WITH dept_headcount AS (
  SELECT department_id, COUNT(*) AS headcount
  FROM employees
  GROUP BY department_id
)

GROUP BY department_id partitions employees by department; COUNT(*) counts each partition. Every department with at least one employee gets a row in the layer, headcount attached.
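To see what the layer holds before any filter is applied, the same grouping can be run against a small made-up sample. This is only a sketch: the sample_employees name and its five rows are hypothetical and are not the exercise data, and the inline VALUES list assumes PostgreSQL or SQLite syntax.

```sql
-- Hypothetical sample rows standing in for employees (not the exercise data).
WITH sample_employees (id, department_id) AS (
  VALUES (1, 10), (2, 10), (3, 10), (4, 20), (5, 20)
)
SELECT
  department_id,
  COUNT(*) AS headcount  -- one count per department_id group
FROM
  sample_employees
GROUP BY
  department_id
ORDER BY
  department_id;
```

On these five rows the grouping yields one row per department: department 10 with a headcount of 3 and department 20 with a headcount of 2. Only department 10 would pass a later headcount >= 3 filter.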

SELECT department_id, headcount FROM dept_headcount WHERE headcount >= 3 is the main query. It reads the named layer and keeps the rows whose count is 3 or more. Department 1 stays in with 17; department 3 stays in with 11; the other six departments also clear the threshold on this data.

Why this and not a derived table in `FROM`

A derived table would put the per-department aggregation inside the main query's FROM and apply the same threshold in the same WHERE. The result set is identical either way. The WITH version pulls the aggregation out, names it, and lets the main query read as two clear steps: compute the per-department count, then keep the departments at or above the cutoff. A threshold against an aggregate is the recurring shape that named layers make legible.
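For comparison, the derived-table form described above might look like the following sketch. It reuses the dept_headcount name as the subquery alias purely for readability; many engines require some alias on a derived table.

```sql
-- Same logic with the aggregation inlined as a derived table in FROM.
SELECT
  department_id,
  headcount
FROM
  (
    SELECT
      department_id,
      COUNT(*) AS headcount
    FROM
      employees
    GROUP BY
      department_id
  ) AS dept_headcount  -- alias for the derived table
WHERE
  headcount >= 3;
```

The logic is unchanged; the difference is that the reader meets the outer SELECT before the aggregation it depends on, whereas the WITH version presents the steps in the order they run.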

The trap

COUNT(*) only produces headcount after the grouping runs. The named column does not exist on employees itself, only on the layer's output. Writing WHERE COUNT(*) >= 3 directly against employees, without the layer, fails: WHERE is evaluated against individual rows before GROUP BY runs, so the aggregate has not been computed yet. The two-stage shape (group first, filter the named aggregate second) is the structural answer to that ordering problem.
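The ordering problem can be sketched as follows. The commented-out query is the form that engines typically reject. The second query uses HAVING, which the walkthrough above does not cover; it is shown here only as the standard single-query alternative, since HAVING filters groups after aggregation.

```sql
-- Rejected: WHERE runs before GROUP BY, so no aggregate exists to compare.
-- SELECT department_id, COUNT(*) AS headcount
-- FROM employees
-- WHERE COUNT(*) >= 3
-- GROUP BY department_id;

-- Single-query alternative: HAVING filters groups after aggregation.
SELECT
  department_id,
  COUNT(*) AS headcount
FROM
  employees
GROUP BY
  department_id
HAVING
  COUNT(*) >= 3;
```

Both the HAVING form and the WITH form should return the same rows; the WITH form keeps the count as a named column that later steps can reuse.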

You practiced computing a per-category count in a WITH layer and applying a threshold check in the main query.

Reading explains SQL. Writing it, over and over with instant feedback, is what makes you fluent.
