Hierarchy Levels Above a Size — Recursive CTEs in SQL

The problem

Helix Systems' workforce analytics team wants to identify which depth levels in the organizational hierarchy carry significant headcount — depth levels with more than 5 employees.

Write a query to return every qualifying depth level and its employee count.

Assumptions:

The CEO has a missing manager_id and sits at depth 1. Employees one step below the CEO sit at depth 2, and so on.
The headcount at a depth level is the number of employees whose computed depth equals that level.
Only depth levels whose headcount is strictly greater than 5 should appear in the result.

Output:

One row per qualifying depth level, with columns depth and employee_count.

Schema · hr 4 tables

departments

id integer

name text

location text

budget numeric

salaries

id integer

employee_id integer

amount numeric

effective_date date

end_date? date

employees

id integer

name text

email text

department_id integer

manager_id? integer

hire_date date

title text

is_active boolean

job_history

id integer

employee_id integer

title text

department_id integer

start_date date

end_date? date

Check answerShift Ctrl ↵

Run previews · Check grades

Write a query, then run it to see results here.

Worked solution Try it yourself first

Solution query

WITH RECURSIVE
  org AS (
    SELECT
      id,
      1 AS depth
    FROM
      employees
    WHERE
      manager_id IS NULL
    UNION ALL
    SELECT
      e.id,
      o.depth + 1
    FROM
      employees e
      JOIN org o ON e.manager_id = o.id
  ),
  level_counts AS (
    SELECT
      depth,
      COUNT(id) AS employee_count
    FROM
      org
    GROUP BY
      depth
  )
SELECT
  depth,
  employee_count
FROM
  level_counts
WHERE
  employee_count > 5

The shape

Three stages, two in a chained WITH clause and one in the main query. The first CTE walks the hierarchy and stamps each employee with their depth. The second CTE aggregates that flat result into one row per depth level. The main query then applies the > 5 threshold against the per-level count.

Clause by clause

The recursive CTE computes per-employee depth:

WITH RECURSIVE org AS (
    SELECT id, 1 AS depth
    FROM employees
    WHERE manager_id IS NULL
    UNION ALL
    SELECT e.id, o.depth + 1
    FROM employees e
    JOIN org o ON e.manager_id = o.id
)

Same top-down walk as the unbounded hierarchy traversal: anchor on the CEO at depth 1, recurse downward by matching each employee's manager_id to a row already in org, stamp each pass with one more than the source depth.

The chained CTE aggregates the recursion's output by level:

level_counts AS (
    SELECT depth, COUNT(id) AS employee_count
    FROM org
    GROUP BY depth
)

The second CTE reads org and collapses it into one row per depth value, with COUNT(id) returning the headcount at that level. Depth 1 shows 1, depth 2 shows 9, depth 3 shows 19, depth 4 shows 31.

The main query thresholds the per-level counts:

SELECT depth, employee_count
FROM level_counts
WHERE employee_count > 5

The filter employee_count > 5 runs against the named aggregate column. Depth 1 falls below the threshold and drops out; depths 2, 3, and 4 survive.

Why three stages and not two

The aggregation cannot live inside the recursive member — an aggregate would collapse rows the next pass needs to join against, and the recursion would break. The threshold cannot live in the level_counts CTE either, because that CTE's WHERE runs over the pre-aggregation rows in org, not over the aggregate employee_count. Each layer has one job. The recursion produces per-employee depth, the chained CTE produces per-level counts, and the main query applies the cutoff against the aggregate column. The three-stage shape is the standard way to layer "recurse, then aggregate, then threshold."

The trap

The threshold runs on the aggregate, not on individual employees. WHERE employee_count > 5 in the main query is comparing against the named column produced by COUNT(id) in the chained CTE. The named aggregate only exists once level_counts finishes computing; pushing the same filter into level_counts's WHERE would not work, because that WHERE runs over the pre-aggregation rows in org, where employee_count is not yet a column. The filter has to sit downstream of the aggregation, which is what the three-stage layering makes structurally explicit.

You practiced chaining a WITH RECURSIVE traversal into a follow-up CTE — recurse first to compute per-record depth, aggregate by depth in a second layer, then threshold-check in the main query.

Return every qualifying depth level and its employee count

The shape

Clause by clause

Why three stages and not two

The trap

Reading explains SQL. Writing it, over and over with instant feedback, is what makes you fluent.