Count Current Salary Records — Reading EXPLAIN Output in SQL

The problem

Scenario: Helix Systems' data analyst ran EXPLAIN on a salary report that needed only current salary records and saw the planner estimating 8 rows for the restriction on end_date. The analyst suspects the estimate is significantly off because the planner often underestimates how many rows have a missing value in a given column.

Task: Write a query to return the actual count of current salary records on file.

Assumptions:

The salaries table holds one row per salary record, with end_date storing when the record was superseded.
A current salary record has end_date recorded as a missing value.

Output:

One row, holding the count of current salary records.
Columns in this order: current_salary_count.

Schema · hr 4 tables

departments

id integer

name text

location text

budget numeric

salaries

id integer

employee_id integer

amount numeric

effective_date date

end_date? date

employees

id integer

name text

email text

department_id integer

manager_id? integer

hire_date date

title text

is_active boolean

job_history

id integer

employee_id integer

title text

department_id integer

start_date date

end_date? date

Check answerShift Ctrl ↵

Run previews · Check grades

Write a query, then run it to see results here.

Worked solution Try it yourself first

Solution query

SELECT
  COUNT(*) AS current_salary_count
FROM
  salaries
WHERE
  end_date IS NULL

The shape

The planner estimated 8 rows for the end_date restriction; the actual count is 60. IS NULL selectivity is one of the categories the planner most often misjudges, because PostgreSQL's statistics track NULL frequency through a single null_frac value rather than a histogram — and when that fraction goes stale, the row-estimate gap can be an order of magnitude.

Clause by clause

SELECT COUNT(*) AS current_salary_count returns the total count of matching rows. COUNT(*) counts every row that survives the WHERE filter, regardless of which columns on that row are NULL.
FROM salaries reads the salary records — one row per salary period for each employee.
WHERE end_date IS NULL keeps only the rows whose end_date is missing. IS NULL is the correct test; end_date = NULL evaluates to unknown for every row and would return zero matches even on a table full of current salaries.

Why this and not `WHERE end_date = NULL`

= doesn't compare against NULL meaningfully. A missing value can't be tested for equality with anything — every column = NULL comparison evaluates to unknown, and WHERE keeps only rows where the condition is true. The result is silently zero, with no error to signal the mistake. The planner's estimate would still come back at 8 because the planner reads null_frac to guess NULL-match selectivity, not the literal predicate text — but the actual count would read 0, and the analyst would conclude the planner was wildly over-estimating instead of under-estimating. IS NULL is the only operator that actually compares against the absence of a value.

The trap

NULL selectivity is the single most common stale-statistics failure mode for HR-style tables where "current" is encoded as missing-end-date. The table has a clear bimodal shape: roughly half the rows (one per employee) have NULL end-dates and the other half have concrete dates — but the planner only knows the null_frac value sampled at the last ANALYZE. If that ran when the company was a third its current size, the absolute count is way off, even though the fraction is the same. The plan node will report rows=8 (or whatever the old sample suggested) and the runtime will process 60 instead. ANALYZE salaries refreshes null_frac against the current table size, and the next plan estimate moves into the right neighborhood. The query itself doesn't need to change; the statistic underneath it does.

You practiced verifying a planner estimate for missing-value matching — IS NULL selectivity is one of the categories the planner most often misjudges.

Return the actual count of current salary records on file

The shape

Clause by clause

Why this and not `WHERE end_date = NULL`

The trap

Reading explains SQL. Writing it, over and over with instant feedback, is what makes you fluent.

The shape

Clause by clause

Why this and not WHERE end_date = NULL

The trap

Reading explains SQL. Writing it, over and over with instant feedback, is what makes you fluent.

Why this and not `WHERE end_date = NULL`