Scenario: Helix Systems' data analyst ran EXPLAIN on a salary report that needed only current salary records and saw the planner estimating 8 rows for the restriction on end_date. The analyst suspects the estimate is significantly off because the planner often underestimates how many rows have a missing value in a given column.
Task: Write a query to return the actual count of current salary records on file.
Assumptions:
- The
salariestable holds one row per salary record, withend_datestoring when the record was superseded. - A current salary record has
end_daterecorded as a missing value.
Output:
- One row, holding the count of current salary records.
- Columns in this order:
current_salary_count.
Schema · hr 4 tables
Run previews · Check grades
Write a query, then run it to see results here.
Worked solution Try it yourself first
SELECT
COUNT(*) AS current_salary_count
FROM
salaries
WHERE
end_date IS NULL The shape
The planner estimated 8 rows for the end_date restriction; the actual count is 60. IS NULL selectivity is one of the categories the planner most often misjudges, because PostgreSQL's statistics track NULL frequency through a single null_frac value rather than a histogram — and when that fraction goes stale, the row-estimate gap can be an order of magnitude.
Clause by clause
SELECT COUNT(*) AS current_salary_countreturns the total count of matching rows.COUNT(*)counts every row that survives the WHERE filter, regardless of which columns on that row are NULL.FROM salariesreads the salary records — one row per salary period for each employee.WHERE end_date IS NULLkeeps only the rows whoseend_dateis missing.IS NULLis the correct test;end_date = NULLevaluates to unknown for every row and would return zero matches even on a table full of current salaries.
Why this and not WHERE end_date = NULL
= doesn't compare against NULL meaningfully. A missing value can't be tested for equality with anything — every column = NULL comparison evaluates to unknown, and WHERE keeps only rows where the condition is true. The result is silently zero, with no error to signal the mistake. The planner's estimate would still come back at 8 because the planner reads null_frac to guess NULL-match selectivity, not the literal predicate text — but the actual count would read 0, and the analyst would conclude the planner was wildly over-estimating instead of under-estimating. IS NULL is the only operator that actually compares against the absence of a value.
The trap
NULL selectivity is the single most common stale-statistics failure mode for HR-style tables where "current" is encoded as missing-end-date. The table has a clear bimodal shape: roughly half the rows (one per employee) have NULL end-dates and the other half have concrete dates — but the planner only knows the null_frac value sampled at the last ANALYZE. If that ran when the company was a third its current size, the absolute count is way off, even though the fraction is the same. The plan node will report rows=8 (or whatever the old sample suggested) and the runtime will process 60 instead. ANALYZE salaries refreshes null_frac against the current table size, and the next plan estimate moves into the right neighborhood. The query itself doesn't need to change; the statistic underneath it does.
You practiced verifying a planner estimate for missing-value matching — IS NULL selectivity is one of the categories the planner most often misjudges.