Brightlane's data quality team is auditing the customer registry to understand how many records are missing a city value, broken down by country.
Write a query to return every country, the total number of customers in that country, the count of customers with a missing city, and the count of customers with a recorded city.
Assumptions:
- Each
countryvalue with at least one customer should appear once. - For each country, the total count covers every customer in that country. The missing-
citycount covers only customers whosecityis missing. The with-citycount covers only customers whosecityis recorded. - The missing-
citycount plus the with-citycount equals the total count for each country (the two are mutually exclusive partitions of the total).
Output:
- One row per country, with columns
country,total_customers,missing_city_count, andwith_city_count.
Schema · ecommerce 5 tables
Run previews · Check grades
Write a query, then run it to see results here.
Worked solution Try it yourself first
SELECT
country,
COUNT(*) AS total_customers,
COUNT(*) FILTER (
WHERE
city IS NULL
) AS missing_city_count,
COUNT(*) FILTER (
WHERE
city IS NOT NULL
) AS with_city_count
FROM
customers
GROUP BY
country The shape
Two complementary FILTER conditions partition each country's customer set into the missing-city subset and the recorded-city subset. COUNT(*) FILTER (WHERE city IS NULL) and COUNT(*) FILTER (WHERE city IS NOT NULL) are mutually exclusive and together cover every row, so their sum equals the unfiltered COUNT(*) on every country row — which is exactly the audit invariant the data quality team needs to confirm.
Clause by clause
SELECT country, COUNT(*) AS total_customers, COUNT(*) FILTER (WHERE city IS NULL) AS missing_city_count, COUNT(*) FILTER (WHERE city IS NOT NULL) AS with_city_countreturns the country, the full customer count, and the two NULL-partitioning counts. The firstFILTERkeeps only the rows whosecityis missing; the second keeps only the rows whosecityis recorded. Every customer in the partition contributes to exactly one of the two filtered counts.FROM customersreads the customer records.GROUP BY countrypartitions the rows per country. The three counts evaluate inside each country's partition independently.
Why IS NULL / IS NOT NULL and not city = NULL / city != NULL
city = NULL does not return true when city is NULL. It returns NULL, which the FILTER treats as not-true, exactly the same way WHERE treats a NULL predicate. The filter would drop every row, the missing-city count would always be zero, and the audit invariant would still appear to hold because both counts would be wrong in matched ways. IS NULL and IS NOT NULL are the only operators that actually test for the presence or absence of a NULL value. They return real booleans, never NULL, so the FILTER knows which rows to keep.
The trap
The two FILTER counts have to sum to the unfiltered COUNT(*) on every row, and they do here because IS NULL and IS NOT NULL are exhaustive — every value is exactly one of the two. Writing the second filter as WHERE city = '' instead of WHERE city IS NOT NULL would silently miss any customer whose city is recorded as a non-empty string, and the audit invariant would break without raising an error. The fix is to use the NULL-aware operators directly, never an equality check, whenever the question is about presence rather than value.
You practiced FILTER (WHERE col IS NULL) and FILTER (WHERE col IS NOT NULL) as complementary counts — two FILTER clauses partitioning a record set into the missing-value and recorded-value subsets, alongside the total.