Brightlane's data team is auditing the customers table. They want the total number of customer records alongside the number of customers who have a city on file — in a single row, so the gap between the two figures is immediately visible.
Write a query to return both counts in one row.
Assumptions:
- The
customerstable contains every customer Brightlane has on file. - Some customers have a recorded
cityvalue; many have a missingcity. - Both figures come from the same set of customers; neither figure excludes the missing-city customers up front.
Output:
- A single row with two columns,
total_customersandcustomers_with_city. The two values will not be equal — the gap reflects how many customers are missing a city.
Schema · ecommerce 5 tables
Run previews · Check grades
Write a query, then run it to see results here.
Worked solution Try it yourself first
SELECT
COUNT(*) AS total_customers,
COUNT(city) AS customers_with_city
FROM
customers The shape
COUNT(*) counts every row in customers. COUNT(city) counts only rows where city is not NULL. The difference between the two numbers — 70 minus 61, so 9 here — is exactly the number of customer records missing a city value. The two aggregates side by side make the data-quality gap immediately readable.
Clause by clause
FROM customersis the source set: every customer Brightlane has on file, regardless of how complete each record is.COUNT(*) AS total_customerscounts rows. The*form ignores every column's value,NULLor otherwise, and just tallies row existence. The result is the table's row count:70.COUNT(city) AS customers_with_citycounts non-NULLvalues of thecitycolumn. PostgreSQL walks the table the same way, but this time it only adds to the running tally when thecityvalue is something real. Rows withcity IS NULLare skipped from the count. The result is61.- The comma between the two aggregates puts both numbers in the same output row. A single pass over
customersproduces both counts simultaneously, which means they describe the same snapshot of the table.
Why this and not two separate queries
The total_customers - customers_with_city difference is the geocode-coverage gap, computable directly from the two numbers in the same result row. The data team gets the gap without running a third query, and the two counts are guaranteed to come from the same point in time — there's no concurrent customer creation landing between two passes.
The trap
The trap is confusing COUNT(*) and COUNT(col) and using them interchangeably. They are different aggregates with different inputs. COUNT(*) always returns the row count; COUNT(col) returns the count of non-NULL values in col. If every row has a value for col, the two numbers happen to match — but that's a coincidence of the data, and the two counts are still measuring different things. The moment a single NULL appears in col, the numbers diverge.
The way to keep them straight: COUNT(*) is for "how many rows are there?" COUNT(col) is for "how many rows have data in this column?" The audit question here is the second one, and the gap between the two answers is the story the team came to read.
You practiced contrasting COUNT(*) with COUNT(col) in a single row. The gap between them is the data-quality story for that column.