Count All Customers vs Those with a City — Aggregate Functions in SQL

The problem

Brightlane's data team is auditing the customers table. They want the total number of customer records alongside the number of customers who have a city on file — in a single row, so the gap between the two figures is immediately visible.

Write a query to return both counts in one row.

Assumptions:

The customers table contains every customer Brightlane has on file.
Some customers have a recorded city value; many have a missing city.
Both figures come from the same set of customers; neither figure excludes the missing-city customers up front.

Output:

A single row with two columns, total_customers and customers_with_city. The two values will not be equal — the gap reflects how many customers are missing a city.

Schema · ecommerce 5 tables

The shape

COUNT(*) counts every row in customers. COUNT(city) counts only rows where city is not NULL. The difference between the two numbers — 70 minus 61, so 9 here — is exactly the number of customer records missing a city value. The two aggregates side by side make the data-quality gap immediately readable.

Clause by clause

FROM customers is the source set: every customer Brightlane has on file, regardless of how complete each record is.
COUNT(*) AS total_customers counts rows. The * form ignores every column's value, NULL or otherwise, and just tallies row existence. The result is the table's row count: 70.
COUNT(city) AS customers_with_city counts non-NULL values of the city column. PostgreSQL walks the table the same way, but this time it only adds to the running tally when the city value is something real. Rows with city IS NULL are skipped from the count. The result is 61.
The comma between the two aggregates puts both numbers in the same output row. A single pass over customers produces both counts simultaneously, which means they describe the same snapshot of the table.

Why this and not two separate queries

The total_customers - customers_with_city difference is the geocode-coverage gap, computable directly from the two numbers in the same result row. The data team gets the gap without running a third query, and the two counts are guaranteed to come from the same point in time — there's no concurrent customer creation landing between two passes.

The trap

The trap is confusing COUNT(*) and COUNT(col) and using them interchangeably. They are different aggregates with different inputs. COUNT(*) always returns the row count; COUNT(col) returns the count of non-NULL values in col. If every row has a value for col, the two numbers happen to match — but that's a coincidence of the data, and the two counts are still measuring different things. The moment a single NULL appears in col, the numbers diverge.

The way to keep them straight: COUNT(*) is for "how many rows are there?" COUNT(col) is for "how many rows have data in this column?" The audit question here is the second one, and the gap between the two answers is the story the team came to read.

You practiced contrasting COUNT(*) with COUNT(col) in a single row. The gap between them is the data-quality story for that column.

Return both counts in one row

The shape

Clause by clause

Why this and not two separate queries

The trap

Reading explains SQL. Writing it, over and over with instant feedback, is what makes you fluent.