Brightlane's catalogue team is auditing category utilisation and needs to identify any categories with no products assigned to them.
Write a query to return the category name for every empty category.
Assumptions:
- An empty category is one whose id does not appear in products.category_id.
- The result will contain a small number of rows: most categories have products.
Output:
- One row per empty category, with a single column category_name.
Worked solution
SELECT
  cat.name AS category_name
FROM
  categories cat
  LEFT JOIN products p ON cat.id = p.category_id
WHERE
  p.id IS NULL;
The shape
categories is the table to enumerate; products is where the existence check happens. A LEFT JOIN from categories to products keeps every category, and WHERE p.id IS NULL keeps only the two whose product columns came back as NULL — Clothing and Home & Garden. The same anti-join shape, anchored on the dimension side rather than a fact side.
Clause by clause
- SELECT cat.name AS category_name returns just the category name from the left side: the catalogue team's single-column audit report.
- FROM categories cat LEFT JOIN products p ON cat.id = p.category_id pairs each category with each of its products. Categories with products produce one row per product with real values in p.*. Empty categories produce a single row with NULL in every p.* column.
- WHERE p.id IS NULL keeps only the unmatched categories. products.id is the primary key on the right side, so it's never NULL on a real row. A NULL in p.id is therefore unambiguous: the row was synthesised by the outer join to preserve a category that no product points to.
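To make the clause-by-clause walk-through concrete, here is a minimal sketch that runs the join first without the WHERE filter and then with it. It uses Python's built-in sqlite3 module so it can run anywhere. The rows are invented toy data, not Brightlane's actual catalogue, and the cut-down table definitions are an assumption about the schema.

```python
import sqlite3

# Toy data only: an assumed, cut-down version of the categories/products tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE products (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        category_id INTEGER REFERENCES categories(id)
    );
    INSERT INTO categories (id, name) VALUES
        (1, 'Electronics'), (2, 'Books'), (3, 'Clothing'), (4, 'Home & Garden');
    INSERT INTO products (id, name, category_id) VALUES
        (10, 'Headphones', 1), (11, 'Charger', 1), (12, 'Cookbook', 2);
""")

# Step 1: the LEFT JOIN alone, before any filtering.
# Categories with products repeat once per product; empty ones get NULLs.
joined = conn.execute("""
    SELECT cat.name, p.id, p.name
    FROM categories cat
    LEFT JOIN products p ON cat.id = p.category_id
    ORDER BY cat.id, p.id
""").fetchall()
for row in joined:
    print(row)

# Step 2: the same join with the anti-join filter applied.
empty = conn.execute("""
    SELECT cat.name AS category_name
    FROM categories cat
    LEFT JOIN products p ON cat.id = p.category_id
    WHERE p.id IS NULL
    ORDER BY cat.id
""").fetchall()
print(empty)
```

In the unfiltered output, the two categories without products each appear once with None (SQL NULL) in both product columns, and those are exactly the rows the WHERE clause keeps.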
Why this and not categories RIGHT JOIN products
The earlier RIGHT JOIN example in this node preserved categories by putting them on the right; that worked because table position determines preservation. Here the conventional shape — preserve the left table with LEFT JOIN — is cleaner, and most teams standardise on it. Putting categories first in the FROM clause also makes the row set immediately readable: the eye lands on the table being enumerated, exactly where the audit logic starts.
The two queries produce identical results when the table order is flipped and the join keyword is changed. The choice between them is purely a readability convention.
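The equivalence can be checked on toy data. The sketch below, again using Python's built-in sqlite3 module with invented rows, runs both forms and compares them. One caveat that is specific to this sketch: SQLite only added RIGHT JOIN in version 3.39.0, so the comparison is skipped on older builds.

```python
import sqlite3

# Toy data only: an assumed, cut-down version of the categories/products tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE products (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        category_id INTEGER REFERENCES categories(id)
    );
    INSERT INTO categories (id, name) VALUES
        (1, 'Electronics'), (2, 'Books'), (3, 'Clothing'), (4, 'Home & Garden');
    INSERT INTO products (id, name, category_id) VALUES
        (10, 'Headphones', 1), (11, 'Charger', 1), (12, 'Cookbook', 2);
""")

# categories on the left, preserved by LEFT JOIN.
left_sql = """
    SELECT cat.name AS category_name
    FROM categories cat
    LEFT JOIN products p ON cat.id = p.category_id
    WHERE p.id IS NULL
    ORDER BY category_name
"""

# Table order flipped: categories on the right, preserved by RIGHT JOIN.
right_sql = """
    SELECT cat.name AS category_name
    FROM products p
    RIGHT JOIN categories cat ON cat.id = p.category_id
    WHERE p.id IS NULL
    ORDER BY category_name
"""

left_rows = conn.execute(left_sql).fetchall()
print(left_rows)

# RIGHT JOIN needs SQLite 3.39.0 or newer; skip the comparison otherwise.
if sqlite3.sqlite_version_info >= (3, 39, 0):
    right_rows = conn.execute(right_sql).fetchall()
    print(left_rows == right_rows)
```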
The trap
The trap with dimension-side anti-joins is checking IS NULL on some other right-side column instead of the primary key (p.id). Every right-side column is NULL for unmatched rows, so several columns appear to work. The join key (p.category_id) gives the same answer as p.id here, because the equality in the ON clause can never match a NULL category_id: an uncategorised product simply joins to no category. That guarantee depends on the ON clause staying a plain equality, though, and it disappears entirely for any nullable column outside the join condition. There, a NULL can come from a real product row, and the filter conflates two different facts: "the category has no products" and "the category has a product with a missing value." Checking the primary key (p.id) cleanly isolates the first signal: the row was synthesised by the outer join, nothing more. That's the column to filter on by default.
The broader rule of thumb: anchor the LEFT JOIN on the table being enumerated, then check IS NULL on the other table's primary key. The signal stays clean.
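A minimal sketch of why the primary key is the safe default, using Python's built-in sqlite3 module. It runs the same anti-join twice: once filtering on p.id, once on a nullable column. The sku column, the empty_categories helper, and all rows are hypothetical, added only for illustration.

```python
import sqlite3

# Toy data only. The nullable sku column is a hypothetical addition.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE products (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        sku TEXT,
        category_id INTEGER REFERENCES categories(id)
    );
    INSERT INTO categories (id, name) VALUES
        (1, 'Electronics'), (2, 'Clothing'), (3, 'Home & Garden');
    INSERT INTO products (id, name, sku, category_id) VALUES
        (10, 'Headphones', 'EL-001', 1),
        (11, 'T-shirt', NULL, 2);
""")


def empty_categories(null_check_column):
    """Run the anti-join, testing IS NULL on the given products column."""
    # The column name is interpolated only because the callers below pass
    # fixed, trusted identifiers; never do this with user input.
    sql = f"""
        SELECT cat.name
        FROM categories cat
        LEFT JOIN products p ON cat.id = p.category_id
        WHERE p.{null_check_column} IS NULL
        ORDER BY cat.id
    """
    return [name for (name,) in conn.execute(sql)]


by_primary_key = empty_categories("id")
by_nullable_column = empty_categories("sku")
print(by_primary_key)
print(by_nullable_column)
```

With this data, the p.id filter reports only Home & Garden, while the sku filter also reports Clothing, which does have a product (the T-shirt with a missing sku).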
You practiced the anti-join from the dimension side rather than the fact side. The recurring rule of thumb: anchor your LEFT JOIN on whichever table you want to enumerate, then filter for IS NULL on a key from the other side — that key being NULL is the signal that no match exists.