N060-H1 Tier 5 · Expert · hard ecommerce · Brightlane

Return each `category_id` and the total number of line items associated with `products` in that category, so the analyst can see the actual per-category contribution

Part of Reading EXPLAIN Output in SQL

The problem

Scenario: Brightlane's data analyst ran EXPLAIN on a query that pulls order line items together with their products to compute per-category item counts, and saw the planner estimating 20 total rows at the nested-loop step. Actual execution processed over 100 rows, driving cost much higher than expected. The analyst suspects certain product categories contribute disproportionately.

Task: Write a query to return each category_id and the total number of line items associated with products in that category, so the analyst can see the actual per-category contribution.

Assumptions:

  • One row in the result covers every line item whose product shares the same category_id.

Output:

  • One row per category_id present in the data with at least one line item.
  • Columns in this order: category_id, item_count.
Schema · ecommerce 5 tables
categories
id integer
name text
parent_id? integer
products
id integer
name text
category_id integer
price numeric
stock_qty integer
attributes? jsonb
order_items
id integer
order_id integer
product_id integer
quantity integer
unit_price numeric
customers
id integer
name text
email text
city? text
country text
created_at timestamptz
is_active boolean
orders
id integer
customer_id integer
ordered_at timestamptz
status text
total_amount numeric

Run previews · Check grades

Write a query, then run it to see results here.

Worked solution Try it yourself first
Solution query
SELECT
  p.category_id,
  COUNT(oi.id) AS item_count
FROM
  order_items oi
  JOIN products p ON oi.product_id = p.id
GROUP BY
  p.category_id

The shape

The planner estimated 20 rows out of the nested-loop step; actual execution processed over 100, and the per-category breakdown shows why — category 1 alone holds 25 line items, category 5 holds 19, category 6 holds 13. The skew across categories is what the planner was missing. The query joins order_items to products, groups by category_id, and counts.

Clause by clause

  • SELECT p.category_id, COUNT(oi.id) AS item_count returns the category and the number of line items associated with its products. COUNT(oi.id) counts each joined line-item row once.
  • FROM order_items oi reads the line-item records — the larger of the two tables, and the side the nested-loop's outer plan would scan if the planner believes it produces few rows.
  • JOIN products p ON oi.product_id = p.id pairs each line item with its product, bringing category_id into reach. The inner join discards any line item whose product has been removed from the catalog.
  • GROUP BY p.category_id partitions the joined rows by category, so the count runs once per category.

Why this and not just counting order_items directly

A bare COUNT(*) FROM order_items returns the total line-item count without telling the analyst which categories drive it. The planner's 20-row estimate at the nested-loop step is shaped by its model of the join's cardinality, and that model is built from the joint distribution of order_items.product_id and products.category_id. Breaking the count down per category exposes which categories the planner is mismodeling — category 1 with 25 items is doing the disproportionate work. Without the per-category split, the analyst only knows the total is wrong; with it, they know which category's statistic to fix.

The trap

A nested-loop join's cost scales linearly with the row count of its outer side. When the planner estimates 20 rows on a node that actually produces 100+, it picked nested-loop because it expected the inner-side lookups to be cheap and few. At 100+ outer rows the same plan becomes a hash join's natural territory, but the planner is locked into its choice once execution starts. The fix isn't visible in the query — it's in the statistics that produced the underestimate, which is what running ANALYZE on the underlying tables addresses. Reading the actual per-category counts is what tells the analyst whether the issue is one fat category (a histogram problem) or many small ones the planner under-counted across the board (a row-count problem).

You practiced verifying an estimated nested-loop output size against actual row counts — the gap reveals which join cardinality the planner is mismodeling.

How you actually get good at SQL

Reading explains SQL. Writing it, over and over with instant feedback, is what makes you fluent.

That's the whole SQLMaxx loop: 600+ real problems, instant AI feedback, mastery you can actually see, and spaced review that won't let you forget.

A stack of SQL practice problem cards, the top card showing an employees table.
615 problems · 66 concepts

Real problems. Not toy examples.

615 hand-built problems spanning all 66 concepts, from basic SELECTs to window functions, built on real schemas and real business questions, the kind you'll actually get asked on the job. Enough reps to make SQL automatic.

A retro computer showing a SQL query marked correct with a green checkmark.
Instant AI feedback

Write a query. Know if it's right in one second.

No copying an answer and hoping it clicked. The AI grader checks your real query against real data, catches exactly what's wrong, and explains the fix in plain English, like a senior analyst reading over your shoulder on every problem.

A circular mastery progress dial filling from blue to green, the SQLMaxx diamond at its center.
Mastery tracking

Stop guessing whether you actually know it.

SQLMaxx tracks every concept and shows you what you've mastered and what's still shaky. Your skills fill in one concept at a time, so 'I think I get joins' becomes something you can prove.

A SQL query editor circled by a blue return arrow with a clock, scheduled to come back for review.
Spaced review

Learn it once. Keep it for good.

Most of what you learn this week fades by next week. So when a concept comes due for review, SQLMaxx hands you a fresh problem to solve from a blank editor, not a flashcard to re-read. A research-backed spaced-repetition algorithm (FSRS) times each return for right before you'd forget, so your SQL is still there months later, when the interview or the job actually needs it.

Practice, feedback, mastery, review. That's the loop that turns reading into real skill.

Start free

No account, no credit card. Start solving in under a minute.