N059 Tier 5 · Expert

Join Fanout and Aggregate Correctness in SQL

Join fanout is a correctness problem that occurs when a join multiplies rows beyond what the query intends, causing subsequent aggregations to compute over inflated row counts and produce wrong results. The query runs without error, returns plausible-looking numbers, and gives no indication anything is wrong.

Join fanout is what happens when a join multiplies rows beyond what you intended, causing every aggregation that follows to compute over inflated data. The query runs without error, returns plausible-looking numbers, and gives no indication anything is wrong.

Here's the mechanism. You have an orders table and an order_items table, where each order has multiple line items. When you join them on order_id, you get one row per line item — not per order. An order with five items appears five times in the joined result. If you then sum orders.revenue (an order-level column), that revenue figure gets counted five times for the five-item order. The sum is five times too large, and nothing in the query or output flags the problem.

-- Fanout inflating revenue
SELECT
    o.customer_id,
    SUM(o.order_revenue) AS total_revenue  -- wrong: counted once per line item
FROM orders o
JOIN order_items oi ON oi.order_id = o.order_id
GROUP BY o.customer_id

Customers with more line items appear to have disproportionately higher revenue. The error is invisible without comparing the output to a known-correct total.

Diagnosing fanout

Before aggregating, check the row count at the join level:

SELECT COUNT(*) FROM orders o
JOIN order_items oi ON oi.order_id = o.order_id

If this exceeds SELECT COUNT(*) FROM orders, the join is fanning out. That's expected behavior for a one-to-many join — the problem is aggregating an order-level column over the multiplied rows.

Make this check a habit: verify the join row count before writing the aggregation. It's faster to catch fanout here than to trace wrong numbers back through the query after the fact.

The fix: pre-aggregate before joining

Collapse the right-side table to one row per key before the join runs:

WITH item_totals AS (
    SELECT order_id, SUM(item_revenue) AS total_item_revenue
    FROM order_items
    GROUP BY order_id
)
SELECT
    o.customer_id,
    SUM(o.order_revenue) AS total_revenue
FROM orders o
JOIN item_totals it ON it.order_id = o.order_id
GROUP BY o.customer_id

Now the join is one-to-one on order_id. No fanout. The aggregation is correct.

The one thing that trips people up

Fanout compounds across multiple joins. If you join orders to both order_items (5 items) and order_shipments (2 shipments), you get 10 rows per order — one per item-shipment combination. Aggregating any order-level column inflates it by a factor of 10.

The precondition for avoiding fanout is knowing the cardinality of every join before writing it: for each join key, can the right side have more than one row per key value? If yes, aggregating anything from the left side over that join is dangerous unless the right side is pre-aggregated first.

Practice

10 Join Fanout and Aggregate Correctness practice problems

These problems are part of the Join Fanout and Aggregate Correctness lesson in SQLMaxx, with instant grading and a worked solution on each.

How you actually get good at SQL

Reading explains SQL. Writing it, over and over with instant feedback, is what makes you fluent.

That's the whole SQLMaxx loop: 600+ real problems, instant AI feedback, mastery you can actually see, and spaced review that won't let you forget.

A stack of SQL practice problem cards, the top card showing an employees table.
615 problems · 66 concepts

Real problems. Not toy examples.

615 hand-built problems spanning all 66 concepts, from basic SELECTs to window functions, built on real schemas and real business questions, the kind you'll actually get asked on the job. Enough reps to make SQL automatic.

A retro computer showing a SQL query marked correct with a green checkmark.
Instant AI feedback

Write a query. Know if it's right in one second.

No copying an answer and hoping it clicked. The AI grader checks your real query against real data, catches exactly what's wrong, and explains the fix in plain English, like a senior analyst reading over your shoulder on every problem.

A circular mastery progress dial filling from blue to green, the SQLMaxx diamond at its center.
Mastery tracking

Stop guessing whether you actually know it.

SQLMaxx tracks every concept and shows you what you've mastered and what's still shaky. Your skills fill in one concept at a time, so 'I think I get joins' becomes something you can prove.

A SQL query editor circled by a blue return arrow with a clock, scheduled to come back for review.
Spaced review

Learn it once. Keep it for good.

Most of what you learn this week fades by next week. So when a concept comes due for review, SQLMaxx hands you a fresh problem to solve from a blank editor, not a flashcard to re-read. A research-backed spaced-repetition algorithm (FSRS) times each return for right before you'd forget, so your SQL is still there months later, when the interview or the job actually needs it.

Practice, feedback, mastery, review. That's the loop that turns reading into real skill.

Start free

No account, no credit card. Start solving in under a minute.