Scenario: Brightlane's data analyst ran EXPLAIN on a query that pulls shipped orders together with their customers and saw the planner estimating only 10 rows after the 'shipped' restriction. Actual execution was much slower, suggesting the estimate was wrong.
Task: Write a query to return the actual count of shipped orders represented across the customer base.
Assumptions:
- A shipped order has status equal to 'shipped'.
- Every order corresponds to exactly one customer.
Output:
- One row, holding the shipped-order count.
- Columns in this order: shipped_joined_count.
Worked solution
SELECT
  COUNT(*) AS shipped_joined_count
FROM
  orders o
  JOIN customers c ON o.customer_id = c.id
WHERE
  o.status = 'shipped';
The shape
The planner estimated 10 rows after the 'shipped' restriction, but the join-plus-filter actually produces 17. Running the same shape that EXPLAIN planned (read orders, join customers, filter on shipped) and counting the result gives the actual number the planner was trying to estimate.
Clause by clause
- SELECT COUNT(*) AS shipped_joined_count: returns the total row count produced by the joined-and-filtered row stream. Each surviving row contributes one to the count.
- FROM orders o: reads the order records.
- JOIN customers c ON o.customer_id = c.id: matches each order to its customer. Every order has exactly one customer (the prompt's assumption), so the join produces exactly one output row per order; the join doesn't multiply the row count.
- WHERE o.status = 'shipped': keeps only the rows whose underlying order is shipped. This is the predicate whose selectivity the planner was estimating.
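The clause-by-clause reading can be checked on a toy database. The sketch below uses Python's built-in sqlite3 module with a made-up dataset of three customers and five orders. The rows, the customer names, and the in-memory database are assumptions for illustration, not Brightlane's actual ecommerce schema. It runs the worked query next to an orders-only count to confirm that, with one customer per order, the join leaves the shipped count unchanged.

```python
import sqlite3


def build_toy_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical customers and orders."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            status TEXT NOT NULL
        );
        -- Hypothetical sample rows: every order points at exactly one customer.
        INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Ben'), (3, 'Cy');
        INSERT INTO orders (id, customer_id, status) VALUES
            (1, 1, 'shipped'),
            (2, 1, 'pending'),
            (3, 2, 'shipped'),
            (4, 3, 'shipped'),
            (5, 3, 'cancelled');
    """)
    return conn


# The worked solution: same shape as the plan (read orders, join customers, filter).
JOINED_COUNT_SQL = """
    SELECT COUNT(*) AS shipped_joined_count
    FROM orders o
    JOIN customers c ON o.customer_id = c.id
    WHERE o.status = 'shipped'
"""

# The orders-only variant, for comparison.
ORDERS_ONLY_SQL = "SELECT COUNT(*) FROM orders WHERE status = 'shipped'"


def shipped_counts(conn: sqlite3.Connection) -> tuple[int, int]:
    """Return (joined count, orders-only count) of shipped orders."""
    joined = conn.execute(JOINED_COUNT_SQL).fetchone()[0]
    orders_only = conn.execute(ORDERS_ONLY_SQL).fetchone()[0]
    return joined, orders_only


conn = build_toy_db()
joined, orders_only = shipped_counts(conn)
print(joined, orders_only)  # prints "3 3" for the sample rows above
```

The two counts agree here only because each order matches exactly one customer row; the query itself does nothing to enforce that.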
Why this and not just counting orders directly
Without the join, the count of shipped orders comes from orders alone, and that's what N060-E1 does. The reason to keep the join here is that the planner's 10-row estimate was for the post-join row count, not the pre-join one. If the analyst measures shipped orders without the join, they're answering a slightly different question than the plan was estimating. With one customer per order, the two numbers happen to match, but using the same shape as the plan is what keeps the comparison honest. On a join that does fan out rows (one-to-many), the two counts would diverge, and only the joined count would be comparable to the plan's estimate.
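To see why the shape matters once a join fans out, here is a minimal sketch, again using Python's sqlite3 with invented data. The order_items table and its rows are hypothetical and stand in for any one-to-many relationship; they are not taken from the exercise's schema. Joining shipped orders to their items yields one row per item, so the post-join count no longer equals the number of shipped orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
    -- Hypothetical one-to-many child table: several items per order.
    CREATE TABLE order_items (
        id INTEGER PRIMARY KEY,
        order_id INTEGER NOT NULL REFERENCES orders(id),
        sku TEXT NOT NULL
    );
    INSERT INTO orders (id, status) VALUES
        (1, 'shipped'), (2, 'shipped'), (3, 'pending');
    INSERT INTO order_items (order_id, sku) VALUES
        (1, 'A'), (1, 'B'), (1, 'C'),   -- order 1 has three items
        (2, 'A'),                       -- order 2 has one item
        (3, 'D'), (3, 'E');             -- order 3 is not shipped
""")

# Pre-join count: shipped orders only.
orders_only = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE status = 'shipped'"
).fetchone()[0]

# Post-join count: one row per (shipped order, item) pair.
joined_rows = conn.execute("""
    SELECT COUNT(*)
    FROM orders o
    JOIN order_items i ON i.order_id = o.id
    WHERE o.status = 'shipped'
""").fetchone()[0]

# Counting distinct orders after the join recovers the pre-join number.
distinct_orders = conn.execute("""
    SELECT COUNT(DISTINCT o.id)
    FROM orders o
    JOIN order_items i ON i.order_id = o.id
    WHERE o.status = 'shipped'
""").fetchone()[0]

print(orders_only, joined_rows, distinct_orders)  # prints "2 4 2"
```

A planner estimate for the joined-and-filtered stream should be compared with joined_rows in this sketch, not with orders_only.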
You practiced verifying a join's post-restriction row count by running the same shape and counting. The actual number is the baseline against which the planner's selectivity estimate is judged.