N060-H2 Tier 5 · Expert · hard ecommerce · Brightlane

Return each `customer_id` and the combined `total_amount` across all of their `orders`, so the analyst can see the actual group count and revenue distribution

Part of Reading EXPLAIN Output in SQL

The problem

Scenario: Brightlane's data analyst ran EXPLAIN on a per-customer revenue rollup that pulls orders together with customers, and saw the per-customer grouping step estimating only 5 customer groups. Execution was significantly slower than expected, suggesting the real customer count is much higher.

Task: Write a query to return each customer_id and the combined total_amount across all of their orders, so the analyst can see the actual group count and revenue distribution.

Assumptions:

  • A customer's total_revenue is the combined total_amount across all of their orders.
  • The result covers only customers who have placed at least one order.

Output:

  • One row per customer with at least one order on record.
  • Columns in this order: customer_id, total_revenue.

Schema · ecommerce · 5 tables

  • categories: id integer, name text, parent_id? integer
  • products: id integer, name text, category_id integer, price numeric, stock_qty integer, attributes? jsonb
  • order_items: id integer, order_id integer, product_id integer, quantity integer, unit_price numeric
  • customers: id integer, name text, email text, city? text, country text, created_at timestamptz, is_active boolean
  • orders: id integer, customer_id integer, ordered_at timestamptz, status text, total_amount numeric

Worked solution
Solution query
SELECT
  o.customer_id,
  SUM(o.total_amount) AS total_revenue
FROM
  orders o
  JOIN customers c ON o.customer_id = c.id
GROUP BY
  o.customer_id

The shape

The planner expected 5 customer groups at the per-customer grouping step; the real count is 62, more than twelve times the estimate. The query joins orders to customers, groups by customer_id, and sums total_amount per customer. Because the grouped result has the same shape the planner was estimating, the number of rows it returns can be compared directly with the planner's 5-row estimate.
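
One way to put the estimate and the real count side by side is to run the solution query under EXPLAIN ANALYZE. This sketch assumes PostgreSQL, which the jsonb and timestamptz types in the schema suggest but the problem does not state. On the aggregate node of the resulting plan, the rows figure in the cost section is the planner's estimate and the rows figure in the actual section is what execution produced.

```sql
-- Assumes PostgreSQL. EXPLAIN (ANALYZE) really executes the query,
-- so run it only where that is acceptable.
EXPLAIN (ANALYZE)
SELECT
  o.customer_id,
  SUM(o.total_amount) AS total_revenue
FROM
  orders o
  JOIN customers c ON o.customer_id = c.id
GROUP BY
  o.customer_id;
```

A large gap between the two rows figures on the grouping node is the signal the scenario describes.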

Clause by clause

  • SELECT o.customer_id, SUM(o.total_amount) AS total_revenue returns each customer's ID and their summed order total. SUM adds the total_amount of every order in that customer's group.
  • FROM orders o reads the order records — the side carrying both the customer reference and the amount being summed.
  • JOIN customers c ON o.customer_id = c.id matches each order to its customer. The join adds no columns to the SELECT, but as an inner join it drops any order whose customer_id has no matching row in customers. The prompt's "customers who have placed at least one order" constraint is met separately: the query starts from orders, so a customer with no orders never produces a row to group.
  • GROUP BY o.customer_id partitions the joined rows by customer. Each output row is one customer's full order history rolled up to a single number.
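
To see whether the inner join actually removes anything, a quick check along these lines can help. It is a sketch against the schema above, not part of the graded solution, and the orphan_orders alias is made up for illustration.

```sql
-- Counts orders whose customer_id has no matching row in customers
-- (this includes orders with a NULL customer_id, if any exist).
-- A result of 0 means the inner join in the solution drops no orders.
SELECT
  COUNT(*) AS orphan_orders
FROM
  orders o
  LEFT JOIN customers c ON o.customer_id = c.id
WHERE
  c.id IS NULL;
```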

Why this and not grouping on c.id

o.customer_id and c.id are equal on every joined row (that's the join condition), so grouping on either column produces the same groups and the same sums. Grouping on the orders side keeps the grouping key on the column that EXPLAIN's 5-group estimate was attached to, so the result's row count compares directly with that estimate. The two are interchangeable on this query; beyond that, the choice is about which side reads more naturally as "one customer per group."
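
For comparison, the equivalent query grouped on the customers side would look roughly like this. It returns the same rows as the solution query; only the grouping key changes.

```sql
-- Same result as the solution query, grouped on c.id instead of
-- o.customer_id. The alias keeps the output column named customer_id.
SELECT
  c.id AS customer_id,
  SUM(o.total_amount) AS total_revenue
FROM
  orders o
  JOIN customers c ON o.customer_id = c.id
GROUP BY
  c.id;
```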

The trap

The planner's group-count estimate comes from the distinct-value statistic on the grouped column. When that statistic reports 5 but the table actually carries 62, the planner sizes its hash aggregate for about 5 groups, and the hash table has to grow at run time as the 6th, 7th, ... 62nd customer appears. The cost shows up as elevated actual time on the aggregate node, even though the row counts at the scan level may look fine. The misestimate isn't in the join cardinality. It's in the post-join distinct-value count, which depends on a statistic the planner reads off the column being grouped. ANALYZE on the underlying table refreshes that statistic; until then, every group-by on this column gets the same wrong allocation.
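
A minimal sketch of how the stale statistic could be inspected and refreshed, assuming PostgreSQL (the problem does not name the engine). The pg_stats view and its n_distinct column are standard PostgreSQL; the actual_customers alias is made up for illustration.

```sql
-- What the planner currently believes about orders.customer_id.
-- n_distinct > 0 is an estimated number of distinct values;
-- n_distinct < 0 is a negated fraction of the table's row count.
-- Add a schemaname filter if several schemas contain an orders table.
SELECT n_distinct
FROM pg_stats
WHERE tablename = 'orders'
  AND attname = 'customer_id';

-- The real number of distinct customers that have orders.
SELECT COUNT(DISTINCT customer_id) AS actual_customers
FROM orders;

-- Refresh the planner statistics for the table.
ANALYZE orders;
```

Running the first query again after ANALYZE shows whether the statistic moved toward the real count.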

You practiced computing the real per-group count and per-group totals to compare against EXPLAIN's grouping-step estimate — the discrepancy at that step drives statistics refresh decisions.
