Customer Revenue with Safe Aggregation — Debugging Queries in SQL

The problem

Scenario: Brightlane's customer revenue report has two observed problems at once: some customer totals are several times the expected value (suggesting child-row multiplication is inflating the parent total), and some customers report a missing-value revenue even though their orders are on record. The diagnostic must produce per-customer totals where each line item contributes exactly once and customers with no orders are still preserved.

Task: Write a query to return each customer's id, name, and total_revenue — the combined line-item value across their orders, with each line item contributing its value exactly once to the customer's total, reported as a missing value for customers with no orders on record.

Assumptions:

An order may have multiple line items.
A line item's value is quantity multiplied by unit_price.
A customer's total_revenue is the combined line-item value across every order they have placed, with each line item counted exactly once. Customers with no orders on record appear with total_revenue reported as a missing value.
The result covers every customer.

Output:

One row per customer.
Columns in this order: customer_id, customer_name, total_revenue.
Sorted by customer_name ascending.

Schema · ecommerce 5 tables

The shape

Two CTEs in sequence. The first collapses line items to one row per order, defusing the order-to-line-item fanout at its source. The second sums those per-order totals up to one row per customer. Then a final left join from customers to the customer-revenue summary preserves customers with no orders. The architecture fixes the inflated totals and the dropped customers in a single pass — each layer addresses one of the two observed problems.

Clause by clause

WITH order_item_totals AS (SELECT order_id, SUM(quantity * unit_price) AS item_total FROM order_items GROUP BY order_id) collapses every order's line items into one summary row per order. This is the fanout fix. Before this layer, joining orders directly to order_items would have produced multiple rows per order, and summing any order-level column over that result would inflate by the per-order line-item count.
customer_revenue AS (SELECT o.customer_id, SUM(oit.item_total) AS total_revenue FROM orders o JOIN order_item_totals oit ON oit.order_id = o.id GROUP BY o.customer_id) rolls the per-order totals up to per-customer totals. The join is inner because every order in orders has a matching row in order_item_totals (the CTE was built from the same key space), and customers with no orders don't appear in this CTE at all — they'll be reintroduced by the final left join.
SELECT c.id AS customer_id, c.name AS customer_name, cr.total_revenue returns the customer-level columns plus the revenue figure.
FROM customers c LEFT JOIN customer_revenue cr ON cr.customer_id = c.id keeps every customer. Customers absent from customer_revenue (because they had no orders) get NULL for total_revenue, which is the second of the two diagnostic signals.
ORDER BY c.name sorts alphabetically. The reference shows Cole Wood, Dan Carter, Eva Martinez, and others surfacing with total_revenue of null — the preserved no-order customers — while customers like Henry Brown show 3085.96, the de-fanned correct total.

The trap

Combining the two fixes into one query is where this problem bites. A naive shape — customers LEFT JOIN orders LEFT JOIN order_items with SUM(quantity * unit_price) grouped by customer — looks like it should work and reads as correct. It does not. The chain produces one row per (customer, order, line-item) triple. SUM(quantity * unit_price) over that result is right per group, but if the analyst had needed to sum any order-level column (say o.total_amount) the same chain would multiply that value by the per-order line-item count and inflate it silently. The discipline is to collapse the many side in a CTE before the next join sees it. Each one-to-many relationship in a chain has to be aggregated to one row per parent before the next layer composes onto it; otherwise the next aggregation runs over a multiplied set and the error compounds upward without a visible signal.

You practiced a fanout-safe rollup — collapsing the many side per order before pairing up to the customer, while keeping the left-join chain intact so customers with no orders still appear with a missing-value revenue.

Return each customer's `id`, name, and `total_revenue` — the combined line-item value across their `orders`, with each line item contributing its value exactly once to the customer's total, reported as a missing value for customers with no orders on record

The shape

Clause by clause

The trap

Reading explains SQL. Writing it, over and over with instant feedback, is what makes you fluent.