Scenario: Streamhub's retention team is analyzing user conversion behavior and needs every user paired with their total conversion spend. Users who have never converted must appear with total_conversion_spend reported as a missing value — the distinction between 'no conversions' and 'zero spend' is analytically meaningful.
Task: Write a query to return each user's id and their total_conversion_spend — the combined amount across all of their conversions, reported as a missing value for users with no conversions on record.
Assumptions:
- A user's
total_conversion_spendis the combinedamountacross all of theirconversions. - The result covers every user.
- A user with no
conversionson record appears withtotal_conversion_spendreported as a missing value rather than0.
Output:
- One row per user.
- Columns in this order:
user_id,total_conversion_spend. - Sorted by
user_idascending.
Schema · analytics 5 tables
Run previews · Check grades
Write a query, then run it to see results here.
Worked solution Try it yourself first
SELECT
u.id AS user_id,
SUM(c.amount) AS total_conversion_spend
FROM
users u
LEFT JOIN conversions c ON c.user_id = u.id
GROUP BY
u.id
ORDER BY
u.id The shape
The LEFT JOIN keeps every user in the result, and SUM(c.amount) returns NULL — not zero — for users with no conversions on record. The retention team specifically wants that NULL because "no conversions" is analytically distinct from "zero spend"; no COALESCE wrapper is the right call here, and that is the load-bearing decision.
Clause by clause
SELECT u.id AS user_id, SUM(c.amount) AS total_conversion_spendreturns one row per user with their combined conversion spend. The aggregate runs over whateverconversionsrows matched the join. For a user with at least one conversion,SUMreturns the populated total; for a user with no conversions, the group contains one placeholder row withc.amountset to NULL, andSUMover a single NULL input returns NULL.FROM users u LEFT JOIN conversions c ON c.user_id = u.idpairs each user with their conversions. TheLEFT JOINpreserves users who have none; for those users, everyconversionscolumn is NULL.GROUP BY u.idcollapses the per-conversion rows back to one row per user so the aggregate produces a per-user total.ORDER BY u.idreturns the report sorted by user id.
The trap
The reflex on a LEFT JOIN plus an aggregate is to reach for COALESCE(SUM(c.amount), 0). That reflex is correct on every other problem in this node; on this one it is the bug. Wrapping SUM with COALESCE(..., 0) would convert the NULL for non-converting users into 0, which the retention team would then average alongside the populated spends. A user who converted for $0 and a user who never converted at all would then be indistinguishable in the report, and the population mean would shift toward zero by however many non-converters exist. The fix is to read the assumptions and the output spec carefully and let NULL stand. The grader on this problem accepts NULL in total_conversion_spend for non-converters — substituting 0 would fail it. The shape of "preserve the missing-value signal" is sometimes the answer, and sometimes the rest of the report depends on you noticing that.
You practiced preserving the missing-value total that a left-join produces for empty parents — refusing to substitute 0 so the analytical distinction between 'no record' and 'zero' stays intact.