Scenario: Streamhub's analytics pipeline tracking per-user conversion is reporting unexpectedly low purchase counts for some users. The analyst inspects the purchase-event layer in isolation to verify the data is complete before checking how it composes into the downstream rollup.
Task: Write a query to return each user's user_id and purchase_count — the count of 'purchase' events recorded for that user.
Assumptions:
- A purchase event has
event_typeequal to'purchase'. - A user's
purchase_countis the count of'purchase'events recorded for them. - The result covers only users with at least one recorded purchase event.
Output:
- One row per qualifying user.
- Columns in this order:
user_id,purchase_count. - Sorted by
purchase_countdescending.
Schema · analytics 5 tables
Run previews · Check grades
Write a query, then run it to see results here.
Worked solution Try it yourself first
SELECT
user_id,
COUNT(*) AS purchase_count
FROM
events
WHERE
event_type = 'purchase'
GROUP BY
user_id
ORDER BY
purchase_count DESC The shape
Filter events down to purchase rows, group by user, and count. Isolating this single layer is the diagnostic move — running it on its own, before checking how it composes into a larger pipeline, verifies the source data is what the downstream rollup is reading.
Clause by clause
SELECT user_id, COUNT(*) AS purchase_countreturns one row per user with their purchase-event count.FROM eventsreads the raw event log without any further composition. This is what makes the query a diagnostic: it touches the source layer directly, before any downstream CTE has a chance to compress or transform the data.WHERE event_type = 'purchase'restricts the rows to purchase events before the grouping happens. The filter runs before the aggregation, so every counted row is already a purchase — the count is "purchases per user," not "events per user filtered down."GROUP BY user_idcollapses the filtered rows into one group per user.COUNT(*)then resolves per group.ORDER BY purchase_count DESCputs the highest-purchase users first. The reference shows nineteen users with exactly one purchase each, which is itself the diagnostic finding — the unexpectedly-low conversion downstream isn't a bug in the rollup, it's a property of the source layer. Every active purchaser in this dataset has exactly one purchase on file.
Why this and not joining straight to users
The prompt is about verifying the purchase-event layer in isolation, not about producing a final conversion report. Joining users in this query would compose two layers at once and reintroduce exactly the kind of opacity the diagnostic is trying to avoid. The next step (after this one returns the expected shape) is to join the counts back to users and check the downstream rollup — but that's the next step, not this one.
You practiced isolating a single intermediate layer of a downstream pipeline and inspecting it directly — the per-user purchase count by itself, before checking how it composes with later layers.