N066-M3 Tier 5 · Expert · medium analytics · Streamhub

Return each user's `user_id` and `purchase_count` — the count of `'purchase'` events recorded for that user

Part of Analyst Debugging Patterns in SQL

The problem

Scenario: Streamhub's analytics pipeline tracking per-user conversion is reporting unexpectedly low purchase counts for some users. The analyst inspects the purchase-event layer in isolation to verify the data is complete before checking how it composes into the downstream rollup.

Task: Write a query to return each user's user_id and purchase_count — the count of 'purchase' events recorded for that user.

Assumptions:

  • A purchase event has event_type equal to 'purchase'.
  • A user's purchase_count is the count of 'purchase' events recorded for them.
  • The result covers only users with at least one recorded purchase event.

Output:

  • One row per qualifying user.
  • Columns in this order: user_id, purchase_count.
  • Sorted by purchase_count descending.
Schema · analytics 5 tables
users
id integer
name text
email text
country text
plan text
signed_up_at timestamptz
is_active boolean
conversions
id integer
user_id integer
converted_at timestamptz
plan text
amount numeric
sessions
id integer
user_id integer
started_at timestamptz
ended_at? timestamptz
event_count integer
events
id integer
user_id integer
session_id? integer
event_type text
occurred_at timestamptz
properties? jsonb
periods
id integer
name text
start_month integer
end_month integer

Run previews · Check grades

Write a query, then run it to see results here.

Worked solution Try it yourself first
Solution query
SELECT
  user_id,
  COUNT(*) AS purchase_count
FROM
  events
WHERE
  event_type = 'purchase'
GROUP BY
  user_id
ORDER BY
  purchase_count DESC

The shape

Filter events down to purchase rows, group by user, and count. Isolating this single layer is the diagnostic move — running it on its own, before checking how it composes into a larger pipeline, verifies the source data is what the downstream rollup is reading.

Clause by clause

  • SELECT user_id, COUNT(*) AS purchase_count returns one row per user with their purchase-event count.
  • FROM events reads the raw event log without any further composition. This is what makes the query a diagnostic: it touches the source layer directly, before any downstream CTE has a chance to compress or transform the data.
  • WHERE event_type = 'purchase' restricts the rows to purchase events before the grouping happens. The filter runs before the aggregation, so every counted row is already a purchase — the count is "purchases per user," not "events per user filtered down."
  • GROUP BY user_id collapses the filtered rows into one group per user. COUNT(*) then resolves per group.
  • ORDER BY purchase_count DESC puts the highest-purchase users first. The reference shows nineteen users with exactly one purchase each, which is itself the diagnostic finding — the unexpectedly-low conversion downstream isn't a bug in the rollup, it's a property of the source layer. Every active purchaser in this dataset has exactly one purchase on file.

Why this and not joining straight to users

The prompt is about verifying the purchase-event layer in isolation, not about producing a final conversion report. Joining users in this query would compose two layers at once and reintroduce exactly the kind of opacity the diagnostic is trying to avoid. The next step (after this one returns the expected shape) is to join the counts back to users and check the downstream rollup — but that's the next step, not this one.

You practiced isolating a single intermediate layer of a downstream pipeline and inspecting it directly — the per-user purchase count by itself, before checking how it composes with later layers.

How you actually get good at SQL

Reading explains SQL. Writing it, over and over with instant feedback, is what makes you fluent.

That's the whole SQLMaxx loop: 600+ real problems, instant AI feedback, mastery you can actually see, and spaced review that won't let you forget.

A stack of SQL practice problem cards, the top card showing an employees table.
615 problems · 66 concepts

Real problems. Not toy examples.

615 hand-built problems spanning all 66 concepts, from basic SELECTs to window functions, built on real schemas and real business questions, the kind you'll actually get asked on the job. Enough reps to make SQL automatic.

A retro computer showing a SQL query marked correct with a green checkmark.
Instant AI feedback

Write a query. Know if it's right in one second.

No copying an answer and hoping it clicked. The AI grader checks your real query against real data, catches exactly what's wrong, and explains the fix in plain English, like a senior analyst reading over your shoulder on every problem.

A circular mastery progress dial filling from blue to green, the SQLMaxx diamond at its center.
Mastery tracking

Stop guessing whether you actually know it.

SQLMaxx tracks every concept and shows you what you've mastered and what's still shaky. Your skills fill in one concept at a time, so 'I think I get joins' becomes something you can prove.

A SQL query editor circled by a blue return arrow with a clock, scheduled to come back for review.
Spaced review

Learn it once. Keep it for good.

Most of what you learn this week fades by next week. So when a concept comes due for review, SQLMaxx hands you a fresh problem to solve from a blank editor, not a flashcard to re-read. A research-backed spaced-repetition algorithm (FSRS) times each return for right before you'd forget, so your SQL is still there months later, when the interview or the job actually needs it.

Practice, feedback, mastery, review. That's the loop that turns reading into real skill.

Start free

No account, no credit card. Start solving in under a minute.