Scenario: Streamhub's product team is measuring the full three-step conversion funnel from initial engagement through first purchase.
Task: Write a query to return three counts: the number of users who performed at least one 'page_view' (reached_step_1), the number who performed 'upgrade_clicked' strictly after their first 'page_view' (reached_step_2), and the number who performed 'purchase' strictly after their first qualifying 'upgrade_clicked' (reached_step_3).
Assumptions:
- A user reaches step 1 by performing at least one
'page_view'event at any time. - A user reaches step 2 by performing at least one
'upgrade_clicked'event strictly after their earliest'page_view'. - A user reaches step 3 by performing at least one
'purchase'event strictly after their earliest qualifying'upgrade_clicked'. - Every user who reaches step 3 also reaches step 2 and step 1.
Output:
- One row, holding the three step counts.
- Columns in this order:
reached_step_1,reached_step_2,reached_step_3.
Schema · analytics 5 tables
Run previews · Check grades
Write a query, then run it to see results here.
Worked solution Try it yourself first
WITH
step_1 AS (
SELECT
user_id,
MIN(occurred_at) AS step_1_time
FROM
events
WHERE
event_type = 'page_view'
GROUP BY
user_id
),
upgrade_events AS (
SELECT
user_id,
occurred_at
FROM
events
WHERE
event_type = 'upgrade_clicked'
),
step_2 AS (
SELECT
ue.user_id,
MIN(ue.occurred_at) AS step_2_time
FROM
upgrade_events ue
JOIN step_1 s ON s.user_id = ue.user_id
WHERE
(ue.occurred_at - s.step_1_time) > INTERVAL '0 seconds'
GROUP BY
ue.user_id
),
purchase_events AS (
SELECT
user_id,
occurred_at
FROM
events
WHERE
event_type = 'purchase'
),
step_3 AS (
SELECT
pe.user_id,
MIN(pe.occurred_at) AS step_3_time
FROM
purchase_events pe
JOIN step_2 s ON s.user_id = pe.user_id
WHERE
(pe.occurred_at - s.step_2_time) > INTERVAL '0 seconds'
GROUP BY
pe.user_id
)
SELECT
(
SELECT
COUNT(*)
FROM
step_1
) AS reached_step_1,
(
SELECT
COUNT(*)
FROM
step_2
) AS reached_step_2,
COUNT(*) AS reached_step_3
FROM
step_3 The shape
A three-step funnel is the two-step pattern extended by one more CTE. Each step CTE restricts to the users who reached the previous step and finds their earliest qualifying event after the prior step's timestamp. The final select pulls the three user counts.
Clause by clause
step_1filterseventstoevent_type = 'page_view'and groups byuser_idwithMIN(occurred_at) AS step_1_time. One row per user who ever viewed a page, holding their earliest such timestamp.upgrade_eventspre-filters the event log toevent_type = 'upgrade_clicked'rows. Staging this once keeps the step-2 join narrow.step_2joinsupgrade_eventstostep_1onuser_idand keeps only upgrade clicks where(ue.occurred_at - s.step_1_time) > INTERVAL '0 seconds'— strictly after the user's first page view.MIN(ue.occurred_at)picks each user's earliest qualifying upgrade asstep_2_time.purchase_eventsmirrorsupgrade_events, pre-filtering toevent_type = 'purchase'.step_3joinspurchase_eventstostep_2onuser_idand keeps only purchases where(pe.occurred_at - s.step_2_time) > INTERVAL '0 seconds'— strictly after the user's qualifying upgrade. The result is one row per user who completed all three in order.- The final
SELECTuses scalar subqueries(SELECT COUNT(*) FROM step_1)and(SELECT COUNT(*) FROM step_2)for the upstream counts and a plainCOUNT(*) FROM step_3for the deepest step. Each CTE is already one row per user, so the counts are per-user.
Why this and not joining step_3 back to step_1 directly
You could write step_3 to join purchase_events to step_1 rather than to step_2, with a single condition pe.occurred_at > s.step_1_time. That would count users whose purchase came after their first page view, regardless of whether they ever clicked upgrade. That is a different question. The funnel requires each step to happen after the previous step, not after step 1, so each CTE has to chain off the one immediately before it.
The trap
Counting COUNT(DISTINCT user_id) in the outer query against step_3 looks safe but reads as a hedge against duplicates that the CTE already eliminated. Each step CTE is grouped by user_id, so it produces exactly one row per qualifying user. A plain COUNT(*) is correct, and adding DISTINCT here would only mask a different bug (a missing GROUP BY) if one ever crept in. Keep the count honest to the CTE's grain.
You practiced extending the funnel pattern to three steps — each step CTE narrows to users who completed the previous step and finds their earliest qualifying event after the prior step's timestamp.