Streamhub's product team wants to know which users have had more than one high-intensity session — a session with at least 5 events.
Write a query to return each qualifying user's ID and their count of high-intensity sessions.
Assumptions:
- A high-intensity session has an
event_countof5or more. - A user's qualifying-session count is the number of high-intensity sessions linked to that
user_id. - Only users with more than
1qualifying session should appear.
Output:
- One row per qualifying user, with columns
user_idandqualifying_sessions.
Schema · analytics 5 tables
Run previews · Check grades
Write a query, then run it to see results here.
Worked solution Try it yourself first
WITH
high_event_sessions AS (
SELECT
user_id,
event_count
FROM
sessions
WHERE
event_count >= 5
),
user_counts AS (
SELECT
user_id,
COUNT(*) AS qualifying_sessions
FROM
high_event_sessions
GROUP BY
user_id
)
SELECT
user_id,
qualifying_sessions
FROM
user_counts
WHERE
qualifying_sessions > 1 The shape
Three pieces of work spread across two CTEs and the main query. The first CTE keeps only the high-intensity sessions, the second counts those qualifying sessions per user, and the main query applies the "more than one" threshold on top of the counted rows. Two restrictions of different shapes, with a count between them.
Clause by clause
The first CTE drops every low-intensity session:
WITH high_event_sessions AS (
SELECT user_id, event_count
FROM sessions
WHERE event_count >= 5
)WHERE event_count >= 5 runs against the raw session rows, before any grouping. After this layer, every row is by definition a high-intensity session.
The second CTE counts those rows per user:
user_counts AS (
SELECT user_id, COUNT(*) AS qualifying_sessions
FROM high_event_sessions
GROUP BY user_id
)GROUP BY user_id produces one row per user, and COUNT(*) counts only the qualifying sessions because that is the only thing in the source layer. A user with zero qualifying sessions never appears here.
The main query applies the threshold on the per-user count:
SELECT user_id, qualifying_sessions
FROM user_counts
WHERE qualifying_sessions > 1WHERE qualifying_sessions > 1 references the aggregate from the previous layer by its alias. Because the count was materialized into a named column in user_counts, the main query's WHERE can read it as if it were any other column.
Why the count threshold lives in the main query and not back in the first CTE
The two thresholds operate on different things. event_count >= 5 is a per-row condition on the raw sessions table, and it has to run before grouping because the column it tests exists only on the source rows. qualifying_sessions > 1 is a condition on an aggregated value, and the aggregate does not exist until GROUP BY runs. Splitting them across layers is what lets each one run at the level it makes sense at: the per-row filter on the per-row layer, the per-group filter on the per-group layer.
The trap
The two thresholds look similar in the prompt and easy to swap. Reading "more than one high-intensity session" as the row-level threshold would lead to WHERE event_count > 1 in the first CTE, which keeps almost every session and then counts them all. The > 1 is on the count of qualifying sessions per user, not on the events inside each session. The level at which each > applies is what the chained structure is making explicit.
You practiced layering two WITH stages with a final main-query restriction — pre-restrict in the first stage, count in the second, threshold-check in the main query.