Streamhub's growth team wants to identify highly engaged users — those who have recorded more than 2 sessions and whose average events per session exceeds 10.
Write a query to return the user ID and average events per session for every highly engaged user.
Assumptions:
- The
sessionstable has one row per session with auser_idand anevent_count. - A user's session count is the number of
sessionsrecords linked to thatuser_id. A user's average events per session is the average ofevent_countacross those records. - A user qualifies only when both conditions hold: their session count is greater than
2, and their average events per session is greater than10.
Output:
- One row per qualifying user, with columns
user_idandavg_events.
Schema · analytics 5 tables
Run previews · Check grades
Write a query, then run it to see results here.
Worked solution Try it yourself first
WITH
session_stats AS (
SELECT
user_id,
COUNT(*) AS session_count,
AVG(event_count) AS avg_events
FROM
sessions
GROUP BY
user_id
),
frequent_users AS (
SELECT
user_id,
session_count,
avg_events
FROM
session_stats
WHERE
session_count > 2
),
high_engagement AS (
SELECT
user_id,
avg_events
FROM
frequent_users
WHERE
avg_events > 10
)
SELECT
user_id,
avg_events
FROM
high_engagement The shape
Three named layers in one WITH clause: aggregate, restrict, restrict. The base CTE computes both per-user statistics in one pass, the second drops users below the session-count threshold, and the third drops users below the average-events threshold. Each restriction applies against the prior layer's output, so the two conditions compose without colliding.
Clause by clause
The first CTE collapses every user's sessions into a single statistics row:
WITH session_stats AS (
SELECT user_id, COUNT(*) AS session_count, AVG(event_count) AS avg_events
FROM sessions
GROUP BY user_id
)GROUP BY user_id produces one row per user. COUNT(*) and AVG(event_count) are both computed in the same pass, and both aliases are carried forward so the next two layers can reference them.
The second CTE applies the session-count threshold:
frequent_users AS (
SELECT user_id, session_count, avg_events
FROM session_stats
WHERE session_count > 2
)WHERE session_count > 2 drops the one- and two-session users. The SELECT list carries avg_events forward unchanged, because the next layer needs it to apply the second threshold.
The third CTE applies the average-events threshold:
high_engagement AS (
SELECT user_id, avg_events
FROM frequent_users
WHERE avg_events > 10
)Now the average check runs only against users who already cleared the frequency check, which is what "both conditions hold" means.
SELECT user_id, avg_events FROM high_engagementreturns the final qualifying set.
Why three CTEs instead of one with a combined WHERE
The two thresholds could be written as WHERE session_count > 2 AND avg_events > 10 in a single follow-up layer, and the result would be identical. Splitting them into named stages is a readability decision: each filter has a name (frequent_users, high_engagement) that says what survived it, and either stage can be queried in isolation when debugging. For a two-condition filter the gain is modest; for the chained query as a pedagogical example, the structure shows what cascading layers look like before the conditions get any harder.
You practiced cascading three WITH stages — a base statistics layer, a first restriction stage, and a second restriction stage — so each condition applies against the prior layer's output rather than the raw source.