Streamhub's growth team is identifying highly engaged users.
Write a query to return the ID and session count of every user who has recorded more than 5 sessions.
Assumptions:
- The
sessionstable has one row per session with auser_id. - A user's session count is the number of
sessionsrecords linked to thatuser_id. - Only users whose session count exceeds
5should appear.
Output:
- One row per qualifying user, with columns
user_idandsession_count.
Schema · analytics 5 tables
Run previews · Check grades
Write a query, then run it to see results here.
Worked solution Try it yourself first
WITH
user_sessions AS (
SELECT
user_id,
COUNT(*) AS session_count
FROM
sessions
GROUP BY
user_id
)
SELECT
user_id,
session_count
FROM
user_sessions
WHERE
session_count > 5 The shape
The WITH layer computes a per-user session count. The main query then filters that named result with WHERE session_count > 5, comparing against the aggregate as a column.
Clause by clause
- The
WITHclause definesuser_sessions:
WITH user_sessions AS (
SELECT user_id, COUNT(*) AS session_count
FROM sessions
GROUP BY user_id
)GROUP BY user_id partitions sessions per user; COUNT(*) counts each partition. The named layer ends up with one row per user, every user included regardless of count.
SELECT user_id, session_count FROM user_sessions WHERE session_count > 5is the main query. It reads the named layer and keeps only the rows where the count exceeds5. Users 1 and 3 survive with9sessions each; every other user falls below the threshold and drops out.
Why this and not a derived table in FROM
A derived table would compute the same aggregation in the main query's FROM clause and apply the threshold in the same WHERE. Both forms return the same two rows. The WITH version moves the aggregation above the main query and gives it a name. The main query then reads as two clear steps: compute the per-user count, keep the ones above 5. The derived-table version compresses both into one nested statement.
The trap
The threshold runs on the aggregate, which only exists once the grouping is finished. The named column session_count is the output of COUNT(*) per user, not a column on sessions. The filter belongs in the main query, against the layer's output, because that is the only place the aggregate exists as a column to compare against.
You practiced computing a per-user count in a WITH layer and applying a threshold check in the main query.