Streamhub's engagement team needs to compare each session's event count to the average event count across that session's user.
Write a query to return the ID, user ID, and event count of every session, plus the average event count across that user's sessions on each row.
Assumptions:
- The
sessionstable has one row per session with anid, auser_id, and anevent_count. - A user's average event count is the average of
event_countacross every session linked to thatuser_id. The same value should appear on every row that shares auser_id.
Output:
- One row per session, with columns
id,user_id,event_count, anduser_avg_events.
Schema · analytics 5 tables
Run previews · Check grades
Write a query, then run it to see results here.
Worked solution Try it yourself first
SELECT
id,
user_id,
event_count,
AVG(event_count) OVER (
PARTITION BY
user_id
) AS user_avg_events
FROM
sessions The shape
AVG(event_count) OVER (PARTITION BY user_id) computes a separate average for each user_id and writes that user's average onto every session belonging to the user. Two sessions from the same user share an average; sessions from different users see different averages. Every session row stays in the output.
Clause by clause
SELECT id, user_id, event_countreturns each session's identifier, the user who owned it, and the session's individual event count, one row per session.- The window column is:
AVG(event_count) OVER (PARTITION BY user_id) AS user_avg_eventsPARTITION BY user_id splits the row set into one group per distinct user_id. AVG(event_count) runs inside each group independently, so the value attached to a given session is the average of event_count across every session that shares its user_id. All sessions belonging to the same user see the same user_avg_events.
FROM sessionsreads every session. The engagement team is comparing each session to its user's average, so every row stays in.
Why this and not GROUP BY user_id
GROUP BY user_id would collapse the table to one row per user holding the per-user average, and the individual session rows would be gone. The comparison the engagement team is making, this session's event count versus this user's typical event count, requires both numbers on the same row. PARTITION BY is the construct that produces the per-group aggregate without throwing away the rows it summarises.
You practiced partitioning an AVG window by a foreign-key column — every record sees its own user's per-user average alongside the individual value.