Scenario: Streamhub's analytics team wants to understand engagement levels across plan tiers — how active users on each plan compare in event volume.
Task: Write a query to return each plan, the number of active users on that plan, the total events those users have generated across all time, and the average events per active user.
Assumptions:
- An active user has is_active equal to TRUE.
- An active user with no recorded events contributes 0 to their event count rather than dropping out of the per-plan averages.
Output:
- One row per plan with at least one active user.
- Columns in this order: plan, active_users, total_events, avg_events.
- Sorted by avg_events descending.
Worked solution
WITH
  active_users AS (
    SELECT
      u.id AS user_id,
      u.plan
    FROM
      users u
    WHERE
      u.is_active = TRUE
  ),
  user_activity AS (
    SELECT
      au.user_id,
      au.plan,
      COUNT(e.id) AS event_count
    FROM
      active_users au
      LEFT JOIN events e ON e.user_id = au.user_id
    GROUP BY
      au.user_id,
      au.plan
  ),
  plan_summary AS (
    SELECT
      plan,
      COUNT(user_id) AS active_users,
      SUM(event_count) AS total_events,
      AVG(event_count) AS avg_events
    FROM
      user_activity
    GROUP BY
      plan
  )
SELECT
  plan,
  active_users,
  total_events,
  avg_events
FROM
  plan_summary
ORDER BY
  avg_events DESC
The shape
Three CTEs that protect the per-plan average from a silent miscount. The first names the active-user set, the second left-joins those users to events so a user with zero events still produces a row, and the third aggregates per plan. The LEFT JOIN in the middle layer is the load-bearing piece: without it, zero-event users would vanish before the average gets computed.
Clause by clause
WITH active_users AS (
  SELECT u.id AS user_id, u.plan
  FROM users u
  WHERE u.is_active = TRUE
)
The active-user set is named once and reused. Only user_id and plan are needed downstream, so other columns are dropped.
user_activity AS (
  SELECT au.user_id, au.plan, COUNT(e.id) AS event_count
  FROM active_users au
  LEFT JOIN events e ON e.user_id = au.user_id
  GROUP BY au.user_id, au.plan
)
The LEFT JOIN keeps every active user even when no event row matches. COUNT(e.id) counts non-null e.id values, which means a user with no events gets a count of 0 rather than vanishing. Grouping by user_id and plan produces one row per active user with their personal event count.
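To see why COUNT(e.id) is used here rather than COUNT(*), the following is a minimal sketch over hypothetical rows. The sample_users and sample_events names and their values are made up for illustration and are not the exercise's tables or data. It assumes a dialect that accepts a VALUES list inside a CTE, such as PostgreSQL or SQLite.

```sql
-- Hypothetical sample rows, not the exercise's real data.
-- User 1 has two events; user 2 has none.
WITH sample_users (id, plan) AS (
  VALUES (1, 'free'), (2, 'free')
),
sample_events (id, user_id) AS (
  VALUES (10, 1), (11, 1)
)
SELECT
  u.id        AS user_id,
  COUNT(e.id) AS count_event_id,  -- ignores the NULL e.id on an unmatched row
  COUNT(*)    AS count_star       -- counts the unmatched row itself
FROM sample_users u
LEFT JOIN sample_events e ON e.user_id = u.id
GROUP BY u.id
ORDER BY u.id;
-- user 1: count_event_id = 2, count_star = 2
-- user 2: count_event_id = 0, count_star = 1
```

With COUNT(*), the zero-event user would be credited with one event, because the LEFT JOIN still emits one row for them with NULL in every events column.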
plan_summary AS (
  SELECT plan, COUNT(user_id) AS active_users, SUM(event_count) AS total_events, AVG(event_count) AS avg_events
  FROM user_activity
  GROUP BY plan
)
The per-plan rollup runs across the full per-user set, zeros included. COUNT(user_id) is the active-user count per plan; SUM and AVG use the per-user event_count. Enterprise averages 8.8 events per user, free averages 0.54.
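As a sanity check on the rollup, the sketch below shows that AVG over per-user counts equals total events divided by active users when zero-count users are kept. The sample_user_activity rows are hypothetical stand-ins for the user_activity CTE, not the exercise's data, and the total_over_users column is added only for comparison. It assumes PostgreSQL or SQLite syntax.

```sql
-- Hypothetical per-user counts standing in for user_activity.
WITH sample_user_activity (user_id, plan, event_count) AS (
  VALUES
    (1, 'pro',  4),
    (2, 'pro',  0),
    (3, 'free', 1),
    (4, 'free', 0),
    (5, 'free', 0)
)
SELECT
  plan,
  COUNT(user_id)    AS active_users,
  SUM(event_count)  AS total_events,
  AVG(event_count)  AS avg_events,
  -- Multiply by 1.0 to avoid integer division in dialects that apply it.
  SUM(event_count) * 1.0 / COUNT(user_id) AS total_over_users
FROM sample_user_activity
GROUP BY plan
ORDER BY avg_events DESC;
-- pro:  2 users, 4 events, average 2
-- free: 3 users, 1 event,  average about 0.33
```

The two average columns agree only because the zero rows are present; each zero adds nothing to the sum but still counts in the denominator.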
SELECT plan, active_users, total_events, avg_events FROM plan_summary ORDER BY avg_events DESC
This returns the four plans ordered by per-user engagement.
The trap
Switching the LEFT JOIN to an INNER JOIN in the middle layer would silently change the average. Users with no events stop producing rows, so the per-plan AVG divides by a smaller denominator and the number inflates. The active_users count would also drop, and because the query still runs without error, nothing signals that either number is wrong. The prompt explicitly preserves zero-event users in the average, and the LEFT JOIN plus COUNT(e.id) is what enforces that.
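The effect can be reproduced on a tiny hypothetical dataset. In the sketch below, sample_users and sample_events are made-up rows rather than the exercise's tables: one plan, two users, and only one of them has events. The same rollup is run once over LEFT JOIN counts and once over INNER JOIN counts. It assumes PostgreSQL or SQLite syntax.

```sql
-- Hypothetical sample rows: user 1 has four events, user 2 has none.
WITH sample_users (id, plan) AS (
  VALUES (1, 'pro'), (2, 'pro')
),
sample_events (id, user_id) AS (
  VALUES (10, 1), (11, 1), (12, 1), (13, 1)
),
left_counts AS (
  SELECT u.id AS user_id, u.plan, COUNT(e.id) AS event_count
  FROM sample_users u
  LEFT JOIN sample_events e ON e.user_id = u.id
  GROUP BY u.id, u.plan
),
inner_counts AS (
  -- Same query with INNER JOIN: user 2 produces no row at all.
  SELECT u.id AS user_id, u.plan, COUNT(e.id) AS event_count
  FROM sample_users u
  INNER JOIN sample_events e ON e.user_id = u.id
  GROUP BY u.id, u.plan
)
SELECT 'left' AS join_type, plan,
       COUNT(user_id) AS active_users,
       SUM(event_count) AS total_events,
       AVG(event_count) AS avg_events
FROM left_counts
GROUP BY plan
UNION ALL
SELECT 'inner', plan,
       COUNT(user_id),
       SUM(event_count),
       AVG(event_count)
FROM inner_counts
GROUP BY plan;
-- left:  2 users, 4 events, average 2
-- inner: 1 user,  4 events, average 4
```

Both variants report the same total_events, which is part of why the mistake is easy to miss: only active_users and avg_events shift.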
You practiced staging active users, per-user event counts, and per-plan summaries as three CTEs, so users with zero events still appear in the average rather than dropping out.