Summary of Top Engaged Users — Chained CTEs in SQL

The problem

Streamhub's analytics team wants to profile its top-engagement segment: users whose session count exceeds the platform-wide per-user session average. The team needs the segment's size and its average total event count.

Write a query to return the count of top-engagement users and their average total event count as a single row.

Assumptions:

A user's session count is the number of sessions records linked to that user_id. A user's total events is the combined event_count across those records. Only users with at least one session contribute to the per-user session average.
The per-user session average is the average of those per-user session counts.
A user belongs to the top-engagement segment when their session count exceeds the per-user session average.
The output reports the size of that segment and the average total events across users in the segment.

Output:

A single row with columns top_user_count and avg_events_per_top_user.

Schema · analytics 5 tables

users

id integer

name text

email text

country text

plan text

signed_up_at timestamptz

is_active boolean

conversions

id integer

user_id integer

converted_at timestamptz

plan text

amount numeric

sessions

id integer

user_id integer

started_at timestamptz

ended_at? timestamptz

event_count integer

events

id integer

user_id integer

session_id? integer

event_type text

occurred_at timestamptz

properties? jsonb

periods

id integer

name text

start_month integer

end_month integer

Check answerShift Ctrl ↵

Run previews · Check grades

Write a query, then run it to see results here.

Worked solution Try it yourself first

Solution query

WITH
  user_stats AS (
    SELECT
      user_id,
      COUNT(*) AS session_count,
      SUM(event_count) AS total_events
    FROM
      sessions
    GROUP BY
      user_id
  ),
  top_users AS (
    SELECT
      user_id,
      total_events
    FROM
      user_stats
    WHERE
      session_count > (
        SELECT
          AVG(session_count)
        FROM
          user_stats
      )
  ),
  summary AS (
    SELECT
      COUNT(*) AS top_user_count,
      AVG(total_events) AS avg_events_per_top_user
    FROM
      top_users
  )
SELECT
  top_user_count,
  avg_events_per_top_user
FROM
  summary

The shape

Three cascading layers, each doing one thing. The first computes per-user statistics. The second reads that layer twice, once row-by-row and once through a scalar subquery that aggregates it to a single comparison value, and keeps the top-engagement users. The third reduces the survivors to a one-row summary. The shape mirrors the prompt's three questions: who are the users, which of them exceed the average, and what does the segment look like in aggregate.

Clause by clause

The first CTE collapses every user's sessions into a single statistics row:

WITH user_stats AS (
  SELECT user_id, COUNT(*) AS session_count, SUM(event_count) AS total_events
  FROM sessions
  GROUP BY user_id
)

COUNT(*) and SUM(event_count) are computed in the same pass. Both aggregates are needed downstream: session_count qualifies the user, and total_events is what the summary will average over.

The second CTE keeps users whose session count exceeds the per-user average:

top_users AS (
  SELECT user_id, total_events
  FROM user_stats
  WHERE session_count > (SELECT AVG(session_count) FROM user_stats)
)

The scalar subquery (SELECT AVG(session_count) FROM user_stats) reads user_stats again and reduces it to a single number. That number is then compared row-by-row against each user's session_count. The subquery is evaluated once and its result is reused on every row.

The third CTE reduces the surviving rows to a one-row summary:

summary AS (
  SELECT COUNT(*) AS top_user_count, AVG(total_events) AS avg_events_per_top_user
  FROM top_users
)

No GROUP BY is needed because there is no grouping key. Aggregating without GROUP BY collapses the whole source to one row, which is what the prompt asks for. COUNT(*) counts the segment's size; AVG(total_events) averages the carried-forward sum.

SELECT top_user_count, avg_events_per_top_user FROM summary returns the single row: 9 top-engagement users with an average of 77.11 total events.

The trap

The two averages in the query operate at different levels. AVG(session_count) in the second CTE is the average of per-user session counts, computed across the user table. AVG(total_events) in the third CTE is the average of per-user totals, computed across the qualifying segment. Pulling either one against the raw sessions table instead of the layer above it would produce a different number, weighted differently. The chain is what makes "average across users" mean exactly that, by giving each user one row in the layer the average is taken over.

You practiced cascading three WITH stages where the second references the first twice (row-by-row and through a scalar subquery) and the third reduces the second's row set to a one-row summary.

Return the count of top-engagement users and their average total event count as a single row

The shape

Clause by clause

The trap

Reading explains SQL. Writing it, over and over with instant feedback, is what makes you fluent.