Highly Engaged Frequent Users — Chained CTEs in SQL

The problem

Streamhub's growth team wants to identify highly engaged users — those who have recorded more than 2 sessions and whose average events per session exceeds 10.

Write a query to return the user ID and average events per session for every highly engaged user.

Assumptions:

The sessions table has one row per session with a user_id and an event_count.
A user's session count is the number of sessions records linked to that user_id. A user's average events per session is the average of event_count across those records.
A user qualifies only when both conditions hold: their session count is greater than 2, and their average events per session is greater than 10.

Output:

One row per qualifying user, with columns user_id and avg_events.

Schema · analytics 5 tables

users

id integer

name text

email text

country text

plan text

signed_up_at timestamptz

is_active boolean

conversions

id integer

user_id integer

converted_at timestamptz

plan text

amount numeric

sessions

id integer

user_id integer

started_at timestamptz

ended_at? timestamptz

event_count integer

events

id integer

user_id integer

session_id? integer

event_type text

occurred_at timestamptz

properties? jsonb

periods

id integer

name text

start_month integer

end_month integer

Check answerShift Ctrl ↵

Run previews · Check grades

Write a query, then run it to see results here.

Worked solution Try it yourself first

Solution query

WITH
  session_stats AS (
    SELECT
      user_id,
      COUNT(*) AS session_count,
      AVG(event_count) AS avg_events
    FROM
      sessions
    GROUP BY
      user_id
  ),
  frequent_users AS (
    SELECT
      user_id,
      session_count,
      avg_events
    FROM
      session_stats
    WHERE
      session_count > 2
  ),
  high_engagement AS (
    SELECT
      user_id,
      avg_events
    FROM
      frequent_users
    WHERE
      avg_events > 10
  )
SELECT
  user_id,
  avg_events
FROM
  high_engagement

The shape

Three named layers in one WITH clause: aggregate, restrict, restrict. The base CTE computes both per-user statistics in one pass, the second drops users below the session-count threshold, and the third drops users below the average-events threshold. Each restriction applies against the prior layer's output, so the two conditions compose without colliding.

Clause by clause

The first CTE collapses every user's sessions into a single statistics row:

WITH session_stats AS (
  SELECT user_id, COUNT(*) AS session_count, AVG(event_count) AS avg_events
  FROM sessions
  GROUP BY user_id
)

GROUP BY user_id produces one row per user. COUNT(*) and AVG(event_count) are both computed in the same pass, and both aliases are carried forward so the next two layers can reference them.

The second CTE applies the session-count threshold:

frequent_users AS (
  SELECT user_id, session_count, avg_events
  FROM session_stats
  WHERE session_count > 2
)

WHERE session_count > 2 drops the one- and two-session users. The SELECT list carries avg_events forward unchanged, because the next layer needs it to apply the second threshold.

The third CTE applies the average-events threshold:

high_engagement AS (
  SELECT user_id, avg_events
  FROM frequent_users
  WHERE avg_events > 10
)

Now the average check runs only against users who already cleared the frequency check, which is what "both conditions hold" means.

SELECT user_id, avg_events FROM high_engagement returns the final qualifying set.

Why three CTEs instead of one with a combined `WHERE`

The two thresholds could be written as WHERE session_count > 2 AND avg_events > 10 in a single follow-up layer, and the result would be identical. Splitting them into named stages is a readability decision: each filter has a name (frequent_users, high_engagement) that says what survived it, and either stage can be queried in isolation when debugging. For a two-condition filter the gain is modest; for the chained query as a pedagogical example, the structure shows what cascading layers look like before the conditions get any harder.

You practiced cascading three WITH stages — a base statistics layer, a first restriction stage, and a second restriction stage — so each condition applies against the prior layer's output rather than the raw source.

Return the user ID and average events per session for every highly engaged user

The shape

Clause by clause

Why three CTEs instead of one with a combined `WHERE`

Reading explains SQL. Writing it, over and over with instant feedback, is what makes you fluent.

The shape

Clause by clause

Why three CTEs instead of one with a combined WHERE

Reading explains SQL. Writing it, over and over with instant feedback, is what makes you fluent.

Why three CTEs instead of one with a combined `WHERE`