Users Above Average Session Count — CTEs in SQL

The problem

Streamhub's analytics team wants to flag power users — those whose session count exceeds the average session count across every user with at least one session.

Write a query to return the user ID and session count for every user whose session count is above the per-user average.

Assumptions:

A user's session count is the number of rows in sessions with that user_id. Only users with at least one session contribute to the per-user average.
The per-user average is the average of those per-user session counts.
Only users whose session count exceeds the per-user average should appear.

Output:

One row per qualifying user, with columns user_id and session_count.

Schema · analytics (5 tables)

users
  id            integer
  name          text
  email         text
  country       text
  plan          text
  signed_up_at  timestamptz
  is_active     boolean

conversions
  id            integer
  user_id       integer
  converted_at  timestamptz
  plan          text
  amount        numeric

sessions
  id            integer
  user_id       integer
  started_at    timestamptz
  ended_at?     timestamptz
  event_count   integer

events
  id            integer
  user_id       integer
  session_id?   integer
  event_type    text
  occurred_at   timestamptz
  properties?   jsonb

periods
  id            integer
  name          text
  start_month   integer
  end_month     integer


Worked solution

Solution query

WITH
  user_sessions AS (
    SELECT
      user_id,
      COUNT(*) AS session_count
    FROM
      sessions
    GROUP BY
      user_id
  )
SELECT
  user_id,
  session_count
FROM
  user_sessions
WHERE
  session_count > (
    SELECT
      AVG(session_count)
    FROM
      user_sessions
  )

The shape

The WITH layer is referenced twice in one statement: once as the source the main query reads from row by row, and once inside a scalar subquery that re-reads the same layer to compute the per-user average. The layer's name resolves consistently in both places, so a single named computation serves both roles.

Clause by clause

The WITH clause defines user_sessions:

WITH user_sessions AS (
  SELECT user_id, COUNT(*) AS session_count
  FROM sessions
  GROUP BY user_id
)

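GROUP BY user_id groups the sessions rows by user, and COUNT(*) produces each user's session count. The layer ends up with one row per user who has at least one session. Users with no sessions have no rows to group, so they never enter the layer, which is why they do not contribute to the average.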
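To see what the layer holds on its own, here is a minimal sketch that runs it against a tiny inline dataset. The sample_sessions rows are hypothetical and are not taken from the exercise data; the syntax assumes PostgreSQL, which the timestamptz and jsonb types in the schema suggest.

```sql
-- Hypothetical sample data: six sessions spread across three users.
WITH sample_sessions (id, user_id) AS (
  VALUES (1, 1), (2, 1), (3, 1), (4, 1), (5, 2), (6, 3)
),
-- Same layer as in the solution, pointed at the sample rows.
user_sessions AS (
  SELECT user_id, COUNT(*) AS session_count
  FROM sample_sessions
  GROUP BY user_id
)
SELECT user_id, session_count
FROM user_sessions
ORDER BY user_id;
-- user_id | session_count
--       1 |             4
--       2 |             1
--       3 |             1
```

Six input rows collapse to three output rows, one per user_id that appears in the data.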

The main query is SELECT user_id, session_count FROM user_sessions WHERE session_count > (SELECT AVG(session_count) FROM user_sessions). The outer reference reads each user's row from the layer. The scalar subquery in the WHERE reads the same layer a second time and collapses every per-user count to a single average. Because the subquery is uncorrelated, that one value is the same for every row, and it is compared against each user's session_count row by row. Users with counts above the average stay in: users 1 and 3 at 9 sessions each, and users 7, 8, 14, 18, 20, 23, and 27 at 4 each.
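The comparison can be made visible by selecting the average and the test result as columns instead of filtering on them. This is a diagnostic sketch, not part of the graded answer; sample_sessions is hypothetical inline data, and the boolean column assumes PostgreSQL.

```sql
-- Hypothetical sample data: six sessions spread across three users.
WITH sample_sessions (id, user_id) AS (
  VALUES (1, 1), (2, 1), (3, 1), (4, 1), (5, 2), (6, 3)
),
user_sessions AS (
  SELECT user_id, COUNT(*) AS session_count
  FROM sample_sessions
  GROUP BY user_id
)
SELECT
  user_id,
  session_count,
  -- Second reference to the layer: one value, repeated on every row.
  (SELECT AVG(session_count) FROM user_sessions) AS per_user_avg,
  -- The same predicate the solution puts in WHERE.
  session_count > (SELECT AVG(session_count) FROM user_sessions) AS above_avg
FROM user_sessions
ORDER BY user_id;
-- per_user_avg equals 2 on every row (the mean of 4, 1, 1);
-- above_avg is true for user 1 and false for users 2 and 3.
```

Moving the above_avg expression into WHERE turns this back into the shape of the solution query, which would keep only user 1 here.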

Why a CTE referenced twice and not a derived table

A derived table in FROM produces a named result for the main query, but its alias can only qualify columns within that query; it cannot be used as a table in another FROM clause. The scalar subquery in the WHERE clause therefore cannot select from it by name. To use the per-user counts in both places, a plain derived table would have to be repeated: once in the outer FROM and once inside the scalar subquery, with the same GROUP BY written out twice. The CTE shares the named layer across both reference sites in a single statement, which is exactly what naming the intermediate result is for.
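For comparison, a sketch of the derived-table version against the exercise's sessions table. The aliases us and inner_us are illustrative names, not from the original.

```sql
-- Same result without a CTE: the grouped subquery has to be written twice.
SELECT
  us.user_id,
  us.session_count
FROM (
  -- First copy: feeds the outer query one row per user.
  SELECT user_id, COUNT(*) AS session_count
  FROM sessions
  GROUP BY user_id
) AS us
WHERE us.session_count > (
  SELECT AVG(inner_us.session_count)
  FROM (
    -- Second copy: identical grouping, repeated only to compute the average.
    SELECT user_id, COUNT(*) AS session_count
    FROM sessions
    GROUP BY user_id
  ) AS inner_us
);
```

The two inner SELECTs must be kept in sync by hand; if one gains a filter and the other does not, the benchmark silently stops matching the rows it is compared against.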

The trap

The scalar subquery computes the average of the per-user session counts, not an average across raw session rows. AVG(session_count) FROM user_sessions averages one value per user. By contrast, (SELECT AVG(1.0) FROM sessions) simply returns 1 for any non-empty table, and an AVG taken directly over the sessions rows, without the grouping, does not produce the per-user benchmark the prompt asks for. The GROUP BY inside the layer, not the name itself, is what makes the second-pass average a per-user statistic rather than a per-session one.
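The difference shows up clearly on a small dataset. The sketch below uses hypothetical inline rows (sample_sessions is not part of the exercise schema) and puts the per-user average next to the per-row average. It also adds a cross-check that is not in the original solution: total sessions divided by distinct users, which should agree with the per-user average.

```sql
-- Hypothetical sample data: six sessions spread across three users.
WITH sample_sessions (id, user_id) AS (
  VALUES (1, 1), (2, 1), (3, 1), (4, 1), (5, 2), (6, 3)
),
user_sessions AS (
  SELECT user_id, COUNT(*) AS session_count
  FROM sample_sessions
  GROUP BY user_id
)
SELECT
  -- Mean of the per-user counts 4, 1, 1: equals 2.
  (SELECT AVG(session_count) FROM user_sessions) AS per_user_avg,
  -- Every raw row contributes 1.0, so this equals 1 and says nothing about users.
  (SELECT AVG(1.0) FROM sample_sessions) AS per_row_avg,
  -- Cross-check: 6 sessions / 3 distinct users, also equals 2.
  (SELECT COUNT(*) * 1.0 / COUNT(DISTINCT user_id) FROM sample_sessions) AS total_over_users;
```

Only the first and third expressions describe sessions per user; the second is a per-session figure and would flag every user with more than one session.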

You practiced referencing the same WITH layer twice in one statement — once for each user's value, once inside a scalar subquery that computes a layer-wide statistic the main query compares against.

