Scenario: Streamhub's growth team defines a 'return visit' as a session that begins more than seven days after the user's previous session.
Task: Write a query to return every return-visit session's user_id, session_id, started_at, and gap_from_prev — the time elapsed since that user's previous session began.
Assumptions:
- Within a user's history, sessions are ordered by
started_atascending. - A session's
gap_from_previs the difference between itsstarted_atand thestarted_atof the user's immediately preceding session. - A return-visit session is one whose
gap_from_previs strictly greater than seven days. - The result covers only return-visit sessions.
Output:
- One row per return-visit session.
- Columns in this order:
user_id,session_id,started_at,gap_from_prev. - Sorted by
user_idascending, thenstarted_atascending.
Schema · analytics 5 tables
Run previews · Check grades
Write a query, then run it to see results here.
Worked solution Try it yourself first
WITH
session_gaps AS (
SELECT
id AS session_id,
user_id,
started_at,
started_at - LAG(started_at) OVER (
PARTITION BY
user_id
ORDER BY
started_at
) AS gap_from_prev
FROM
sessions
)
SELECT
user_id,
session_id,
started_at,
gap_from_prev
FROM
session_gaps
WHERE
gap_from_prev > INTERVAL '7 days'
ORDER BY
user_id,
started_at The shape
A "return visit" is defined entirely by the gap between consecutive sessions for the same user, so computing that gap with LAG in a CTE and then filtering for gaps over seven days yields the return-visit set in one straight pass.
Clause by clause
- The
session_gapsCTE returns every session along withstarted_at - LAG(started_at) OVER (PARTITION BY user_id ORDER BY started_at) AS gap_from_prev. The partition keeps the look-back inside one user's history; the subtraction produces an interval per session. The user's first session hasNULLhere because there is no prior row inside the partition. SELECT user_id, session_id, started_at, gap_from_prev FROM session_gaps WHERE gap_from_prev > INTERVAL '7 days'keeps only the sessions whose gap exceeds the seven-day threshold. BecauseNULL > INTERVAL '7 days'evaluates toNULL(not true), every user's first session is automatically excluded without any extra clause.ORDER BY user_id, started_atproduces the requested sort.
Why this and not a self-join on consecutive sessions
You could join sessions to itself on user_id, pair each session with one started earlier, take the maximum earlier-session started_at per outer session, and subtract. That reproduces the same gap, but it asks the planner to materialise every prior-session pair before reducing them. LAG does the same look-back in one window pass and gives you the gap directly on the row, which keeps the query a single linear read of the table.
The trap
The threshold is on a duration, not a date. Writing WHERE gap_from_prev > 7 compares an interval to an integer and either errors out or coerces in ways that vary by version. INTERVAL '7 days' is the right-hand side that lets the comparison stay in interval arithmetic the whole way through. The same care applies if the rule changes to "more than three hours" — the constant has to carry its unit.
You practiced computing per-user session gaps in a CTE, then keeping only sessions whose gap exceeds the seven-day re-engagement threshold.