Scenario: Streamhub's product team is analyzing what brings users into a session by examining which event opens each session.
Task: Write a query to return every event that is the first event recorded in its session — those with no preceding event in the same session — returning the event's id, session_id, event_type, and occurred_at.
Assumptions:
- Within a session, events are ordered by
occurred_atascending. - An event qualifies as the first event in its session when no other event in the same session has an earlier
occurred_at. - The result covers only first-of-session events.
Output:
- One row per qualifying event.
- Columns in this order:
event_id,session_id,event_type,occurred_at. - Sorted by
session_idascending.
Schema · analytics 5 tables
Run previews · Check grades
Write a query, then run it to see results here.
Worked solution Try it yourself first
WITH
event_gaps AS (
SELECT
id,
session_id,
event_type,
occurred_at,
LAG(occurred_at) OVER (
PARTITION BY
session_id
ORDER BY
occurred_at
) AS prev_occurred_at
FROM
events
)
SELECT
id AS event_id,
session_id,
event_type,
occurred_at
FROM
event_gaps
WHERE
prev_occurred_at IS NULL
ORDER BY
session_id The shape
The first event in each session is the one whose LAG(occurred_at) is NULL — there is no prior event inside the partition to look back to. Computing the gap in a CTE and then filtering for prev_occurred_at IS NULL keeps the logic in two clean stages.
Clause by clause
- The
event_gapsCTE attachesLAG(occurred_at) OVER (PARTITION BY session_id ORDER BY occurred_at) AS prev_occurred_atto every event row. Inside each session,LAGreturns the prior event's timestamp, except for the earliest event, where it returnsNULLbecause no prior row exists in the partition. - The outer
SELECT id AS event_id, session_id, event_type, occurred_at FROM event_gaps WHERE prev_occurred_at IS NULLkeeps only the rows where the look-back was empty, which are exactly the session-opening events. ORDER BY session_idsorts the result the way the prompt asks.
Why this and not a per-session MIN(occurred_at) join
You could compute MIN(occurred_at) per session_id in a CTE, then join events back on (session_id, occurred_at) to recover the opening event's full row. That works, but it does two passes over the data: one to find the minimums, one to look the rows back up. The LAG approach makes one pass and uses the NULL that LAG already produces at each partition boundary as the filter, which is exactly the marker we need.
The trap
Filtering on occurred_at = MIN(occurred_at) without a window or a join scope error is the easy mistake. A bare aggregate in WHERE does not compile, and pushing it to HAVING collapses each session to one row but loses the other event columns. The LAG IS NULL pattern sidesteps the grouping question entirely: every event row stays intact, and the filter selects on a per-row computed column rather than a group aggregate.
You practiced flagging session-opening events by checking whether LAG returns a missing value within the partition — the missing-value lookback identifies the row that has no predecessor in its session.