Scenario: Streamhub's analytics team needs a per-session summary covering every session — including sessions with no recorded events.
Task: Write a query to return each session's id, the user_id it belongs to, the count of events in that session, the earliest event timestamp, and the latest event timestamp.
Assumptions:
- A session's
event_countis the number ofeventsrecorded against it;first_event_atandlast_event_atare the earliest and latest event timestamps within the session. - The result covers every session.
- A session with no recorded
eventsappears withevent_countof0and bothfirst_event_atandlast_event_atreported as missing values.
Output:
- One row per session.
- Columns in this order:
session_id,user_id,event_count,first_event_at,last_event_at.
Schema · analytics 5 tables
Run previews · Check grades
Write a query, then run it to see results here.
Worked solution Try it yourself first
SELECT
s.id AS session_id,
s.user_id,
event_stats.event_count,
event_stats.first_event_at,
event_stats.last_event_at
FROM
sessions s
LEFT JOIN LATERAL (
SELECT
COUNT(*) AS event_count,
MIN(occurred_at) AS first_event_at,
MAX(occurred_at) AS last_event_at
FROM
events e
WHERE
e.session_id = s.id
) AS event_stats ON TRUE The shape
LEFT JOIN LATERAL runs a three-aggregate query once per session, returning the count, earliest timestamp, and latest timestamp in a single pass. Sessions with no events still appear because the join's LEFT half preserves every outer row.
Clause by clause
FROM sessions sdrives the query; every session is preserved.LEFT JOIN LATERAL (SELECT COUNT(*) AS event_count, MIN(occurred_at) AS first_event_at, MAX(occurred_at) AS last_event_at FROM events e WHERE e.session_id = s.id) AS event_stats ON trueruns once per session. The inner query filterseventsto that session and computes all three aggregates simultaneously.ON trueis the standard LATERAL pairing — every outer row keeps its lateral subquery's row.SELECT s.id AS session_id, s.user_id, event_stats.event_count, event_stats.first_event_at, event_stats.last_event_atreads the session identifiers and all three aggregates from the lateral result.
Why LATERAL and not pre-aggregate + LEFT JOIN
A pre-aggregated derived table is the natural alternative:
SELECT s.id AS session_id, s.user_id,
COALESCE(es.event_count, 0) AS event_count,
es.first_event_at, es.last_event_at
FROM sessions s
LEFT JOIN (
SELECT session_id,
COUNT(*) AS event_count,
MIN(occurred_at) AS first_event_at,
MAX(occurred_at) AS last_event_at
FROM events GROUP BY session_id
) es ON es.session_id = s.idBoth shapes are correct. The pre-aggregated form aggregates events once across the whole table and then joins; LATERAL aggregates once per session. On a session table much smaller than the event table, pre-aggregation usually scales better; on a session table where most rows are then filtered down to a few, LATERAL can be faster because it only aggregates for sessions you keep. The two are equivalent in correctness; the choice is a cardinality and selectivity question.
The trap
The empty-set behavior of the three aggregates is not uniform. For a session with no matching events, the lateral subquery executes against an empty input and returns one row containing COUNT(*) = 0, MIN(occurred_at) = NULL, MAX(occurred_at) = NULL. That row matches the prompt's contract exactly: event_count of 0, both timestamps NULL. The pre-aggregated alternative behaves differently — sessions with no events match no row in the derived table, so all three columns come back NULL, including event_count, and you'd need COALESCE(event_count, 0) to recover the contract. LATERAL's "always return one row, even if empty" semantics is what makes the zero-count case fall out for free here.
You practiced returning three per-session metrics from a single LATERAL subquery — a shape that delivers multi-column output that a scalar correlated subquery cannot.