Streamhub's platform team wants to identify sessions that recorded no events.
Write a query to return the session ID for every session with no associated event.
Assumptions:
- The
sessionstable contains every session ever recorded on the platform. - The
eventstable records each event that occurred during a session;session_idlinks each event back to a session. - A session with no events has no row in
eventscarrying itsidassession_id.
Output:
- One row per silent session, with a single column
session_id.
Schema · analytics 5 tables
Run previews · Check grades
Write a query, then run it to see results here.
Worked solution Try it yourself first
SELECT
s.id AS session_id
FROM
sessions s
FULL OUTER JOIN events e ON s.id = e.session_id
WHERE
e.id IS NULL The shape
The anti-join from the sessions side: every session paired with its events, then WHERE e.id IS NULL keeps only the sessions that paired with nothing. Silent sessions land in the result; sessions that fired any event drop out. The pattern is exactly the same as the customers-without-orders question, just against a high-cardinality fact table.
Clause by clause
SELECT s.id AS session_idreturns just the session ID — the only column the platform team needs to chase down silent sessions.FROM sessions s FULL OUTER JOIN events e ON s.id = e.session_idis the reconciliation. Sessions with events appear once per event they fired; sessions with no events are kept with the events sideNULL-padded; any orphan event whosesession_iddoesn't resolve is also kept, with the sessions sideNULL-padded.WHERE e.id IS NULLfilters to the sessions whose events side came back empty after the join. Matched rows (sessions that fired events) drop out becausee.idis real. Orphan events drop out becausee.idis also real there. What's left is one row per session with no events, however many that turns out to be — over a hundred in this dataset.
Why filter on e.id and not on e.session_id
The e.id primary key is the safe column to test. It's NULL only on rows the outer join padded in, never on actual event rows. Filtering on e.session_id instead would let any event whose session_id was NULL in the source data slip through, polluting the silent-session list with what are actually orphan events. The rule the anti-join shape relies on: test a column that can only be NULL because the join produced no match. Primary-key columns are the reliable choice.
You practiced the anti-join in a high-cardinality fact-table context. The shape doesn't care that there are 100+ silent sessions; IS NULL on the events-side key returns every session that has no event row, however many that turns out to be.