Streamhub's data engineering team is auditing the completeness of the event pipeline:
- Every session must appear, including sessions with no events recorded (event ID will be
NULL). - Every event must appear, including events whose
session_iddoes not resolve to any session (session ID will beNULL).
Write a query to return the session ID and event ID for every row in the audit view.
Assumptions:
- A session with multiple events appears once per event in the result; a session with no events appears once with
event_idasNULL; an orphan event appears once withsession_idasNULL.
Output:
- One row per matched session-event pair, plus one row per silent session (with
event_idasNULL), plus one row per orphan event (withsession_idasNULL), with columnssession_idandevent_id.
Schema · analytics 5 tables
Run previews · Check grades
Write a query, then run it to see results here.
Worked solution Try it yourself first
SELECT
s.id AS session_id,
e.id AS event_id
FROM
sessions s
FULL OUTER JOIN events e ON s.id = e.session_id The shape
A FULL OUTER JOIN between sessions and events keeps every row from both sides. The matched rows multiply: a session that fired four events appears four times, one per event. The unmatched rows do not multiply: a silent session appears exactly once with event_id as NULL, and an orphan event appears exactly once with session_id as NULL. Cardinality is what controls the row count on the matched side; the outer join is what guarantees the orphan rows survive.
Clause by clause
SELECT s.id AS session_id, e.id AS event_idreturns the primary key from each side of the join. On a matched row both values are real. On a silent session,event_idisNULLbecause the events side was padded in by the outer join. On an orphan event,session_idisNULLfor the same reason on the opposite side. The twoNULLpatterns are how the data-engineering team labels each row as matched, silent, or orphan.FROM sessions s FULL OUTER JOIN events e ON s.id = e.session_idis the audit join. TheONcondition pairs a session with each of its events. Where it matches, the row is assembled from both sides — and the one-to-many relationship means a session with multiple events produces multiple rows. Where it doesn't, the outer join keeps the unmatched row anyway andNULL-pads the missing side.FULLmakes that guarantee in both directions.- No
WHERE. The team asked for the full audit view, so every row the join produces belongs in the output.
Why this and not a LEFT JOIN
LEFT JOIN anchored on sessions delivers the matched rows and the silent sessions, but loses every orphan event. Orphan events — events whose session_id doesn't resolve to any session — are usually the more interesting finding in a pipeline audit. Surfacing them alongside the silent sessions in a single pass is the whole reason the audit runs on both tables, which is exactly what FULL OUTER JOIN produces and what LEFT JOIN cannot.
The trap
The trap is reading the matched-row count as wrong because it's higher than the session count or the event count. A session with five events contributes five rows, not one. The matched side is producing one row per session-event pair, which is the correct shape for an event-level audit. Read the result by row category using the NULL patterns, not by counting against the source tables' row totals.
You practiced a FULL OUTER JOIN against a junction-style relationship. The recurring lesson: row count multiplies on the matched side (one row per session-event pair) while orphan rows on either side appear once each — three categories of rows, one query.