Streamhub's data quality team is checking referential integrity. Every conversion should belong to a user who has at least one session record on the platform.
Write a query to return any user IDs that have a conversion record but no corresponding session.
Assumptions:
- Streamhub's referential integrity is currently strong — every user with conversions also has at least one session record. The result set will be empty.
- The query order matters:
conversions EXCEPT sessionsanswers this question;sessions EXCEPT conversionsanswers a different one (sessions with no conversions).
Output:
- One row per orphan conversion-user, with a single column
user_id. The result set will be empty.
Schema · analytics 5 tables
Run previews · Check grades
Write a query, then run it to see results here.
Worked solution Try it yourself first
SELECT
user_id
FROM
conversions
EXCEPT
SELECT
user_id
FROM
sessions The shape
EXCEPT returns rows from the left query that don't appear in the right query. With conversions on the left and sessions on the right, the result is the set of user IDs with a conversion but no session — the orphan-conversion list. Every converting user also has at least one session, so the difference is empty and the integrity check passes cleanly.
Clause by clause
SELECT user_id FROM conversionsis the left input. Every conversion record projects its user. The data quality question — "are any of these users missing fromsessions?" — is asking whether this set is fully contained in the session set.EXCEPTis the set operator. PostgreSQL evaluates both queries, deduplicates each side, then returns rows from the left with no match on the right. When every converting user has at least one session, the difference is empty.SELECT user_id FROM sessionsis the right input — the reference set the conversions should be a subset of.- The empty result is the affirmative answer: zero orphan conversion-users, the integrity check is clean.
Why this and not the swapped order
Directionality is everything. conversions EXCEPT sessions answers "conversions with no session" — the orphan-conversion check the prompt asks for. sessions EXCEPT conversions answers "sessions with no conversion" — a much larger result catching every user who visited but never paid. Both queries run cleanly; the only difference is which question the human is answering.
The invariant being checked is "every conversion must trace back to a real session." The left query is the side being checked (conversions); the right query is the reference set it should be a subset of (sessions).
The trap
The trap is treating an empty result as a query bug. When an integrity check returns zero rows, the first reaction is to assume the query is misshapen — that EXCEPT is silently filtering something, or that column names don't line up. Neither is the case. The empty result is the affirmative finding: nothing to fix.
The way to verify the query is real is to flip the direction in a separate run. sessions EXCEPT conversions against the same data returns a large non-empty result, which confirms the operator is doing its job and the integrity direction simply happens to be clean.
You practiced EXCEPT as a referential-integrity check. The recurring shape any time the question is "are there entities in A with no counterpart in B" — and the empty-result outcome is a clean integrity pass, not a failure to write the query.