Streamhub's growth team wants to gather every user ID that appears in either session records or conversion records into a single list for batch processing. Duplicates must be preserved — a user with three sessions and two conversions should produce five rows.
Write a query to return the combined list of user IDs.
Assumptions:
- The
sessionstable records each session;user_ididentifies the user. - The
conversionstable records each paid conversion;user_ididentifies the user. - The output should not deduplicate — every record from both tables contributes one row.
Output:
- One row per source record across both tables, with a single column
user_id.
Schema · analytics 5 tables
Run previews · Check grades
Write a query, then run it to see results here.
Worked solution Try it yourself first
SELECT
user_id
FROM
sessions
UNION ALL
SELECT
user_id
FROM
conversions The shape
UNION ALL concatenates the two result sets without inspecting them. Every user_id from sessions lands first, followed by every user_id from conversions — a user with three sessions and two conversions contributes five rows, which is the row-per-record shape the batch processor wants.
Clause by clause
SELECT user_id FROM sessionsis the left query: one row per session, projecting just the user behind it.UNION ALLis the set operator. TheALLkeyword turns off deduplication, so PostgreSQL skips the sort-or-hash pass thatUNIONwould add and just stacks the two result sets back-to-back. A user appearing multiple times in either input appears multiple times in the output.SELECT user_id FROM conversionsis the right query. The two sides must agree on column count and column type — one column of integers on each side here, which lines up.
Why this and not UNION
UNION would also combine the two result sets, but it adds a deduplication pass on top: PostgreSQL sorts or hashes the entire combined output and keeps each distinct row once. The batch processor wants every record represented; a user with three sessions and two conversions should produce five rows, not one. UNION would silently collapse them to one.
You practiced UNION ALL to stack two result sets without deduplication. The recurring rule: UNION ALL is the cheap concatenation; UNION adds a deduplication pass on top of it — pick UNION ALL whenever duplicates are wanted (or are known to be impossible).