Unique Session or Event User IDs — UNION in SQL

The problem

Streamhub's data team wants a deduplicated list of every user ID that has either started a session or triggered an event on the platform.

Write a query to return each qualifying user ID exactly once.

Assumptions:

The sessions table records each session; user_id identifies the user.
The events table records each platform event; user_id identifies the user.
A user with both sessions and events should appear once in the result, not twice.

Output:

One row per distinct user ID, with a single column user_id.

Schema · analytics 5 tables

users

id integer

name text

email text

country text

plan text

signed_up_at timestamptz

is_active boolean

conversions

id integer

user_id integer

converted_at timestamptz

plan text

amount numeric

sessions

id integer

user_id integer

started_at timestamptz

ended_at? timestamptz

event_count integer

events

id integer

user_id integer

session_id? integer

event_type text

occurred_at timestamptz

properties? jsonb

periods

id integer

name text

start_month integer

end_month integer

Check answerShift Ctrl ↵

Run previews · Check grades

Write a query, then run it to see results here.

Worked solution Try it yourself first

Solution query

SELECT
  user_id
FROM
  sessions
UNION
SELECT
  user_id
FROM
  events

The shape

UNION combines the two queries and then deduplicates the combined result. Every qualifying user_id appears exactly once — whether the user came from sessions, from events, or from both.

Clause by clause

SELECT user_id FROM sessions is the left input: one row per session record, projecting the user behind each one. A user with multiple sessions contributes multiple rows here.
UNION is the set operator. After PostgreSQL evaluates both queries, it sorts or hashes the entire combined result and keeps each distinct row once. The deduplication is across the full combined set: a user appearing five times in sessions and three times in events collapses to a single row.
SELECT user_id FROM events is the right input. Same column, same type as the left query, which is what lets the two sides stack into a single column.

Why this and not `UNION ALL`

UNION ALL would skip the deduplication step and return every record from both tables — the same user appearing in both contributes two rows, not one. The prompt asks for each qualifying user once, which is exactly the work UNION does on top of the concatenation. The deduplication has a real cost on large inputs, but here it's the load-bearing behavior, not overhead.

You practiced UNION (with deduplication) over two result sets. The recurring rule: UNION removes duplicates across the combined result — including duplicates within each side and duplicates that span both sides — by sorting or hashing the combined output.

Return each qualifying user ID exactly once

The shape

Clause by clause

Why this and not `UNION ALL`

Reading explains SQL. Writing it, over and over with instant feedback, is what makes you fluent.

The shape

Clause by clause

Why this and not UNION ALL

Reading explains SQL. Writing it, over and over with instant feedback, is what makes you fluent.

Why this and not `UNION ALL`