Streamhub's product team needs every unique event type each session produced — each event type listed once per session, no repeats.
Write a query to return every session ID and a comma-separated list of the unique event types that occurred in that session, in alphabetical order.
Assumptions:
- The
eventstable has one row per event with asession_idand anevent_type. - Each
session_idwith at least one event should appear once. - For each session, the types list contains each
event_typevalue at most once (regardless of how many events of that type the session generated), arranged alphabetically and separated by', '.
Output:
- One row per session with at least one event, with columns
session_idanddistinct_event_types.
Schema · analytics 5 tables
Run previews · Check grades
Write a query, then run it to see results here.
Worked solution Try it yourself first
SELECT
session_id,
STRING_AGG(
DISTINCT event_type,
', '
ORDER BY
event_type
) AS distinct_event_types
FROM
events
GROUP BY
session_id The shape
STRING_AGG(DISTINCT event_type, ', ' ORDER BY event_type) collapses every event in a session into a single text value, but the DISTINCT modifier removes repeated event types before the join happens, so each type appears exactly once. The product team gets one row per session showing only the unique event types that occurred, in alphabetical order.
Clause by clause
SELECT session_id, STRING_AGG(DISTINCT event_type, ', ' ORDER BY event_type) AS distinct_event_typesreturns the session and its deduplicated event-type list.DISTINCTinside the aggregate runs the deduplication pass over the input values before they are concatenated, which is why a session with three'feature_used'events and two'page_view'events comes out as'feature_used, page_view'and not'feature_used, feature_used, feature_used, page_view, page_view'. TheORDER BY event_typefixes the alphabetical sequence inside the value.FROM eventsreads the event rows. Every event is in scope; the deduplication happens inside the aggregate, not at the row level.GROUP BY session_idpartitions the rows by session so the aggregate runs once per session. One output row per distinctsession_id, matching the per-session output spec.
The trap
DISTINCT and ORDER BY inside the same aggregate have to agree on what counts as a unique value. STRING_AGG(DISTINCT event_type, ', ' ORDER BY event_type) is fine because both expressions reference the same column. Writing ORDER BY some_other_column next to DISTINCT event_type raises a syntax error in PostgreSQL: the planner cannot deduplicate on one expression and order on another, because the order of duplicates inside each unique value would be undefined. When you reach for DISTINCT inside the aggregate, the ORDER BY expression has to match the DISTINCT expression exactly.
You practiced STRING_AGG(DISTINCT column, separator ORDER BY column) — deduplicate values inside the aggregate before joining; the ORDER BY expression must match the DISTINCT expression.