Scenario: Streamhub's analytics team needs a complete day-by-day timeline of 'page_view' activity for January 2022, including days with no recorded events.
Task: Write a query to return each date from January 1, 2022 through January 31, 2022 alongside the count of 'page_view' events recorded on that date.
Assumptions:
- The
eventstable holds one row per recorded event, withevent_typerecording the kind of activity andoccurred_atrecording when it happened. - A
'page_view'event hasevent_typeequal to'page_view'. - Some dates in the range have no recorded
'page_view'events; those dates must still appear in the result, withpage_view_countreported as a missing value rather than0.
Output:
- One row per date in the range, including dates with no
'page_view'events. - Columns in this order:
day,page_view_count. - Sorted by
dayascending.
Schema · analytics 5 tables
Run previews · Check grades
Write a query, then run it to see results here.
Worked solution Try it yourself first
WITH
date_spine AS (
SELECT
GENERATE_SERIES('2022-01-01'::date, '2022-01-31'::date, INTERVAL '1 day')::date AS DAY
),
daily_page_views AS (
SELECT
occurred_at::date AS event_day,
COUNT(*) AS event_count
FROM
events
WHERE
event_type = 'page_view'
GROUP BY
occurred_at::date
)
SELECT
ds.day,
dpv.event_count AS page_view_count
FROM
date_spine ds
LEFT JOIN daily_page_views dpv ON dpv.event_day = ds.day
ORDER BY
ds.day The shape
generate_series produces a row for every date in the window, and a LEFT JOIN from that spine to a daily count of 'page_view' events keeps the quiet days in the result with NULL on the count side. The two-CTE structure separates "what dates do we want" from "what activity actually happened."
Clause by clause
- The
date_spineCTE runsgenerate_series('2022-01-01'::date, '2022-01-31'::date, INTERVAL '1 day')::date AS day. The series generates one timestamp per day across the range; the trailing::datecast pins each result to a calendar date. daily_page_viewsgroups the events to one row per day withoccurred_at::date AS event_day, COUNT(*) AS event_count FROM events WHERE event_type = 'page_view' GROUP BY occurred_at::date. Filtering before aggregating meansCOUNTonly sees page-view rows.- The outer
FROM date_spine ds LEFT JOIN daily_page_views dpv ON dpv.event_day = ds.daykeeps every spine day in the result. When the right side has no match,dpv.event_countcomes back asNULL, which is the missing-value the prompt asks for. SELECT ds.day, dpv.event_count AS page_view_countandORDER BY ds.dayproduce the requested two-column shape and sort.
Why this and not an INNER JOIN with COUNT from events
If you started from events and grouped by day, January 1 through January 14 would disappear from the result because no rows exist for those dates. There would be nothing to group. The spine flips the direction of the read: every date is in the driving table, and the events table contributes only where it matches. That is what guarantees all 31 rows.
The trap
A COALESCE(dpv.event_count, 0) is the obvious-looking fix for the empty days, and on most "no activity" reports it would be the right call. The prompt explicitly asks for a missing value rather than 0, which is a distinction the grader checks: 0 and NULL are not the same value, and substituting one for the other changes 30 of the 31 result rows here.
You practiced left-joining a date spine to per-day event counts — preserving the missing-value count on quiet days, in contrast to substituting 0.