Scenario: Streamhub's analytics team monitors the rolling average of daily event volume across the first week of March 2024. Because the average must reflect quiet days as well as busy ones, every day in the range must contribute to the calculation.
Task: Write a query to return each date from March 1, 2024 through March 7, 2024 alongside the number of events recorded on that date and the running average daily event count from March 1, 2024 through that date inclusive.
Assumptions:
- The events table holds one row per recorded event, with the timestamp stored in occurred_at.
- Some dates in the range have no recorded events; those dates contribute a daily count of zero to the running average.
- The running average on each date is the average of the daily counts from March 1, 2024 through that date inclusive.
Output:
- One row per date in the range, including dates with no events.
- Columns in this order: day, daily_events, running_avg_events.
- Sorted by day ascending.
Worked solution
WITH
  spine AS (
    SELECT
      GENERATE_SERIES('2024-03-01'::date, '2024-03-07'::date, '1 day'::INTERVAL)::date AS day
  )
SELECT
  s.day,
  COUNT(e.id) AS daily_events,
  AVG(COUNT(e.id)) OVER (
    ORDER BY
      s.day
  ) AS running_avg_events
FROM
  spine s
  LEFT JOIN events e ON e.occurred_at::date = s.day
GROUP BY
  s.day
ORDER BY
  s.day
The shape
A running average that reflects quiet days has to see those days as zero counts, not as absent rows. The spine produces the seven days, the LEFT JOIN lets the quiet days through as zero counts, and AVG as a window function over the per-day counts drags the running average downward across the empty stretch — which is exactly the behavior the prompt requires.
Clause by clause
- WITH spine AS (SELECT generate_series('2024-03-01'::date, '2024-03-07'::date, '1 day'::interval)::date AS day) builds the seven-row backbone for the first week of March.
- COUNT(e.id) AS daily_events is the per-day aggregate. COUNT(e.id) ignores nulls, so days with no matching event report a daily count of zero, which is the value that has to feed into the running average.
- AVG(COUNT(e.id)) OVER (ORDER BY s.day) AS running_avg_events wraps the daily count in a windowed average. The inner COUNT collapses each spine day to its daily count; the outer AVG ... OVER (ORDER BY s.day) averages those daily counts in date order. The default frame for an ordered aggregate window covers every row from the start through the current row, which is the running-from-day-one shape the prompt asks for.
- FROM spine s LEFT JOIN events e ON e.occurred_at::date = s.day attaches each event to its day. The LEFT JOIN keeps every spine row whether or not an event matches.
- GROUP BY s.day collapses the joined rows back to one row per spine date.
- ORDER BY s.day returns the seven dates in calendar order.
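The point about the default frame can be made visible by writing the frame out. The sketch below is the same query with an explicit frame clause. It assumes PostgreSQL, and sample_events is a hypothetical stand-in for the events table (two made-up events on March 1) so that the query runs on its own. When a window has an ORDER BY and no frame clause, PostgreSQL uses RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Because each spine day appears exactly once after the GROUP BY, the ROWS form used here should return the same values.

```sql
-- Hypothetical sample data standing in for the events table.
WITH sample_events (id, occurred_at) AS (
    VALUES
        (1, TIMESTAMP '2024-03-01 09:15:00'),
        (2, TIMESTAMP '2024-03-01 17:40:00')
),
spine AS (
    SELECT generate_series('2024-03-01'::date, '2024-03-07'::date, '1 day'::interval)::date AS day
)
SELECT
    s.day,
    COUNT(e.id) AS daily_events,
    AVG(COUNT(e.id)) OVER (
        ORDER BY s.day
        -- Explicit frame: every row from the first spine day through the current one.
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_avg_events
FROM spine s
LEFT JOIN sample_events e ON e.occurred_at::date = s.day
GROUP BY s.day
ORDER BY s.day;
```

Spelling the frame out is optional here, but it documents the intent and guards against surprises if the ordering column ever contains ties.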
Why this and not averaging over the raw events rows
Averaging over the raw fact table, say AVG(some_per_event_value) OVER (ORDER BY occurred_at), gives a different number entirely: an average across the events that exist. Even counting events per day without a spine averages only over the days that have at least one event. The prompt asks for the average daily count, which is the average across all days in the range. The spine-less form agrees with it only when every day has at least one event. The spine-plus-zero-fill form is the one that handles quiet days correctly.
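The difference is easiest to see side by side. The sketch below assumes PostgreSQL and uses sample_events, a hypothetical stand-in for the events table with made-up data: three events on March 1 and one on March 4. It computes the running average of daily counts twice, once over only the days that have events and once over the zero-filled spine.

```sql
-- Hypothetical sample data: 3 events on March 1, 1 event on March 4.
WITH sample_events (id, occurred_at) AS (
    VALUES
        (1, TIMESTAMP '2024-03-01 08:00:00'),
        (2, TIMESTAMP '2024-03-01 12:30:00'),
        (3, TIMESTAMP '2024-03-01 19:45:00'),
        (4, TIMESTAMP '2024-03-04 10:10:00')
),
spine AS (
    SELECT generate_series('2024-03-01'::date, '2024-03-07'::date, '1 day'::interval)::date AS day
),
-- Spine-less: only days that actually have events produce a row.
daily_no_spine AS (
    SELECT occurred_at::date AS day, COUNT(*) AS daily_events
    FROM sample_events
    GROUP BY occurred_at::date
),
no_spine AS (
    SELECT day, AVG(daily_events) OVER (ORDER BY day) AS running_avg
    FROM daily_no_spine
),
-- Spine plus zero-fill: every day in the range produces a row.
with_spine AS (
    SELECT
        s.day,
        COUNT(e.id) AS daily_events,
        AVG(COUNT(e.id)) OVER (ORDER BY s.day) AS running_avg
    FROM spine s
    LEFT JOIN sample_events e ON e.occurred_at::date = s.day
    GROUP BY s.day
)
SELECT
    w.day,
    w.daily_events,
    ROUND(w.running_avg, 2) AS running_avg_with_spine,
    ROUND(n.running_avg, 2) AS running_avg_no_spine  -- NULL on days with no events
FROM with_spine w
LEFT JOIN no_spine n ON n.day = w.day
ORDER BY w.day;
```

With this sample data, the spine-less running average on March 4 works out to (3 + 1) / 2 = 2, while the zero-filled one is (3 + 0 + 0 + 1) / 4 = 1. The two quiet days in between are what separate the numbers.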
The trap
The whole reason this problem is hard is that the right behavior looks wrong. The running average descends, from 2 on March 1 to roughly 0.29 on March 7 (2 events spread over 7 days), and a learner who expected a smoothing effect on busy days may read that as a bug and try to "fix" it by switching to an INNER JOIN. The INNER JOIN would drop March 2 through 7 entirely, leaving a single March 1 row with a running average of 2, which is the wrong number for the analytics question. The descending line is the honest measurement: most days had no events, and the average reflects that.
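The two behaviors can be compared directly. The sketch below assumes PostgreSQL and uses sample_events, a hypothetical stand-in for the events table, seeded to match the figures quoted above: two events on March 1 and none on the other six days. The variant column is an illustrative label, not part of the exercise's required output.

```sql
-- Hypothetical sample data: 2 events on March 1, none afterwards.
WITH sample_events (id, occurred_at) AS (
    VALUES
        (1, TIMESTAMP '2024-03-01 09:15:00'),
        (2, TIMESTAMP '2024-03-01 17:40:00')
),
spine AS (
    SELECT generate_series('2024-03-01'::date, '2024-03-07'::date, '1 day'::interval)::date AS day
)
-- LEFT JOIN: all seven days survive, quiet days count as zero.
SELECT
    'left join' AS variant,
    s.day,
    COUNT(e.id) AS daily_events,
    ROUND(AVG(COUNT(e.id)) OVER (ORDER BY s.day), 2) AS running_avg_events
FROM spine s
LEFT JOIN sample_events e ON e.occurred_at::date = s.day
GROUP BY s.day
UNION ALL
-- INNER JOIN: spine days without a matching event are dropped before grouping.
SELECT
    'inner join' AS variant,
    s.day,
    COUNT(e.id) AS daily_events,
    ROUND(AVG(COUNT(e.id)) OVER (ORDER BY s.day), 2) AS running_avg_events
FROM spine s
JOIN sample_events e ON e.occurred_at::date = s.day
GROUP BY s.day
ORDER BY variant, day;
```

Under these assumptions the inner-join variant returns only the March 1 row, with a running average of 2. The left-join variant returns all seven days, with the running average following 2/1, 2/2, 2/3, and so on down to 2/7, which is roughly 0.29.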
You practiced computing a running average across zero-filled daily counts so quiet days drag the average down rather than dropping out of the window.