Daily Events with a Running Average

The problem

Scenario: Streamhub's analytics team monitors the running (cumulative) average of daily event volume across the first week of March 2024. Because the average must reflect quiet days as well as busy ones, every day in the range must contribute to the calculation.

Task: Write a query to return each date from March 1, 2024 through March 7, 2024 alongside the number of events recorded on that date and the running average daily event count from March 1, 2024 through that date inclusive.

Assumptions:

The events table holds one row per recorded event, with the timestamp stored in occurred_at.
Some dates in the range have no recorded events; those dates contribute a daily count of zero to the running average.
The running average on each date is the average of the daily counts from March 1, 2024 through that date inclusive.
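Written as a formula, with c_i standing for the event count on the i-th day of the range (March 1 is i = 1, and c_i = 0 on a day with no events), the running average on the k-th day is:

```latex
\bar{c}_k = \frac{1}{k} \sum_{i=1}^{k} c_i, \qquad k = 1, \dots, 7
```

The denominator is k, the number of calendar days elapsed so far, not the number of days that happened to have events.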

Output:

One row per date in the range, including dates with no events.
Columns in this order: day, daily_events, running_avg_events.
Sorted by day ascending.

Schema: analytics (5 tables)

users
  id integer
  name text
  email text
  country text
  plan text
  signed_up_at timestamptz
  is_active boolean

conversions
  id integer
  user_id integer
  converted_at timestamptz
  plan text
  amount numeric

sessions
  id integer
  user_id integer
  started_at timestamptz
  ended_at timestamptz (nullable)
  event_count integer

events
  id integer
  user_id integer
  session_id integer (nullable)
  event_type text
  occurred_at timestamptz
  properties jsonb (nullable)

periods
  id integer
  name text
  start_month integer
  end_month integer

Worked solution

Solution query

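WITH
  -- One row per calendar day in the reporting window
  spine AS (
    SELECT
      GENERATE_SERIES('2024-03-01'::date, '2024-03-07'::date, '1 day'::INTERVAL)::date AS day
  )
SELECT
  s.day,
  -- Counts only matched events: a day with no events yields 0, not 1
  COUNT(e.id) AS daily_events,
  -- Windowed average of the per-day counts, from the first spine day through the current day
  AVG(COUNT(e.id)) OVER (
    ORDER BY
      s.day
  ) AS running_avg_events
FROM
  spine s
  -- LEFT JOIN keeps quiet days; occurred_at::date follows the session TimeZone setting
  LEFT JOIN events e ON e.occurred_at::date = s.day
GROUP BY
  s.day
ORDER BY
  s.day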

The shape

A running average that reflects quiet days has to see those days as zero counts, not as absent rows. The spine produces the seven days, the LEFT JOIN keeps each quiet day as a row with no matching event, COUNT(e.id) turns that row into a zero count, and AVG used as a window function over the per-day counts lets every zero pull the running average down. That downward pull across the empty stretch is exactly the behavior the prompt requires.
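The same shape can be modeled outside the database to make the arithmetic visible. The sketch below is plain Python, not the SQL solution: `running_avg_report` is a hypothetical helper, and the sample data (two events on March 1, none afterwards) is made up to match the figures quoted in "The trap". A dictionary keyed by every calendar day plays the role of the spine, so quiet days start at zero instead of being missing.

```python
from datetime import date, timedelta

# Hypothetical sample data: two events on March 1, none for the rest of the week.
event_days = [date(2024, 3, 1), date(2024, 3, 1)]


def running_avg_report(event_days, start, end):
    """Return (day, daily_events, running_avg_events) for every day in [start, end]."""
    # Spine: one entry per calendar day, so quiet days are present with a zero count.
    counts = {start + timedelta(days=i): 0 for i in range((end - start).days + 1)}
    for d in event_days:
        if d in counts:  # events outside the window have no spine day to attach to
            counts[d] += 1
    rows, total = [], 0
    for k, day in enumerate(sorted(counts), start=1):
        total += counts[day]
        rows.append((day, counts[day], total / k))  # average of days 1..k
    return rows


for row in running_avg_report(event_days, date(2024, 3, 1), date(2024, 3, 7)):
    print(row)
```

Each zero day leaves `total` unchanged while `k` keeps growing, which is why the running average falls across the empty stretch.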

Clause by clause

WITH spine AS (SELECT generate_series('2024-03-01'::date, '2024-03-07'::date, '1 day'::interval)::date AS day) builds the seven-row backbone for the first week of March.
COUNT(e.id) AS daily_events is the per-day aggregate. On a quiet day the LEFT JOIN produces a single spine row whose e.id is NULL; COUNT(e.id) skips nulls, so that day reports 0, the value that has to feed into the running average. COUNT(*) would count the unmatched spine row itself and report 1.
AVG(COUNT(e.id)) OVER (ORDER BY s.day) AS running_avg_events wraps the daily count in a windowed average. The inner COUNT collapses each spine day to its daily count; the outer AVG ... OVER (ORDER BY s.day) averages those daily counts in date order. When a window has ORDER BY, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which covers every row from the first day through the current row plus any rows tied with it on s.day. After GROUP BY each day appears exactly once, so there are no ties and the frame is the running-from-day-one shape the prompt asks for.
FROM spine s LEFT JOIN events e ON e.occurred_at::date = s.day attaches each event to its day. The LEFT JOIN keeps every spine row whether or not an event matches. Casting a timestamptz to date uses the session's TimeZone setting, so the day boundaries follow that setting.
GROUP BY s.day collapses the joined rows back to one row per spine date.
ORDER BY s.day returns the seven dates in calendar order.
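A runnable port of the clauses above is sketched below using SQLite from the Python standard library rather than PostgreSQL, so a few substitutions are assumptions of the sketch: occurred_at is stored as ISO-8601 text, the spine comes from a recursive CTE because SQLite has no built-in GENERATE_SERIES, and the windowed AVG runs over a per-day CTE instead of nesting AVG(COUNT(...)). It also reports COUNT(*) next to COUNT(e.id) and compares the default frame with an explicit ROWS frame. The event rows are hypothetical, and window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, occurred_at TEXT)")
# Hypothetical rows: two events on March 1 and none for the rest of the week.
conn.executemany(
    "INSERT INTO events (occurred_at) VALUES (?)",
    [("2024-03-01 09:15:00",), ("2024-03-01 17:40:00",)],
)

QUERY = """
WITH RECURSIVE
  spine(day) AS (
    SELECT '2024-03-01'
    UNION ALL
    SELECT date(day, '+1 day') FROM spine WHERE day < '2024-03-07'
  ),
  daily AS (
    SELECT s.day,
           COUNT(e.id) AS daily_events,  -- NULL e.id on quiet days counts as 0
           COUNT(*)    AS rows_per_day   -- the unmatched spine row counts as 1
    FROM spine s
    LEFT JOIN events e ON date(e.occurred_at) = s.day
    GROUP BY s.day
  )
SELECT day,
       daily_events,
       rows_per_day,
       AVG(daily_events) OVER (ORDER BY day) AS running_avg_events,
       AVG(daily_events) OVER (
         ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS running_avg_explicit_frame
FROM daily
ORDER BY day
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The rows_per_day column shows why COUNT(*) is the wrong aggregate here: every quiet day would report one event. The two running-average columns match because each day appears once after grouping.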

Why this and not averaging over the raw `events` rows

Averaging over the raw fact table gives a different number, because it averages only across the rows that exist. Grouping events by occurred_at::date without a spine produces a row only for days that had at least one event, so a running average over those rows is an average across active days. The prompt asks for the average daily count, which is the average across all calendar days in the range. The two agree only when every day has at least one event. The spine-plus-zero-fill form is the one that handles quiet days correctly.
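The difference between the two denominators can be checked with a small Python sketch. Both helpers are hypothetical and compute only the whole-week value (the running average on March 7): `avg_over_active_days` mimics grouping the raw events by date, where only days with events appear, and `avg_over_all_days` mimics the spine, where every calendar day counts. The two datasets are invented: one week with quiet days, and one week where every day has events.

```python
from collections import Counter
from datetime import date, timedelta


def avg_over_active_days(event_days):
    """Like GROUP BY date on the raw events table: only days with events appear."""
    per_day = Counter(event_days)
    return sum(per_day.values()) / len(per_day)


def avg_over_all_days(event_days, start, end):
    """Spine plus zero-fill: every calendar day in [start, end] is in the denominator."""
    n_days = (end - start).days + 1
    in_range = [d for d in event_days if start <= d <= end]
    return len(in_range) / n_days


start, end = date(2024, 3, 1), date(2024, 3, 7)
# Hypothetical week with quiet days: the two averages disagree.
sparse = [date(2024, 3, 1)] * 2
# Hypothetical week where every day has three events: the averages agree.
busy = [start + timedelta(days=i) for i in range(7) for _ in range(3)]

print(avg_over_active_days(sparse), avg_over_all_days(sparse, start, end))
print(avg_over_active_days(busy), avg_over_all_days(busy, start, end))
```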

The trap

The whole reason this problem is hard is that the right behavior looks wrong. The running average descends, from 2 on March 1 to roughly 0.29 (2/7) on March 7, and a learner who expected a smoother line may read that as a bug and try to "fix" it by switching to an INNER JOIN. The INNER JOIN would drop March 2 through 7 entirely, leaving a single row for March 1 with a running average of 2. That breaks the one-row-per-date requirement and overstates the week's daily volume, which is the wrong answer to the analytics question. The descending line is the honest measurement: most days had no events, and the average reflects that.
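The trap can be reproduced by running the same query with both join types. This sketch again uses SQLite through Python's standard library instead of PostgreSQL, with a recursive CTE standing in for GENERATE_SERIES and hypothetical rows (two events on March 1, none afterwards) chosen to match the numbers above. `report` is a hypothetical helper; the join keyword is interpolated into the SQL only from the two fixed strings passed below.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, occurred_at TEXT)")
# Hypothetical rows: two events on March 1, none for March 2 through 7.
conn.executemany(
    "INSERT INTO events (occurred_at) VALUES (?)",
    [("2024-03-01 09:15:00",), ("2024-03-01 17:40:00",)],
)


def report(join_kind):
    """Run the spine query with the given join keyword ("LEFT" or "INNER")."""
    sql = f"""
    WITH RECURSIVE
      spine(day) AS (
        SELECT '2024-03-01'
        UNION ALL
        SELECT date(day, '+1 day') FROM spine WHERE day < '2024-03-07'
      ),
      daily AS (
        SELECT s.day, COUNT(e.id) AS daily_events
        FROM spine s
        {join_kind} JOIN events e ON date(e.occurred_at) = s.day
        GROUP BY s.day
      )
    SELECT day, daily_events, AVG(daily_events) OVER (ORDER BY day)
    FROM daily
    ORDER BY day
    """
    return conn.execute(sql).fetchall()


left_rows = report("LEFT")
inner_rows = report("INNER")
print(len(left_rows), left_rows[-1])
print(len(inner_rows), inner_rows)
```

The LEFT JOIN version keeps all seven days and ends near 0.29, while the INNER JOIN version collapses to a single March 1 row with an average of 2.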

You practiced computing a running average across zero-filled daily counts so quiet days drag the average down rather than dropping out of the window.
