Scenario: Streamhub's product team tracks monthly event volume separately for each kind of activity, comparing each event_type to its own prior month rather than to a different event_type.
Task: Write a query to return each event_type, calendar month, number of events of that type in that month, and number of events of the same type in the immediately preceding month.
Assumptions:
- An
event_typeis present in a month only if at least one event of that type was recorded in that month. - Each
event_type'sprev_month_countis drawn solely from that sameevent_type's earlier history. - The earliest month in each
event_type's own history has no preceding month within that type; itsprev_month_countvalue is missing.
Output:
- One row per (
event_type,month) pair present in the data. - Columns in this order:
event_type,month(the first day of the calendar month),event_count,prev_month_count. - Sorted by
event_typeascending, thenmonthascending.
Schema · analytics 5 tables
Run previews · Check grades
Write a query, then run it to see results here.
Worked solution Try it yourself first
SELECT
event_type,
DATE_TRUNC('month', occurred_at)::date AS MONTH,
COUNT(*) AS event_count,
LAG(COUNT(*)) OVER (
PARTITION BY
event_type
ORDER BY
DATE_TRUNC('month', occurred_at)
) AS prev_month_count
FROM
events
GROUP BY
event_type,
DATE_TRUNC('month', occurred_at)
ORDER BY
event_type,
MONTH The shape
The series isn't a single timeline — it's one timeline per event_type, running in parallel. PARTITION BY event_type slices the window so the prior-month lookback for one type cannot reach across into a different type's history, and LAG resets to NULL at every type boundary.
Clause by clause
SELECT event_type, DATE_TRUNC('month', occurred_at)::date AS month, COUNT(*) AS event_count, LAG(COUNT(*)) OVER (PARTITION BY event_type ORDER BY DATE_TRUNC('month', occurred_at)) AS prev_month_countreturns each type's month bucket, that month's event count, and the same type's prior-month count.PARTITION BY event_typeisolates the lookback to a single type;ORDER BY DATE_TRUNC('month', occurred_at)sets the chronological sequence inside that type's partition.FROM eventsreads every recorded event.GROUP BY event_type, DATE_TRUNC('month', occurred_at)produces one row per (type, month) pair soCOUNT(*)becomes the per-type monthly total.ORDER BY event_type, monthprints each type's series as a contiguous block, in time order.
Why partitioning on the comparison entity, not on the date
The partition isolates rows that should be compared against each other; the order defines the sequence within that isolation. The comparison entity is the event_type — that's what each prior-month lookup should stay inside. The date is the order — it sets which row is "one back" within a type's partition. Reversing the two would partition by month and order by type, which makes no analytical sense: it would slice the data into one partition per month, with one row per type inside each partition, and the "prior" lookup would walk across alphabetically adjacent types instead of across time.
The trap
Dropping PARTITION BY event_type doesn't raise an error. LAG will still return a value for every row except the first, and the column will look populated. What it returns is the previous row in the global sequence, which means the first month of click will pull the last month of add_to_cart (or whichever type comes before it in the ordering) as its prior value. The numbers are real numbers, the report renders cleanly, and every type-to-type boundary is silently wrong. The bug has no surface symptom — it shows up only when a stakeholder checks an individual type's series against the database directly and notices the prior-month value doesn't match.
You practiced partitioning LAG by event_type so each type's prior-month count is drawn from its own series, with the lookback resetting at every type boundary.