Scenario: Streamhub's product team wants to identify seasonal patterns by looking at which months of the year draw the highest event volume, ignoring which year each event happened in.
Task: Write a query to return each month-of-year and the total number of events recorded in that month-of-year across every year in the data.
Assumptions:
- The month_num value is the calendar month of the event, expressed as a number from 1 (January) through 12 (December), with no year component.
- An event from January 2023 and an event from January 2024 contribute to the same month-of-year row.
Output:
- One row per month-of-year present in the data, between 1 and 12 inclusive.
- Columns in this order: month_num, event_count.
Worked solution
SELECT
  EXTRACT(MONTH FROM occurred_at) AS month_num,
  COUNT(*) AS event_count
FROM events
GROUP BY EXTRACT(MONTH FROM occurred_at)
The shape
EXTRACT(month FROM occurred_at) returns the month number alone — a value from 1 to 12 with no year attached. Grouping on that single number collapses January 2022, January 2023, and January 2024 events all into the same row, which is exactly the seasonal view the product team wants.
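To see the collapse on a small scale, here is a minimal sketch that assumes PostgreSQL-style syntax. The sample_events rows are made up for illustration and stand in for the real events table; they are not Streamhub data.

```sql
-- Hypothetical sample rows standing in for the events table.
WITH sample_events (occurred_at) AS (
  VALUES
    (TIMESTAMP '2022-01-15 09:00:00'),
    (TIMESTAMP '2023-01-03 18:30:00'),
    (TIMESTAMP '2024-01-28 12:00:00'),
    (TIMESTAMP '2023-03-10 07:45:00')
)
SELECT
  EXTRACT(MONTH FROM occurred_at) AS month_num,  -- year is discarded here
  COUNT(*) AS event_count
FROM sample_events
GROUP BY EXTRACT(MONTH FROM occurred_at);
```

The three January timestamps come from three different years, yet they share month_num 1 and are counted together (event_count 3). The single March row forms its own group with event_count 1.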
Clause by clause
- SELECT EXTRACT(month FROM occurred_at) AS month_num, COUNT(*) AS event_count returns one row per distinct month number. The output is at most twelve numeric rows, one per calendar month of the year, with the total event volume across every year in the data.
- FROM events reads every recorded event.
- GROUP BY EXTRACT(month FROM occurred_at) uses the same extraction as the grouping key. Every event whose timestamp falls in March, regardless of year, extracts to the number 3 and lands in the same group. The count of 32 on month 3 is the total of every March event in the table.
Why EXTRACT and not date_trunc
date_trunc('month', occurred_at) would return one row per calendar month per year, separating March 2022 from March 2023 from March 2024. That is the right shape for a time-series report but the wrong shape here. The product team is looking for a seasonal pattern that ignores year entirely, and EXTRACT(month FROM ...) is the only construct in this lesson that strips the year off and returns just the month number. The difference between the two functions is exactly the difference between a calendar grouping and a seasonal one: date_trunc preserves the year axis, EXTRACT(month FROM ...) collapses it.
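The contrast can be sketched with the date_trunc version of the grouping, again assuming PostgreSQL-style syntax and using made-up sample rows (sample_events is a hypothetical stand-in for the events table).

```sql
-- Hypothetical sample rows: one March event in each of three years.
WITH sample_events (occurred_at) AS (
  VALUES
    (TIMESTAMP '2022-03-05 10:00:00'),
    (TIMESTAMP '2023-03-12 14:20:00'),
    (TIMESTAMP '2024-03-27 08:15:00')
)
SELECT
  date_trunc('month', occurred_at) AS month_start,  -- keeps the year
  COUNT(*) AS event_count
FROM sample_events
GROUP BY date_trunc('month', occurred_at);
```

On these rows date_trunc produces three groups, one for the first day of March in each year, each with event_count 1. Swapping both date_trunc calls for EXTRACT(MONTH FROM occurred_at) would produce a single group for month 3 with event_count 3.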
The trap
The result has at most twelve rows, and the natural reading is to scan them in January-through-December order. The query does not sort them, and GROUP BY alone does not guarantee any output order. The rows can come back in any sequence, which is fine for a seasonal pivot the team will plot or join to a month-name table downstream, but a reader expecting the numbers to land in calendar order will read the values out of sequence. The values themselves are correct; only the row order is undefined.
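If calendar order matters to whoever reads the result directly, one option is to add an ORDER BY on the month number. The variant below is a sketch of that idea, not part of the worked solution; the exercise does not say whether a sorted result is expected.

```sql
SELECT
  EXTRACT(MONTH FROM occurred_at) AS month_num,
  COUNT(*) AS event_count
FROM events
GROUP BY EXTRACT(MONTH FROM occurred_at)
ORDER BY month_num;  -- pins the rows to January-through-December order
```

The counts are the same as in the unsorted query; only the row sequence becomes deterministic.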
You practiced extracting a month-of-year value with no year component so events from different years collapse together by season rather than by date.