Scenario: Streamhub's data team is producing a report where year and month-of-year are reported as two separate numeric columns rather than a single date, so analysts can slice by either dimension independently.
Task: Write a query to return the year, the month number from 1 through 12, and the total number of events for each year-month combination.
Assumptions:
- The year value is the four-digit calendar year of the event.
- The month_num value is the calendar month of the event, expressed as a number from 1 (January) through 12 (December).
- One result row covers every event whose calendar year and calendar month match.
Output:
- One row per (year, month_num) combination present in the data.
- Columns in this order: year, month_num, event_count.
Worked solution
SELECT
  EXTRACT(YEAR FROM occurred_at) AS year,
  EXTRACT(MONTH FROM occurred_at) AS month_num,
  COUNT(*) AS event_count
FROM events
GROUP BY
  EXTRACT(YEAR FROM occurred_at),
  EXTRACT(MONTH FROM occurred_at)

The shape
EXTRACT pulls the year and the month-of-year out of occurred_at as two independent numeric values, and the grouping happens on the pair. The query reports time as two columns of numbers, not as a single truncated date, so analysts can slice on either dimension on its own.
Clause by clause
- SELECT EXTRACT(year FROM occurred_at) AS year, EXTRACT(month FROM occurred_at) AS month_num, COUNT(*) AS event_count returns one row per (year, month) pair. Each EXTRACT returns a number, not a date, so the report shows 2023 and 12 instead of a December 2023 date. COUNT(*) counts the events that fall inside the pair.
- FROM events reads every event.
- GROUP BY EXTRACT(year FROM occurred_at), EXTRACT(month FROM occurred_at) uses the same two expressions as the grouping keys. The grouping is on the pair, not on each value independently, so December 2022 and December 2023 are kept apart because their year values differ even though their month_num matches.
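To see the pair-wise grouping in action, here is a minimal sketch that runs the same query against three made-up timestamps instead of the real events table. It assumes PostgreSQL; sample_events and its rows are hypothetical, and the ORDER BY is added only so the output is easy to read.

```sql
-- Hypothetical inline data: one event in December 2022, two in December 2023.
WITH sample_events (occurred_at) AS (
  VALUES
    (TIMESTAMP '2022-12-05 10:00:00'),
    (TIMESTAMP '2023-12-09 08:30:00'),
    (TIMESTAMP '2023-12-20 17:45:00')
)
SELECT
  EXTRACT(YEAR FROM occurred_at) AS year,
  EXTRACT(MONTH FROM occurred_at) AS month_num,
  COUNT(*) AS event_count
FROM sample_events
GROUP BY
  EXTRACT(YEAR FROM occurred_at),
  EXTRACT(MONTH FROM occurred_at)
ORDER BY year, month_num;  -- not part of the worked solution; added for stable output

-- Result:
--   year | month_num | event_count
--   2022 |        12 |           1
--   2023 |        12 |           2
```

Both rows share month_num 12, but they stay separate because the grouping key is the (year, month_num) pair.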
Why EXTRACT and not date_trunc
date_trunc('month', occurred_at) would return a single timestamp per month, which is the right shape for time-series charts but the wrong shape here. The prompt asks for year and month-of-year as two separate numeric columns. EXTRACT returns a number for each part, which is exactly that shape. The two functions live in the same node for this reason: date_trunc is for grouping along a continuous calendar axis, and EXTRACT is for pulling calendar parts out as numbers that analysts can pivot on independently.
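The difference in shape is easier to see side by side. The sketch below, again assuming PostgreSQL and a hypothetical inline sample_events set, applies both functions to the same timestamps without any grouping.

```sql
-- Hypothetical inline data: the same calendar month in two different years.
WITH sample_events (occurred_at) AS (
  VALUES
    (TIMESTAMP '2022-12-05 10:00:00'),
    (TIMESTAMP '2023-12-09 08:30:00')
)
SELECT
  occurred_at,
  date_trunc('month', occurred_at) AS month_start,  -- a single timestamp, e.g. 2022-12-01 00:00:00
  EXTRACT(YEAR FROM occurred_at) AS year,           -- a number, e.g. 2022
  EXTRACT(MONTH FROM occurred_at) AS month_num      -- a number, e.g. 12
FROM sample_events
ORDER BY occurred_at;
```

month_start gives each row one point on a continuous timeline, while year and month_num can be filtered or pivoted separately, for example comparing every December across years.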
You practiced extracting calendar parts as separate numeric values so year and month-of-year can be reported as independent dimensions instead of a single truncated date.