Scenario: Brightlane's strategy team is investigating whether monthly order volume follows a two-month cycle, so each month must be compared to the count exactly two months earlier.
Task: Write a query to return each calendar month, the number of orders placed in that month, and the number of orders placed in the calendar month two months earlier.
Assumptions:
- A calendar month is identified by its first day and covers every order placed within that month.
- The two earliest months in the data have no calendar month two months earlier within the data; their
count_two_months_agovalue is missing.
Output:
- One row per calendar month present in the data.
- Columns in this order:
month(the first day of the calendar month),order_count,count_two_months_ago. - Sorted by
monthascending.
Schema · ecommerce 5 tables
Run previews · Check grades
Write a query, then run it to see results here.
Worked solution Try it yourself first
SELECT
DATE_TRUNC('month', ordered_at)::date AS MONTH,
COUNT(*) AS order_count,
LAG(COUNT(*), 2) OVER (
ORDER BY
DATE_TRUNC('month', ordered_at)
) AS count_two_months_ago
FROM
orders
GROUP BY
DATE_TRUNC('month', ordered_at)
ORDER BY
MONTH The shape
LAG's second argument is the offset. LAG(COUNT(*), 2) reaches two rows back in the ordered sequence instead of one, so each month sits next to the count from two months earlier. The pattern that compares "every other month" is just a different number in the same call.
Clause by clause
SELECT DATE_TRUNC('month', ordered_at)::date AS month, COUNT(*) AS order_count, LAG(COUNT(*), 2) OVER (ORDER BY DATE_TRUNC('month', ordered_at)) AS count_two_months_agoreturns the month bucket, that month's order count, and the count from exactly two months earlier. The2is the offset argument; it tellsLAGhow many rows back to reach, not how many months. The two match here only because the rows are at monthly grain.FROM ordersreads every order.GROUP BY DATE_TRUNC('month', ordered_at)collapses the rows to one per calendar month.ORDER BY monthprints the months in calendar order.
Why the offset and not a self-join on a shifted date
A self-join on month = month - INTERVAL '2 months' would arrive at the same answer if every calendar month happened to be present in the data. The offset form sidesteps a different question, though: it counts rows, not months. If a month had zero orders, the row for that month would not exist in the grouped output, and a date-arithmetic join would still match against the missing month and produce NULL. The offset form reaches back the same way regardless — two rows back is whatever the second-prior row in the grouped sequence happens to be. Both behaviors can be right; they are not the same behavior, and the question being asked decides which one fits.
The trap
The first two rows of the result have no row two positions back, so count_two_months_ago is NULL for both of them. That is one more NULL row than the default-offset form produces, and it scales with the offset: LAG(..., 12) for a year-over-year comparison produces twelve NULL rows at the start of the series. The boundary doesn't fail loudly — it just shows up as NULL — and a report that doesn't filter or handle those rows will quietly carry them. The fix when the consumer needs a real number is the third argument: LAG(COUNT(*), 2, 0) substitutes 0 for the boundary case. The fix when the consumer wants the boundary excluded is to filter those rows out of the final result.
You practiced calling LAG with an offset of two so the lookback skips the immediately preceding month and lands on the month two periods earlier.