Scenario: Brightlane's analytics platform produces a customer spend-trend report. The data team has decided that when a customer has no preceding-month history, the prior-month spend should be reported as zero rather than as a missing value, so downstream calculations can treat it as a known baseline of zero spend.
Task: Write a query to return each customer's customer_id, calendar month, total spend in that month, and total spend in the immediately preceding month within that same customer's own order history — with zero substituted in place of any boundary-case missing value.
Assumptions:
- A customer is present in a month only if they have at least one order placed in that month.
- Each customer's
prev_month_spendis drawn solely from that same customer's earlier order history. - The earliest month in each customer's own history has no preceding month within that customer; its
prev_month_spendmust be reported as0.00, not as a missing value.
Output:
- One row per (
customer_id,month) pair present in the data. - Columns in this order:
customer_id,month(the first day of the calendar month),monthly_spend,prev_month_spend. - Sorted by
customer_idascending, thenmonthascending.
Schema · ecommerce 5 tables
Run previews · Check grades
Write a query, then run it to see results here.
Worked solution Try it yourself first
SELECT
customer_id,
DATE_TRUNC('month', ordered_at)::date AS MONTH,
SUM(total_amount) AS monthly_spend,
LAG(SUM(total_amount), 1, 0.00) OVER (
PARTITION BY
customer_id
ORDER BY
DATE_TRUNC('month', ordered_at)
) AS prev_month_spend
FROM
orders
GROUP BY
customer_id,
DATE_TRUNC('month', ordered_at)
ORDER BY
customer_id,
MONTH The shape
LAG's third argument is the default value returned when the offset reaches outside the partition. LAG(SUM(total_amount), 1, 0.00) reads as "look one row back inside this customer's partition; if no such row exists, return 0.00 instead of NULL." The boundary case is handled by the window function itself, not by a downstream substitution.
Clause by clause
SELECT customer_id, DATE_TRUNC('month', ordered_at)::date AS month, SUM(total_amount) AS monthly_spend, LAG(SUM(total_amount), 1, 0.00) OVER (PARTITION BY customer_id ORDER BY DATE_TRUNC('month', ordered_at)) AS prev_month_spendreturns each customer's monthly spend and a guaranteed-numeric prior-month spend. The middle argument1is the offset; the third argument0.00is the default that fires when the offset reaches outside the partition.PARTITION BY customer_idkeeps the lookback inside one customer;ORDER BY DATE_TRUNC('month', ordered_at)orders the partition chronologically.FROM ordersreads every order in the table.GROUP BY customer_id, DATE_TRUNC('month', ordered_at)produces one row per (customer, month) pair.ORDER BY customer_id, monthprints each customer's history as a contiguous block in time order.
Why the third argument and not a separate substitution step
The three-argument form folds the boundary handling into the window function itself, so the result type is consistent and no downstream step has to handle NULL. The default is type-checked against SUM(total_amount) at parse time, so the prev_month_spend column is a numeric on every row — including each customer's first month. Doing the substitution after the fact with a CASE expression would arrive at the same values, but it would do so in a second step and would not be obvious from reading the SELECT list that the prev_month_spend column is non-null by construction.
The trap
The default value 0.00 is a presentational sentinel, not a real prior-month spend. The customer did not actually spend zero dollars in the month before their first order — there was no month before their first order at all. Any downstream calculation that aggregates prev_month_spend or treats it as a real prior measurement will count those zeros and skew the result. The right place for the default is in a report where the consumer needs a numeric column for display or for arithmetic that would otherwise propagate NULL. The wrong place is anywhere a SUM(prev_month_spend) or an average gets computed; there, the inserted zero silently distorts the answer in a way that looks like real data.
You practiced using LAG's default-value parameter to fill the boundary case with 0.00, so the first month in each customer's series carries a known baseline rather than a missing value.