Brightlane's catalog team is reviewing pricing for properly categorized products. Products awaiting classification should not influence the analysis.
Write a query to return the category ID and average price for every assigned category whose average product price exceeds $500.
Assumptions:
- Products awaiting classification have a missing
category_idand should not contribute to the analysis. - A category's average price is the average of every
pricevalue among products linked to thatcategory_id. - Only categories whose average price exceeds
$500should appear.
Output:
- One row per qualifying category, with columns
category_idandavg_price.
Schema · ecommerce 5 tables
Run previews · Check grades
Write a query, then run it to see results here.
Worked solution Try it yourself first
WITH
category_prices AS (
SELECT
category_id,
AVG(price) AS avg_price
FROM
products
WHERE
category_id IS NOT NULL
GROUP BY
category_id
)
SELECT
category_id,
avg_price
FROM
category_prices
WHERE
avg_price > 500 The shape
Two filters in two different places. The WHERE category_id IS NOT NULL lives inside the WITH layer and removes the uncategorized products before the grouping ever sees them. The WHERE avg_price > 500 lives in the main query and runs against the layer's aggregate output. Each filter belongs where the column it references is in scope.
Clause by clause
- The
WITHclause definescategory_prices:
WITH category_prices AS (
SELECT category_id, AVG(price) AS avg_price
FROM products
WHERE category_id IS NOT NULL
GROUP BY category_id
)WHERE category_id IS NOT NULL drops the products awaiting classification; only categorized products reach the grouping. GROUP BY category_id then partitions the survivors per category, and AVG(price) collapses each partition to one row. Categories 5, 6, and 7 end up with averages of 782.33, 1459, and 882.33 respectively.
SELECT category_id, avg_price FROM category_prices WHERE avg_price > 500is the main query. It reads the named layer and keeps only the rows whose average exceeds500. All three of the result-set categories clear the threshold; the rest were either filtered out of the layer or fall below500in the layer's output.
Why the IS NOT NULL filter goes inside the layer and not in the main query
category_id exists as a column on products but does not exist on the layer's output except as the grouping key. The uncategorized rows, the ones with a missing category_id, would collapse into a single NULL group inside the layer and produce a NULL-keyed average that the prompt explicitly excludes. Filtering them in the main query after the grouping is too late: by then they have already contaminated the aggregation. The fix is to remove them upstream, while category_id is still a row-level column and the filter can act on the raw rows.
The trap
The two thresholds look symmetric but are not. The source-level filter is a row condition on products and runs before grouping; the aggregate-level filter is a column condition on category_prices and runs after. Swapping them silently changes the result. Putting avg_price > 500 inside the layer fails because avg_price does not exist as a column until the grouping completes. Putting category_id IS NOT NULL in the main query fails to exclude the uncategorized group's contribution to the layer's aggregation in the first place. Filters that act on grouping inputs go inside the layer; filters that act on grouping outputs go in the main query.
You practiced a two-stage condition: pre-restrict source records inside the WITH layer with a WHERE, then apply a threshold against the layer's aggregate output in the main query.