Brightlane's buying team is evaluating premium-category positioning and needs to identify which product groupings command a higher average price point.
Write a query to return the category ID and average list price for every category whose mean price exceeds $100.
Assumptions:
- The
productstable contains every product in Brightlane's catalogue. - The average is taken across the
priceof every product in the category. - Categories whose mean price is exactly
$100do not qualify; only those above qualify.
Output:
- One row per qualifying
category_id, with columnscategory_idandavg_price.
Schema · ecommerce 5 tables
Run previews · Check grades
Write a query, then run it to see results here.
Worked solution Try it yourself first
SELECT
category_id,
AVG(price) AS avg_price
FROM
products
GROUP BY
category_id
HAVING
AVG(price) > 100 The shape
GROUP BY category_id builds the per-category set of products; AVG(price) computes each category's mean list price; HAVING AVG(price) > 100 keeps only the categories above $100. The result is six surviving categories, ranging from category_id 6 at 1459 down to category_id 11 at 182.12. Categories whose mean price is at or below $100, or exactly $100, fall out — the comparison is strict.
Clause by clause
SELECT category_id, AVG(price) AS avg_pricereturns the grouping column with the per-category mean.AVGwalks thepricevalues for every product in the category and returns the average.FROM productsis the source set: every product in Brightlane's catalogue.GROUP BY category_idpartitions the catalogue into one group per category. After this clause, each row in the working set represents one category with its average price attached.HAVING AVG(price) > 100filters those category rows by the mean. The aggregate expression is repeated rather than referenced by theavg_pricealias, because aliases are not in scope atHAVING-time.
Why this and not WHERE price > 100
This is the cleanest illustration of the HAVING vs WHERE split. WHERE price > 100 keeps only the individual products priced above $100 and averages those. A category with three products at $90, $110, and $130 would report an average of $120, because the $90 product is excluded before the average runs. HAVING AVG(price) > 100 keeps every product, computes the honest mean across all of them, and only then checks the threshold. That same category correctly averages $110.
Which form is correct depends on the question. "Categories whose mean product price exceeds $100" is the per-category metric, which needs HAVING. "The average price of products that cost more than $100" is the per-row filter, which needs WHERE. The two read similarly in English; in SQL they produce different numbers.
The trap
The trap is that both queries run cleanly and both return a result that looks like "average prices by category." There is no error to flag the wrong choice. The WHERE form silently changes which rows feed into the average, and the answer comes back as a plausible-looking number that is off by however much the excluded rows would have pulled the mean.
The rule: a threshold on an aggregate goes in HAVING. A threshold on the raw inputs to that aggregate goes in WHERE.
You practiced filtering categorical groupings by an aggregate threshold. The shape generalises to any "which segments meet this metric bar" question — the segment is the GROUP BY column, the metric is the aggregate, the bar is the HAVING comparison.