Scenario: Brightlane's product performance team is identifying which product categories are generating above-average revenue from line items.
Task: Write a query to return each qualifying category_name and its revenue.
Assumptions:
- A line item's revenue is quantity multiplied by unit_price.
- A category's revenue is the combined line-item revenue across its products.
- The result covers only categories whose revenue is strictly greater than the average revenue across every category in the company-wide line-item set.
Output:
- One row per qualifying category.
- Columns in this order: category_name, revenue.
Worked solution
WITH
  category_revenue AS (
    SELECT
      p.category_id,
      SUM(oi.quantity * oi.unit_price) AS revenue
    FROM
      order_items oi
      JOIN products p ON oi.product_id = p.id
    GROUP BY
      p.category_id
  )
SELECT
  c.name AS category_name,
  cr.revenue
FROM
  category_revenue cr
  JOIN categories c ON c.id = cr.category_id
WHERE
  cr.revenue > (
    SELECT
      AVG(revenue)
    FROM
      category_revenue
  )
The shape
The CTE category_revenue computes one row of revenue per category, and the outer query references that same CTE twice — once as the driver and once inside a scalar subquery that takes AVG(revenue) across every category. Each category's revenue is then compared against that average to decide whether it qualifies. Referencing the CTE twice is exactly the case where naming the intermediate pays off.
Clause by clause
- WITH category_revenue AS (SELECT p.category_id, SUM(oi.quantity * oi.unit_price) AS revenue FROM order_items oi JOIN products p ON oi.product_id = p.id GROUP BY p.category_id) joins line items to products and totals revenue per category. One row per category that has any line items across its products.
- FROM category_revenue cr JOIN categories c ON c.id = cr.category_id drives the outer query off the aggregated CTE and joins to categories for the name.
- WHERE cr.revenue > (SELECT AVG(revenue) FROM category_revenue) is the load-bearing filter. The scalar subquery reads the company-wide average revenue once across every category in the CTE; PostgreSQL evaluates it once and compares each outer row against that single number.
- SELECT c.name AS category_name, cr.revenue returns only the qualifying categories' names and totals.
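To see the clauses working together, here is a minimal sketch that runs the same shape on a handful of inline rows. The sample categories, products, and line items are invented for illustration (they are not Brightlane data), and the extra avg_category_revenue and qualifies columns exist only to expose the comparison that the WHERE clause would otherwise hide.

```sql
-- Hypothetical sample rows standing in for the real tables; all values are invented.
WITH
  categories (id, name) AS (
    VALUES (1, 'Audio'), (2, 'Cables'), (3, 'Cameras')
  ),
  products (id, category_id) AS (
    VALUES (10, 1), (11, 1), (20, 2), (30, 3)
  ),
  order_items (product_id, quantity, unit_price) AS (
    VALUES (10, 2, 50.00), (11, 1, 100.00), (20, 5, 4.00), (30, 1, 400.00)
  ),
  -- Same aggregation as the worked solution: one row per category.
  category_revenue AS (
    SELECT p.category_id, SUM(oi.quantity * oi.unit_price) AS revenue
    FROM order_items oi
    JOIN products p ON oi.product_id = p.id
    GROUP BY p.category_id
  )
SELECT
  c.name AS category_name,
  cr.revenue,
  -- The single number every row is compared against.
  (SELECT AVG(revenue) FROM category_revenue) AS avg_category_revenue,
  -- The predicate from the WHERE clause, shown as a column instead of a filter.
  cr.revenue > (SELECT AVG(revenue) FROM category_revenue) AS qualifies
FROM category_revenue cr
JOIN categories c ON c.id = cr.category_id
ORDER BY c.name;
```

With these invented rows the category totals are 200, 20, and 400, the average is about 206.67, and only Cameras qualifies. Moving the qualifies expression into a WHERE clause gives back the worked solution's filter.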
Why the CTE and not two separate aggregations
If category_revenue were inlined twice, once as the outer driver and once inside the average subquery, the aggregation over order_items JOIN products would be written twice and could run twice. The CTE states the aggregation once and lets both the outer driver and the average reference the same intermediate; because the CTE is referenced more than once, PostgreSQL materializes it by default rather than recomputing it. The structure also makes the intent legible: "compute revenue per category, then keep the ones above the cross-category average."
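For contrast, the inlined form might look like the sketch below. It runs against the same order_items, products, and categories tables as the worked solution. The alias dup is a name chosen here for illustration. Whether the engine actually executes the aggregation twice depends on the planner, so treat this as a readability comparison first and check the plan with EXPLAIN before drawing performance conclusions.

```sql
-- Inlined variant for comparison only: the per-category aggregation is written twice.
SELECT
  c.name AS category_name,
  cr.revenue
FROM (
  SELECT p.category_id, SUM(oi.quantity * oi.unit_price) AS revenue
  FROM order_items oi
  JOIN products p ON oi.product_id = p.id
  GROUP BY p.category_id
) cr
JOIN categories c ON c.id = cr.category_id
WHERE cr.revenue > (
  SELECT AVG(dup.revenue)
  FROM (
    -- Same aggregation repeated; any change must be made in both places.
    SELECT p.category_id, SUM(oi.quantity * oi.unit_price) AS revenue
    FROM order_items oi
    JOIN products p ON oi.product_id = p.id
    GROUP BY p.category_id
  ) dup
);
```

The result is the same as the CTE version, but the two copies of the aggregation can drift apart if only one of them is edited later.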
The trap
The average has to be AVG(revenue) over the CTE, not AVG over the raw line items. Averaging line-item revenue gives the average line value, which has nothing to do with the per-category average. The denominator of any "above average" filter has to be averaged at the same grain as the value being compared. The CTE form keeps both at category grain, so the comparison stays honest. A second trap to watch for is > versus >= — the prompt says strictly greater than the average, which excludes any category whose revenue exactly equals the average; >= would include it.
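The grain mismatch is easy to see on a small example. The sketch below uses invented inline rows (hypothetical values, not Brightlane data) and computes both averages side by side, so the difference between the line-item grain and the category grain is visible.

```sql
-- Hypothetical sample rows; values are invented so the two averages differ.
WITH
  products (id, category_id) AS (
    VALUES (10, 1), (11, 1), (20, 2), (30, 3)
  ),
  order_items (product_id, quantity, unit_price) AS (
    VALUES (10, 2, 50.00), (11, 1, 100.00), (20, 5, 4.00), (30, 1, 400.00)
  ),
  category_revenue AS (
    SELECT p.category_id, SUM(oi.quantity * oi.unit_price) AS revenue
    FROM order_items oi
    JOIN products p ON oi.product_id = p.id
    GROUP BY p.category_id
  )
SELECT
  -- Wrong grain: the average revenue of a single line item.
  (SELECT AVG(quantity * unit_price) FROM order_items) AS avg_line_item_revenue,
  -- Right grain: the average of the per-category totals.
  (SELECT AVG(revenue) FROM category_revenue) AS avg_category_revenue;
```

Here the four line items are worth 100, 100, 20, and 400, so the line-item average is 155, while the three category totals (200, 20, 400) average about 206.67. The category with revenue 200 would pass a filter built on the line-item average and fail the correct one.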
You practiced computing per-category revenue in a CTE, then referring back to that same CTE in a scalar subquery for the cross-category average — a shape that needs the named layer because the same intermediate is used twice.