CTE Revenue Joined to Category Names — Subqueries vs CTEs vs Joins in SQL

The problem

Scenario: Brightlane's product performance team is identifying which product categories are generating above-average revenue from line items.

Task: Write a query to return each qualifying category_name and its revenue.

Assumptions:

A line item's revenue is quantity multiplied by unit_price.
A category's revenue is the combined line-item revenue across its products.
The result covers only categories whose revenue is strictly greater than the average revenue across every category in the company-wide line-item set.

Output:

One row per qualifying category.
Columns in this order: category_name, revenue.

Schema: ecommerce (5 tables)

categories
  id integer
  name text
  parent_id? integer

products
  id integer
  name text
  category_id integer
  price numeric
  stock_qty integer
  attributes? jsonb

order_items
  id integer
  order_id integer
  product_id integer
  quantity integer
  unit_price numeric

customers
  id integer
  name text
  email text
  city? text
  country text
  created_at timestamptz
  is_active boolean

orders
  id integer
  customer_id integer
  ordered_at timestamptz
  status text
  total_amount numeric

Worked solution

Solution query

WITH
  category_revenue AS (
    SELECT
      p.category_id,
      SUM(oi.quantity * oi.unit_price) AS revenue
    FROM
      order_items oi
      JOIN products p ON oi.product_id = p.id
    GROUP BY
      p.category_id
  )
SELECT
  c.name AS category_name,
  cr.revenue
FROM
  category_revenue cr
  JOIN categories c ON c.id = cr.category_id
WHERE
  cr.revenue > (
    SELECT
      AVG(revenue)
    FROM
      category_revenue
  )

The shape

The CTE category_revenue computes one row of revenue per category, and the outer query references that same CTE twice: once as the driver and once inside a scalar subquery that takes AVG(revenue) across every category. Each category's revenue is then compared against that average to decide whether it qualifies. Referencing the CTE twice is exactly the case where naming the intermediate pays off.

Clause by clause

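WITH category_revenue AS (SELECT p.category_id, SUM(oi.quantity * oi.unit_price) AS revenue FROM order_items oi JOIN products p ON oi.product_id = p.id GROUP BY p.category_id) joins line items to products and totals revenue per category. The result is one row per category that has any line items across its products. Because the join is an inner join, a category with no line items does not appear in the CTE, so it counts toward neither the result nor the average.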
FROM category_revenue cr JOIN categories c ON c.id = cr.category_id drives the outer query off the aggregated CTE and joins to categories for the name.
WHERE cr.revenue > (SELECT AVG(revenue) FROM category_revenue) is the load-bearing filter. The scalar subquery computes the average revenue across every category in the CTE. Because it is uncorrelated, PostgreSQL evaluates it once and compares each outer row against that single number.
SELECT c.name AS category_name, cr.revenue returns only the qualifying categories' names and totals.
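
To see how the clauses fit together before the filter removes anything, one option is to move the comparison out of WHERE and into the SELECT list. The following is a diagnostic sketch rather than the graded answer: the column names avg_category_revenue and qualifies are illustrative additions, not part of the exercise's expected output.

```sql
-- Diagnostic variant: show every category, the threshold, and the verdict.
WITH
  category_revenue AS (
    SELECT
      p.category_id,
      SUM(oi.quantity * oi.unit_price) AS revenue
    FROM
      order_items oi
      JOIN products p ON oi.product_id = p.id
    GROUP BY
      p.category_id
  )
SELECT
  c.name AS category_name,
  cr.revenue,
  -- Same scalar subquery as the solution, exposed as a column (illustrative name).
  (SELECT AVG(revenue) FROM category_revenue) AS avg_category_revenue,
  -- TRUE for the rows the solution's WHERE clause would keep (illustrative name).
  cr.revenue > (SELECT AVG(revenue) FROM category_revenue) AS qualifies
FROM
  category_revenue cr
  JOIN categories c ON c.id = cr.category_id
ORDER BY
  cr.revenue DESC;
```

Every row shows the same avg_category_revenue value, which makes it easy to confirm that the threshold is a single number and that only the rows with qualifies set to true survive in the solution query.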

Why the CTE and not two separate aggregations

If category_revenue were inlined twice, with one full pass for the outer driver and a second full pass inside the average subquery, the aggregation would run twice over order_items JOIN products. The CTE computes the aggregation once. Because the CTE is referenced more than once, PostgreSQL materializes it by default, so both the outer driver and the average read the same intermediate result. The structure also makes the intent legible: "compute revenue per category, then keep the ones above the average across categories."
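
For contrast, here is a sketch of what the inlined form might look like against the same schema. It should return the same rows as the CTE version, but the per-category aggregation is spelled out twice, once as a derived table and once inside the average subquery. The aliases cr and t are arbitrary.

```sql
-- Inlined form, shown for comparison only: the aggregation appears twice.
SELECT
  c.name AS category_name,
  cr.revenue
FROM
  (
    -- First copy: per-category revenue, used as the driver.
    SELECT
      p.category_id,
      SUM(oi.quantity * oi.unit_price) AS revenue
    FROM
      order_items oi
      JOIN products p ON oi.product_id = p.id
    GROUP BY
      p.category_id
  ) cr
  JOIN categories c ON c.id = cr.category_id
WHERE
  cr.revenue > (
    SELECT
      AVG(t.revenue)
    FROM
      (
        -- Second copy: the same aggregation, repeated for the average.
        SELECT
          SUM(oi.quantity * oi.unit_price) AS revenue
        FROM
          order_items oi
          JOIN products p ON oi.product_id = p.id
        GROUP BY
          p.category_id
      ) t
  );
```

Besides the repeated work, the duplication is a maintenance risk: a change to the revenue definition has to be made in both copies, and a mismatch between them would silently change the threshold.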

The trap

The average has to be AVG(revenue) over the CTE, not AVG over the raw line items. Averaging line-item revenue gives the average line value, which is a different quantity at a different grain from the per-category average. The baseline of any "above average" filter has to be averaged at the same grain as the value being compared. The CTE form keeps both at category grain, so the comparison stays honest. A second trap to watch for is > versus >=. The prompt says strictly greater than the average, which excludes any category whose revenue exactly equals the average; >= would include it.
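
The grain trap is easier to see with numbers. The sketch below uses a made-up inline data set instead of the exercise tables: the line_items rows and the category names Audio, Cables, and Cameras are hypothetical, and the category name is attached directly to each line to keep the example short. It compares each category's revenue against both baselines side by side.

```sql
-- Hypothetical toy data: four line items across three categories.
WITH
  line_items (category_name, quantity, unit_price) AS (
    VALUES
      ('Audio',   2,  50.00),  -- line revenue 100
      ('Audio',   1, 100.00),  -- line revenue 100
      ('Cables',  5,   4.00),  -- line revenue 20
      ('Cameras', 1, 400.00)   -- line revenue 400
  ),
  category_revenue AS (
    SELECT
      category_name,
      SUM(quantity * unit_price) AS revenue
    FROM
      line_items
    GROUP BY
      category_name
  )
SELECT
  category_name,
  revenue,
  -- Correct grain: average of per-category totals (620 / 3, about 206.67).
  revenue > (SELECT AVG(revenue) FROM category_revenue) AS above_category_avg,
  -- Wrong grain: average of individual line values (620 / 4 = 155).
  revenue > (SELECT AVG(quantity * unit_price) FROM line_items) AS above_line_avg
FROM
  category_revenue
ORDER BY
  category_name;
```

With this data, Audio totals 200, which is below the category-grain average but above the line-grain average, so the wrong baseline would let it through. Cameras passes both tests and Cables fails both.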

You practiced computing per-category revenue in a CTE, then referring back to that same CTE in a scalar subquery for the cross-category average. This shape needs the named layer because the same intermediate is used twice.
