Large Categories with Premium Pricing

The problem

Brightlane's catalog team wants to find premium, well-stocked categories — categories that carry 3 or more products and whose average product price exceeds the average across every well-stocked category.

Write a query to return the category ID and average price for every premium, well-stocked category.

Assumptions:

A category's product count is the number of products records linked to that category_id. A category's average price is the average of price across those records.
A well-stocked category has 3 or more products. The well-stocked group consists of every well-stocked category.
The well-stocked-group average is the average of the per-category averages across every well-stocked category.
Only well-stocked categories whose average price exceeds the well-stocked-group average should appear.

Output:

One row per qualifying category, with columns category_id and avg_price.

Schema · ecommerce 5 tables

categories

id integer

name text

parent_id? integer

products

id integer

name text

category_id integer

price numeric

stock_qty integer

attributes? jsonb

order_items

id integer

order_id integer

product_id integer

quantity integer

unit_price numeric

customers

id integer

name text

email text

city? text

country text

created_at timestamptz

is_active boolean

orders

id integer

customer_id integer

ordered_at timestamptz

status text

total_amount numeric

Check answerShift Ctrl ↵

Run previews · Check grades

Write a query, then run it to see results here.

Worked solution Try it yourself first

Solution query

WITH
  cat_stats AS (
    SELECT
      category_id,
      COUNT(*) AS product_count,
      AVG(price) AS avg_price
    FROM
      products
    GROUP BY
      category_id
  ),
  large_cats AS (
    SELECT
      category_id,
      product_count,
      avg_price
    FROM
      cat_stats
    WHERE
      product_count >= 3
  ),
  premium_large AS (
    SELECT
      category_id,
      avg_price
    FROM
      large_cats
    WHERE
      avg_price > (
        SELECT
          AVG(avg_price)
        FROM
          large_cats
      )
  )
SELECT
  category_id,
  avg_price
FROM
  premium_large

The shape

Three cascading layers. The first computes both per-category statistics, the second restricts to well-stocked categories, and the third compares each well-stocked category's average price against the average of those averages, computed over the restricted set. The third layer reads its source twice — row-by-row and through a scalar subquery — so the comparison value is the well-stocked-group average, not the cross-catalog average.

Clause by clause

The first CTE produces one statistics row per category:

WITH cat_stats AS (
  SELECT category_id, COUNT(*) AS product_count, AVG(price) AS avg_price
  FROM products
  GROUP BY category_id
)

GROUP BY category_id produces one row per category. COUNT(*) and AVG(price) are both computed in the same pass, since both are needed downstream: the count qualifies the category and the average is what the final comparison uses.

The second CTE keeps the well-stocked categories:

large_cats AS (
  SELECT category_id, product_count, avg_price
  FROM cat_stats
  WHERE product_count >= 3
)

WHERE product_count >= 3 drops the sparse categories. The SELECT list carries avg_price forward unchanged because the next layer needs it on both sides of the comparison.

The third CTE compares each remaining row's avg_price to the well-stocked-group average:

premium_large AS (
  SELECT category_id, avg_price
  FROM large_cats
  WHERE avg_price > (SELECT AVG(avg_price) FROM large_cats)
)

The scalar subquery (SELECT AVG(avg_price) FROM large_cats) reads the restricted layer and reduces it to a single number. WHERE avg_price > ... then compares each well-stocked category's average to that number. Same set on both sides of the comparison; both references see only well-stocked categories.

SELECT category_id, avg_price FROM premium_large returns the three premium well-stocked categories: 5, 6, and 7.

Why the scalar subquery reads `large_cats` and not `cat_stats`

The well-stocked-group average is the average of the well-stocked categories' averages, not the average of every category's average. Writing (SELECT AVG(avg_price) FROM cat_stats) would pull the sparse categories back into the comparison value, and a high-priced sparse category would shift the threshold. Reading from large_cats on both sides keeps the comparison anchored to the restricted set the prompt actually describes.

The trap

The restriction "well-stocked" lives in the second CTE, and it has to be respected by both references in the third. The row-by-row read FROM large_cats makes that automatic on the left side of the comparison, but the scalar subquery has to spell it out on the right by also reading from large_cats. Letting the scalar subquery slip back to cat_stats silently changes the threshold the surviving rows are measured against, and the change is invisible in the output until a borderline category flips one way or the other.

You practiced cascading three WITH stages where the second applies a count threshold and the third compares each remaining row's value against an aggregate of that same restricted set.

Return the category ID and average price for every premium, well-stocked category

The shape

Clause by clause

Why the scalar subquery reads `large_cats` and not `cat_stats`

The trap

Reading explains SQL. Writing it, over and over with instant feedback, is what makes you fluent.

The shape

Clause by clause

Why the scalar subquery reads large_cats and not cat_stats

The trap

Reading explains SQL. Writing it, over and over with instant feedback, is what makes you fluent.

Why the scalar subquery reads `large_cats` and not `cat_stats`