N030-H2 Tier 3 · Intermediate · hard ecommerce · Brightlane

Return the category ID and average price for every assigned category whose average product price exceeds $500

Part of Common Table Expressions (CTEs) in SQL

The problem

Brightlane's catalog team is reviewing pricing for properly categorized products. Products awaiting classification should not influence the analysis.

Write a query to return the category ID and average price for every assigned category whose average product price exceeds $500.

Assumptions:

  • Products awaiting classification have a missing category_id and should not contribute to the analysis.
  • A category's average price is the average of every price value among products linked to that category_id.
  • Only categories whose average price exceeds $500 should appear.

Output:

  • One row per qualifying category, with columns category_id and avg_price.

Schema · ecommerce (5 tables)

  • categories: id integer, name text, parent_id? integer
  • products: id integer, name text, category_id integer, price numeric, stock_qty integer, attributes? jsonb
  • order_items: id integer, order_id integer, product_id integer, quantity integer, unit_price numeric
  • customers: id integer, name text, email text, city? text, country text, created_at timestamptz, is_active boolean
  • orders: id integer, customer_id integer, ordered_at timestamptz, status text, total_amount numeric

Worked solution
Solution query
WITH
  category_prices AS (
    SELECT
      category_id,
      AVG(price) AS avg_price
    FROM
      products
    WHERE
      category_id IS NOT NULL
    GROUP BY
      category_id
  )
SELECT
  category_id,
  avg_price
FROM
  category_prices
WHERE
  avg_price > 500

The shape

Two filters in two different places. The WHERE category_id IS NOT NULL lives inside the WITH layer and removes the uncategorized products before the grouping ever sees them. The WHERE avg_price > 500 lives in the main query and runs against the layer's aggregate output. Each filter belongs where the column it references is in scope.

Clause by clause

  • The WITH clause defines category_prices:
WITH category_prices AS (
  SELECT category_id, AVG(price) AS avg_price
  FROM products
  WHERE category_id IS NOT NULL
  GROUP BY category_id
)

WHERE category_id IS NOT NULL drops the products awaiting classification; only categorized products reach the grouping. GROUP BY category_id then partitions the survivors per category, and AVG(price) collapses each partition to one row. Categories 5, 6, and 7 end up with averages of 782.33, 1459, and 882.33 respectively.

  • SELECT category_id, avg_price FROM category_prices WHERE avg_price > 500 is the main query. It reads the named layer and keeps only the rows whose average exceeds 500. Categories 5, 6, and 7 all clear the threshold; any other assigned category does not exceed 500 in the layer's output, and the uncategorized products never reached the grouping at all.
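
The two steps above can be tried end to end with a small harness. This is a minimal sketch, assuming an in-memory SQLite database as a stand-in for the practice environment; the rows in SAMPLE_PRODUCTS are made up for illustration and are not Brightlane's data, so the averages differ from the ones quoted above.

```python
import sqlite3

# Made-up rows for illustration only; they are not Brightlane's data.
# Columns: id, name, category_id, price. None is stored as SQL NULL.
SAMPLE_PRODUCTS = [
    (1, "Laptop", 1, 1200),
    (2, "Monitor", 1, 400),
    (3, "Mouse", 2, 25),
    (4, "Keyboard", 2, 75),
    (5, "Unsorted item A", None, 900),
    (6, "Unsorted item B", None, 700),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER, name TEXT, category_id INTEGER, price NUMERIC)"
)
conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?)", SAMPLE_PRODUCTS)

# The layer on its own: one row per assigned category.
LAYER_SQL = """
    SELECT category_id, AVG(price) AS avg_price
    FROM products
    WHERE category_id IS NOT NULL
    GROUP BY category_id
"""
layer_rows = conn.execute(LAYER_SQL + " ORDER BY category_id").fetchall()

# The full query: the main SELECT filters the layer's aggregate output.
FULL_SQL = f"""
    WITH category_prices AS ({LAYER_SQL})
    SELECT category_id, avg_price
    FROM category_prices
    WHERE avg_price > 500
    ORDER BY category_id
"""
final_rows = conn.execute(FULL_SQL).fetchall()

print(layer_rows)  # [(1, 800.0), (2, 50.0)]
print(final_rows)  # [(1, 800.0)]
```

In this sample, the layer yields one row for each of the two assigned categories, and only the category averaging 800 survives the main query's threshold.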

Why the IS NOT NULL filter goes inside the layer and not in the main query

On products, category_id is a row-level column; in the layer's output it survives only as the grouping key. Without the filter, the uncategorized rows, the ones with a missing category_id, would collapse into a single NULL group inside the layer and produce a NULL-keyed average that the prompt explicitly excludes. Filtering in the main query would remove that row only after the layer has already aggregated products that were never meant to take part. The cleaner fix is to remove them upstream, while category_id is still a row-level column and the filter can act on the raw rows.
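
To see the NULL group appear, the solution can be run with and without the layer's filter. The sketch below again assumes an in-memory SQLite database and made-up rows; the helper qualifying is hypothetical and exists only for this comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, category_id INTEGER, price NUMERIC)")
# Made-up rows: two assigned categories plus two uncategorized products (NULL).
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(1, 1, 1200), (2, 1, 400), (3, 2, 25), (4, 2, 75), (5, None, 900), (6, None, 700)],
)


def qualifying(layer_where: str) -> set:
    """Run the solution with the given WHERE clause (possibly empty) inside the layer."""
    sql = f"""
        WITH category_prices AS (
            SELECT category_id, AVG(price) AS avg_price
            FROM products
            {layer_where}
            GROUP BY category_id
        )
        SELECT category_id, avg_price
        FROM category_prices
        WHERE avg_price > 500
    """
    return set(conn.execute(sql).fetchall())


with_filter = qualifying("WHERE category_id IS NOT NULL")
without_filter = qualifying("")  # layer filter omitted

print(with_filter)     # {(1, 800.0)}
print(without_filter)  # also contains (None, 800.0), the NULL-keyed group
```

With the filter omitted, the uncategorized products form their own group, and because that group's average happens to exceed 500 in this sample, a NULL-keyed row lands in the result.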

The trap

The two filters look symmetric but are not. The source-level filter is a row condition on products and runs before grouping; the aggregate-level filter is a condition on category_prices and runs after. Putting avg_price > 500 in the layer's WHERE fails with an error, because avg_price does not exist until the grouping completes. Putting category_id IS NOT NULL in the main query does not stop the uncategorized products from being grouped and averaged inside the layer; it only discards their NULL-keyed row afterwards, so the layer still computes an aggregate the prompt excludes. Filters that act on grouping inputs go inside the layer; filters that act on grouping outputs go in the main query.
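
The misplaced aggregate filter can be checked directly. This is a sketch under the same assumptions as before (in-memory SQLite, made-up rows). The exact error text varies by engine, so the example only records that the statement is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, category_id INTEGER, price NUMERIC)")
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(1, 1, 1200), (2, 1, 400), (3, 2, 25), (4, 2, 75), (5, None, 900)],
)

# Aggregate filter misplaced in the layer's WHERE: rejected before any row is read.
MISPLACED_SQL = """
    WITH category_prices AS (
        SELECT category_id, AVG(price) AS avg_price
        FROM products
        WHERE category_id IS NOT NULL AND avg_price > 500
        GROUP BY category_id
    )
    SELECT category_id, avg_price FROM category_prices
"""
try:
    conn.execute(MISPLACED_SQL).fetchall()
    misplaced_error = None
except sqlite3.OperationalError as exc:
    misplaced_error = str(exc)  # message wording depends on the engine and version

# Each filter in its own stage: row filter in the layer, threshold in the main query.
CORRECT_SQL = """
    WITH category_prices AS (
        SELECT category_id, AVG(price) AS avg_price
        FROM products
        WHERE category_id IS NOT NULL
        GROUP BY category_id
    )
    SELECT category_id, avg_price
    FROM category_prices
    WHERE avg_price > 500
"""
correct_rows = conn.execute(CORRECT_SQL).fetchall()

print(misplaced_error is not None)  # True
print(correct_rows)                 # [(1, 800.0)]
```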

You practiced a two-stage condition: pre-restrict source records inside the WITH layer with a WHERE, then apply a threshold against the layer's aggregate output in the main query.
