Brightlane's pricing pipeline materializes category-level metrics into a temp table for categories with 3 or more products on record. For each qualifying category the temp table carries the product count, the combined price across the category, and the average product price.
Write a query to return each qualifying category's ID, product count, combined price, and average product price.
Assumptions:
- The products table has one row per product with a category_id and a price.
- A category's product count is the number of products in that category_id. A category's combined price is the sum of price across those products. A category's average product price is the combined price divided by the product count.
- Only categories whose product count is greater than 2 should appear.
Output:
- One row per qualifying category, with columns category_id, product_count, total_price, and avg_price.
Worked solution
WITH cat_totals AS (
    SELECT
        category_id,
        COUNT(*) AS product_count,
        SUM(price) AS total_price
    FROM products
    GROUP BY category_id
)
SELECT
    category_id,
    product_count,
    total_price,
    total_price / product_count AS avg_price
FROM cat_totals
WHERE product_count > 2;

The shape
The CTE computes the two raw aggregates that depend on the underlying rows — the product count and the combined price per category. The outer SELECT then derives the average by dividing those two numbers, and applies the count threshold. Doing the division in the outer layer rather than inside the CTE means the materialized table can carry the components, not only the derived metric, which is what downstream reports comparing categories on either total spend or average price will want.
Clause by clause
- WITH cat_totals AS (SELECT category_id, COUNT(*) AS product_count, SUM(price) AS total_price FROM products GROUP BY category_id) groups products by category_id and computes the per-category count and total price. The result is one row per category with both raw aggregates as named columns.
- SELECT category_id, product_count, total_price, total_price / product_count AS avg_price FROM cat_totals reads the CTE and adds a fourth column. The average price is computed by dividing the two aggregates that already exist on each row. So category 8's total_price of 364.95 divided by product_count of 5 gives the avg_price of 72.99.
- WHERE product_count > 2 filters the result to qualifying categories. The threshold compares against product_count, which is a real column on the CTE result, so the comparison runs at the row level after aggregation.
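To see why the filter can sit in WHERE here, it may help to compare against a single-level version of the same query. The sketch below assumes the same products table and is not part of the worked solution; without the CTE there is no product_count column to filter on yet, so the threshold has to move into HAVING and repeat the aggregate.

```sql
-- Single-level sketch of the same result, for comparison only.
-- The count threshold goes in HAVING because it tests an aggregate
-- that is computed by this same GROUP BY, not an existing column.
SELECT
    category_id,
    COUNT(*) AS product_count,
    SUM(price) AS total_price,
    SUM(price) / COUNT(*) AS avg_price
FROM products
GROUP BY category_id
HAVING COUNT(*) > 2;
```

Both forms should return the same rows. The CTE version names each aggregate once and reuses the names, which is what lets the outer WHERE read like an ordinary row filter.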
Why derive the average from the raw components and not call AVG(price)
AVG(price) and SUM(price) / COUNT(*) agree on this data because every product has a recorded price, but they are not the same expression. AVG skips rows where price is NULL; the explicit SUM(price) / COUNT(*) divides the sum-over-non-null-prices by the full row count. When the materialized table needs to expose both the components and the derived metric so reports can recompute either separately, computing them as two distinct aggregates is the right shape.
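A small sketch makes the difference visible. The three prices below are made-up sample values, not rows from the exercise data, and one of them is deliberately NULL.

```sql
-- Hypothetical sample: two priced rows and one row with no price.
WITH sample(price) AS (
    VALUES (10.00), (20.00), (NULL)
)
SELECT
    AVG(price)                AS avg_skips_null,       -- 30 / 2 = 15
    SUM(price) / COUNT(*)     AS sum_over_all_rows,    -- 30 / 3 = 10
    SUM(price) / COUNT(price) AS sum_over_priced_rows  -- 30 / 2 = 15
FROM sample;
```

COUNT(price) counts only non-null prices, so the third column matches AVG(price), while COUNT(*) counts every row and pulls the second column down.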
The trap
The category 8 example divides 364.95 by 5 and lands on 72.99 cleanly because the dividend is a decimal. Watch what happens when both operands are integers: in PostgreSQL, integer division truncates toward zero, so one COUNT(*) divided by another COUNT(*) silently drops the fractional part. Here SUM(price) is numeric because price is numeric, so the division promotes to numeric automatically and no truncation happens. If you ever derive a ratio from two COUNT(*) aggregates, cast at least one side to numeric before dividing, or the fraction is gone before the result ever reaches the outer SELECT.
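The truncation is easy to reproduce with a ratio of two counts. This is a minimal sketch on made-up sample rows (two priced products and one with a NULL price), not on the exercise's products table.

```sql
-- Hypothetical sample: 2 of 3 rows have a price.
WITH sample(price) AS (
    VALUES (5.00), (NULL), (7.00)
)
SELECT
    COUNT(price) / COUNT(*)          AS truncated_share,  -- bigint / bigint: 2 / 3 = 0
    COUNT(price)::numeric / COUNT(*) AS priced_share      -- numeric / bigint: about 0.667
FROM sample;
```

Casting either operand is enough, because PostgreSQL promotes the other side to numeric before dividing.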
You practiced computing two raw aggregates first (count and sum) and deriving the average from them by division in the outer SELECT — the right shape when the materialized table needs the underlying components rather than just the derived metric.