Category Totals with Derived Average — CREATE TABLE AS in SQL

The problem

Brightlane's pricing pipeline materializes category-level metrics into a temp table for categories with 3 or more products on record. For each qualifying category the temp table carries the product count, the combined price across the category, and the average product price.

Write a query to return each qualifying category's ID, product count, combined price, and average product price.

Assumptions:

The products table has one row per product with a category_id and a price.
A category's product count is the number of products in that category_id. A category's combined price is the sum of price across those products. A category's average product price is the combined price divided by the product count.
Only categories whose product count is greater than 2 should appear.

Output:

One row per qualifying category, with columns category_id, product_count, total_price, and avg_price.

The shape

The CTE computes the two raw aggregates that depend on the underlying rows: the product count and the combined price per category. The outer SELECT then derives the average by dividing those two numbers, and it applies the count threshold. Doing the division in the outer layer lets it refer to the named columns product_count and total_price instead of repeating COUNT(*) and SUM(price). It also means the materialized table carries the components alongside the derived metric, which is what downstream reports comparing categories on either total spend or average price will want.

Clause by clause

WITH cat_totals AS (SELECT category_id, COUNT(*) AS product_count, SUM(price) AS total_price FROM products GROUP BY category_id) groups products by category_id and computes the per-category count and total price. The result is one row per category with both raw aggregates as named columns.
SELECT category_id, product_count, total_price, total_price / product_count AS avg_price FROM cat_totals reads the CTE and adds a fourth column. The average price is computed by dividing the two aggregates that already exist on each row. For example, category 8's total_price of 364.95 divided by its product_count of 5 gives an avg_price of 72.99.
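WHERE product_count > 2 filters the result to qualifying categories. The threshold compares against product_count, which is an ordinary column of the CTE result, so the comparison runs row by row after aggregation has already happened. That is why a plain WHERE works here; applying the same filter inside the CTE would require HAVING COUNT(*) > 2.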
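The clauses above assemble into one statement. Since the problem describes materializing the result into a temp table, the sketch below wraps the query in CREATE TEMP TABLE ... AS. It is a minimal sketch assuming PostgreSQL, a fresh scratch session, and a throwaway products table whose product_id column and rows are invented for illustration (they are not Brightlane's data); category_metrics is a hypothetical table name.

```sql
-- Hypothetical scratch data: product_id and every row below are invented
-- for illustration only.
CREATE TEMP TABLE products (
    product_id  INTEGER PRIMARY KEY,
    category_id INTEGER NOT NULL,
    price       NUMERIC(10, 2)
);

INSERT INTO products (product_id, category_id, price) VALUES
    (1, 1, 10.50), (2, 1, 20.25), (3, 1, 29.25),                 -- category 1: 3 products
    (4, 2,  5.25), (5, 2, 15.75),                                -- category 2: only 2 products
    (6, 3, 12.50), (7, 3,  7.50), (8, 3, 30.25), (9, 3, 49.75);  -- category 3: 4 products

-- Materialize the raw components and the derived average in one statement.
-- category_metrics is a hypothetical name for the temp table.
CREATE TEMP TABLE category_metrics AS
WITH cat_totals AS (
    SELECT category_id,
           COUNT(*)   AS product_count,  -- raw component 1
           SUM(price) AS total_price     -- raw component 2
    FROM products
    GROUP BY category_id
)
SELECT category_id,
       product_count,
       total_price,
       total_price / product_count AS avg_price  -- derived from the two components
FROM cat_totals
WHERE product_count > 2;                          -- threshold on the aggregated rows

SELECT * FROM category_metrics ORDER BY category_id;
```

With these invented rows, category 1 (3 products, 60.00 total, average 20) and category 3 (4 products, 100.00 total, average 25) are kept, while category 2 has only 2 products and never reaches the temp table.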

Why derive the average from the raw components and not call `AVG(price)`

AVG(price) and SUM(price) / COUNT(*) agree on this data because every product has a recorded price, but they are not the same expression. AVG skips rows where price is NULL, so it divides the sum of the non-null prices by the number of non-null prices; SUM(price) / COUNT(*) divides that same sum by the full row count. Computing the count and the sum as two distinct aggregates, and deriving avg_price from them, guarantees that avg_price equals total_price / product_count on every row of the materialized table, so reports can use the stored components or recompute the derived metric from them and always get the same answer.
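To see where the two expressions diverge, here is a minimal sketch assuming PostgreSQL, a fresh scratch session, and a throwaway products table with one invented category that contains a product whose price is NULL (a case the problem's own data does not have).

```sql
-- Hypothetical rows: category 1 has four products, one with an unknown price.
CREATE TEMP TABLE products (
    product_id  INTEGER PRIMARY KEY,
    category_id INTEGER NOT NULL,
    price       NUMERIC(10, 2)
);

INSERT INTO products (product_id, category_id, price) VALUES
    (1, 1, 10.50), (2, 1, 20.25), (3, 1, 29.25), (4, 1, NULL);

SELECT category_id,
       COUNT(*)              AS product_count,     -- 4: counts every row
       COUNT(price)          AS priced_count,      -- 3: skips the NULL price
       SUM(price)            AS total_price,       -- 60.00: the NULL is ignored
       AVG(price)            AS avg_non_null,      -- 60.00 / 3 = 20
       SUM(price) / COUNT(*) AS avg_over_all_rows  -- 60.00 / 4 = 15
FROM products
GROUP BY category_id;
```

AVG(price) reports 20 because it divides by the three priced products, while SUM(price) / COUNT(*) reports 15 because it divides by all four rows. If the NULL-skipping behavior is what you want while still storing explicit components, COUNT(price) is the denominator that matches AVG.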

The trap

The category 8 example divides 364.95 by 5 and lands on 72.99 cleanly because the numerator is numeric: SUM(price) is numeric because price is numeric, and the integer COUNT(*) is promoted to numeric for the division, so nothing is truncated. Watch what happens when both operands are integers: in PostgreSQL, integer division truncates, so dividing one COUNT(*) by another silently drops the fractional part. If you ever derive a ratio from two COUNT(*) aggregates, give one side a decimal point (for example by multiplying it by 1.0) or cast it to numeric; otherwise the fraction is lost at the division itself.
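A minimal sketch of the trap and the fix, assuming PostgreSQL, a fresh scratch session, and a throwaway products table with invented rows. The ratio here (the share of a category's products that have a price) is a hypothetical metric, chosen only because it divides one COUNT by another.

```sql
-- Hypothetical rows: two of the three products in category 1 have a price.
CREATE TEMP TABLE products (
    product_id  INTEGER PRIMARY KEY,
    category_id INTEGER NOT NULL,
    price       NUMERIC(10, 2)
);

INSERT INTO products (product_id, category_id, price) VALUES
    (1, 1, 10.50), (2, 1, 20.25), (3, 1, NULL);

SELECT category_id,
       COUNT(price) / COUNT(*)       AS truncated_ratio,  -- 2 / 3 on integers truncates to 0
       COUNT(price) * 1.0 / COUNT(*) AS exact_ratio       -- 2.0 / 3 keeps the fraction (about 0.667)
FROM products
GROUP BY category_id;
```

Multiplying by 1.0 before dividing gives the numerator a decimal point, so the division is no longer integer division. In PostgreSQL, CAST(COUNT(price) AS NUMERIC) / COUNT(*) has the same effect.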

You practiced computing two raw aggregates first (count and sum) and deriving the average from them by division in the outer SELECT — the right shape when the materialized table needs the underlying components rather than just the derived metric.