The SQL roadmap

Learn SQL, concept by concept.

Sixty-six concepts in prerequisite order. Read the lecture, run every example against a real Postgres database in your browser, then practice. No signup to start.

66 concepts 615 practice problems 5 tiers

Start free → Browse concepts

In order

Concepts follow a prerequisite graph, so each one builds on what came before. You never hit something you are not ready for.

Run it free

Every example and problem runs against a real Postgres database in your browser. Nothing to install, no signup to start.

Mastery, measured

An adaptive engine tracks your Mastery Confidence on each concept and brings problems back for review before you forget.

All 66 concepts

Grouped by tier, in the order they build.

Tier 1 · Foundations

12 concepts · 108 problems

The shape of a query: choose columns, filter rows, sort, and limit what comes back.

N001

SELECT and Column Expressions

The SELECT clause controls what columns and expressions a query returns. Its scope is the shape of the output — which columns appear and what they contain — with no influence over which rows are included.

N002

FROM and Table References

The FROM clause identifies the row source a query operates on. It establishes the full population of rows that all subsequent clauses filter, aggregate, and shape.

N003

WHERE Clause and Comparison Operators

The WHERE clause filters the rows produced by FROM, keeping only those for which a condition evaluates to true. Its job is deciding which rows get processed at all.

N004

Literal Values, Data Types, and Type Casting

Every value in a PostgreSQL query has a type, and PostgreSQL uses that type to determine what operations are valid and how expressions are evaluated. Literal values and type casting are the mechanisms by which you introduce and convert typed values within a query.

N005

NULL Semantics and IS NULL

NULL represents the absence of a known value. Every NULL-related behavior in PostgreSQL follows from that single fact.

N006

Boolean Logic in WHERE (AND, OR, NOT)

The WHERE clause accepts a single boolean expression, and AND, OR, and NOT are the operators for combining multiple conditions into one. Their behavior follows three-valued logic, which means NULL in any condition affects how the combined expression evaluates.

N007

ORDER BY and Result Sorting

ORDER BY controls the sequence of rows in a query's output. Without it, PostgreSQL makes no guarantee about the order in which rows are returned.

N008

LIMIT and OFFSET

LIMIT restricts the number of rows a query returns. OFFSET skips a specified number of rows before returning the rest. Both operate on the final sorted result set.

N009

Column Aliases and Expression Naming

A column alias assigns a name to an expression in the SELECT list. That name appears as the column header in the result set and can be referenced by name in specific parts of the same query.

N010

DISTINCT

DISTINCT removes duplicate rows from a query's output. It operates on the complete set of output columns, treating two rows as duplicates only when every column value matches.

N011

Arithmetic and Comparison Expressions

Arithmetic and comparison expressions are the building blocks of computed values and conditions in SQL. Their behavior is governed entirely by the types of their operands.

N012

BETWEEN, IN, and LIKE

BETWEEN, IN, and LIKE are shorthand filter operators that expand on the basic comparison operators available in WHERE. Each tests a column value against a range, a set, or a pattern.

Tier 2 · Core SQL

15 concepts · 138 problems

Aggregation, grouping, joins, and your first subqueries. The everyday analyst toolkit.

N013

Aggregate Functions (COUNT, SUM, AVG, MIN, MAX)

Aggregate functions reduce a set of rows to a single computed value. COUNT, SUM, AVG, MIN, and MAX each collapse the row set differently, and each has specific behavior around NULLs, type coercion, and what counts as input.

N014

GROUP BY

GROUP BY partitions a query's rows into groups and directs aggregate functions to operate on each group independently. Without it, an aggregate function collapses the entire row set into one value. With it, the same function produces one value per group.

N015

HAVING

The HAVING clause filters groups after aggregation has occurred. It operates on the output of GROUP BY, applying conditions to aggregate values that do not exist until grouping is complete.

N016

CASE WHEN Expressions

The CASE WHEN expression evaluates a series of conditions and returns a value for the first condition that is true. It can appear anywhere in a query where a value expression is valid: in SELECT, WHERE, ORDER BY, and inside aggregate functions.

N017

INNER JOIN

INNER JOIN combines rows from two tables by matching them on a condition, returning one output row for every pair of rows — one from each table — where that condition is true. Rows from either table that have no match in the other are excluded from the result.

N018

LEFT JOIN and RIGHT JOIN

LEFT JOIN returns all rows from the left table, paired with matching rows from the right table where the join condition is true, and fills unmatched positions with NULL. RIGHT JOIN does the same in the opposite direction, preserving all rows from the right table.

N019

FULL OUTER JOIN

FULL OUTER JOIN returns all rows from both tables, matching them where the join condition is true and filling unmatched positions on either side with NULL. It combines the preservation behavior of LEFT JOIN and RIGHT JOIN into a single operation.

N020

CROSS JOIN

CROSS JOIN combines every row from one table with every row from another, producing a result with no join condition. The output contains one row for every possible pairing between the two tables.

N021

Self-Joins

A self-join joins a table to itself, treating two instances of the same table as distinct participants in the join. It follows the same mechanics as any other join; the only requirement is that both instances carry separate aliases so PostgreSQL can distinguish them.

N022

Joining Multiple Tables

Joining multiple tables extends the two-table join pattern by chaining additional JOIN clauses, each introducing a new table into the working result set. The result of each join becomes the input for the next.

N023

UNION, UNION ALL, INTERSECT, EXCEPT

Set operations combine the results of two or more SELECT statements into a single result set. Each operation has distinct rules for which rows survive the combination.

N024

Scalar Subqueries

A scalar subquery is a SELECT statement embedded inside another query that returns exactly one column and one row, producing a single value. That value can be used anywhere a literal or column expression is valid.

N025

Subqueries in WHERE (IN, EXISTS, ANY, ALL)

`IN`, `EXISTS`, `ANY`, and `ALL` extend WHERE filtering to test a row's values against the results of a subquery. Each operator expresses a different membership or comparison relationship between the outer row and the subquery's output.

N026

Derived Tables (Subqueries in FROM)

A subquery in the FROM clause produces a named result set that the outer query treats as a table. The inner query executes completely before the outer query begins, making the inner query's output the sole source the outer query works from.

N027

Conditional Aggregation (CASE inside Aggregates)

Conditional aggregation places a CASE WHEN expression inside an aggregate function, letting a single GROUP BY query compute separate aggregate values for different subsets of rows simultaneously.

Tier 3 · Intermediate

14 concepts · 128 problems

CTEs, date and string work, conditional logic, and an introduction to window functions.

N028

COALESCE and NULLIF

`COALESCE` and `NULLIF` are two scalar functions that convert between NULL and non-NULL values. `COALESCE` replaces NULL with a fallback; `NULLIF` produces NULL when a condition is met.

N029

NULL Handling in Joins and Aggregates

Joins and aggregate functions each interact with NULL in ways that are consistent with NULL semantics but non-obvious in combination. The failures they produce are silent: the query runs, returns results, and the results are wrong.

N030

Common Table Expressions (CTEs)

A Common Table Expression (CTE) defines a named subquery at the top of a statement that the rest of the query can reference by name. Its scope is a single query execution: the name exists only within the statement that defines it.

N031

Chained CTEs

A chained CTE query defines multiple named subqueries in a single `WITH` clause, each available to the main query and to CTEs that follow it. The chain is a sequence of named intermediate layers, resolved in the order they are written.

N032

Date and Time Types in PostgreSQL

PostgreSQL provides four primary types for representing points in time and durations: `DATE`, `TIME`, `TIMESTAMP`, and `TIMESTAMPTZ`. The choice between them determines what gets stored, how PostgreSQL interprets the value, and how comparisons between values behave.

N033

Date Truncation and Extraction

`DATE_TRUNC` and `EXTRACT` are the two primary functions for working with date/time values at a specific calendar granularity. `DATE_TRUNC` rounds a datetime down to a period boundary and returns a datetime. `EXTRACT` pulls a single numeric component out of a datetime and returns a number.

N034

Date Arithmetic and Intervals

PostgreSQL supports arithmetic between date/time values and a dedicated duration type, `INTERVAL`, which represents a span of time rather than a point in it. The result type of a date/time arithmetic expression depends on the types of both operands.

N035

String Functions (LENGTH, UPPER, LOWER, TRIM, SUBSTRING)

`LENGTH`, `UPPER`, `LOWER`, `TRIM`, and `SUBSTRING` are scalar string functions that inspect or transform a single text value and return a single result. Each operates on one row at a time, producing one output value per input row.

N036

String Concatenation and Formatting

PostgreSQL provides three tools for combining and formatting strings: the `||` operator, the `CONCAT` and `CONCAT_WS` functions, and the `FORMAT` function. They differ in how they handle NULL inputs and in what kind of output they are designed to produce.

N037

Pattern Matching (LIKE, ILIKE, SIMILAR TO, Regex)

PostgreSQL provides four pattern matching mechanisms for string columns: `LIKE`, `ILIKE`, `SIMILAR TO`, and the POSIX regex operators. Each uses a different pattern language with different expressive power and different performance characteristics.

N038

Window Functions Introduction (OVER, PARTITION BY)

A window function computes a value for each row using a set of related rows, without collapsing those rows into a single output row. The `OVER` clause defines which rows are related to each other and in what order.

N039

ROW_NUMBER, RANK, DENSE_RANK

`ROW_NUMBER`, `RANK`, and `DENSE_RANK` are window functions that assign an integer position to each row within a partition based on an `ORDER BY` clause. They produce different values only when two or more rows are tied on the ordering expression.

N040

Aggregate Window Functions (SUM, AVG, COUNT OVER)

The aggregate functions `SUM`, `AVG`, `COUNT`, `MIN`, and `MAX` can be used as window functions by attaching an `OVER` clause. Without `ORDER BY` inside `OVER`, they compute a static value across the entire partition. With `ORDER BY` inside `OVER`, they compute a running value that accumulates as it moves through the partition.

N041

Temp Tables and CREATE TABLE AS SELECT

`CREATE TABLE AS SELECT` and `CREATE TEMP TABLE` materialize the result of a query into an actual database table. Unlike a CTE, the result persists beyond the query that creates it and can be referenced by subsequent queries in the same session.

Tier 4 · Advanced

16 concepts · 151 problems

The full window-function range, lateral joins, JSON, and recursive queries.

N042

LAG and LEAD

LAG and LEAD are window functions that give each row access to the value of an expression from a different row in the same window partition, without a join or subquery. The offset is positional: a fixed number of rows before (LAG) or after (LEAD) the current row, determined by the partition's ORDER BY.

N043

FIRST_VALUE, LAST_VALUE, NTH_VALUE

FIRST_VALUE, LAST_VALUE, and NTH_VALUE return the value of an expression from a specific position within the window: the first row, the last row, or the nth row. Their result for each input row is a positional lookup: the value of the expression at a designated position within the ordered partition, retrieved and attached to every row in the window.

N044

Window Frames (ROWS, RANGE, GROUPS)

The window frame clause narrows the set of rows a window function operates on for each position in the partition. Where PARTITION BY divides the result set into independent windows and ORDER BY sequences the rows within each window, the frame clause defines a sliding boundary that moves with each row, determining exactly which rows contribute to that row's computation.

N045

NTILE and Percentile Functions

NTILE divides a partition's rows into a specified number of ranked buckets and assigns each row a bucket number. The percentile functions PERCENTILE_CONT and PERCENTILE_DISC compute a value at a specified percentile position within a sorted set of values. These are separate mechanisms with separate purposes, but both answer questions about where a value sits in a distribution.

N046

DISTINCT ON

DISTINCT ON returns one row per distinct value of a specified expression, choosing which row to keep based on ORDER BY. It is PostgreSQL-specific syntax that solves the "one row per group, with full row data" problem that standard SQL requires a subquery or window function to handle.

N047

Correlated Subqueries

A correlated subquery references a column from its outer query, creating a dependency that forces the subquery to be re-evaluated once for each row the outer query processes. The subquery cannot execute independently; its result changes based on which outer row is currently being examined.

N048

LATERAL Joins

LATERAL allows a subquery in the FROM clause to reference columns from tables that appear earlier in the same FROM clause. Without LATERAL, each subquery in FROM is evaluated independently, with no access to the rows being produced by other FROM items. LATERAL lifts that restriction.

N049

FILTER Clause on Aggregates

The FILTER clause attaches a row-level condition directly to an aggregate function, restricting which rows that specific aggregate counts or sums, while leaving other aggregates in the same SELECT list unaffected.

N050

STRING_AGG and ARRAY_AGG

STRING_AGG and ARRAY_AGG are aggregate functions that collect multiple row values into a single output value per group. STRING_AGG concatenates values into a delimited string. ARRAY_AGG collects values into a PostgreSQL array. Both collapse a set of rows into one row, like COUNT or SUM, and their output is the collection of contributing values in a structured form.

N051

generate_series() for Sequences and Date Spines

generate_series() is a PostgreSQL set-returning function that produces a sequence of values from a start point to an end point at a specified step interval. Its primary analytical use is generating a complete, gap-free sequence of dates or integers that can be joined to fact data to ensure every period appears in the result, even periods with no underlying activity.

N052

Recursive CTEs

A recursive CTE defines a query that references itself, allowing it to accumulate rows iteratively until a termination condition is met. Recursive CTEs handle hierarchical and graph-structured data where the depth of traversal is unknown at query-write time and cannot be expressed with a fixed number of joins.

N053

JSONB Field Extraction

The `->` and `->>` operators extract fields and elements from JSONB columns. `->` returns the extracted value as JSONB, preserving its type structure. `->>` returns the extracted value as plain text, discarding all type information.

N054

JSONB Aggregation (jsonb_agg, json_build_object)

`jsonb_agg` and `json_build_object` move in opposite directions across the boundary between relational and JSONB structure. `jsonb_agg` collects relational rows into a JSONB array. `json_build_object` assembles a JSONB object from key-value pairs supplied as arguments. Together they cover the two primary operations analysts need when producing JSONB output from relational data.

N055

Date Spine Construction and Zero-Fill Patterns

A date spine is a complete, gap-free sequence of dates or periods that serves as the structural backbone of a time-series query. Joining a fact table to a date spine with a LEFT JOIN ensures every period in the desired range appears in the result, even periods where no activity occurred in the underlying data.

N056

Period-over-Period Analysis

Period-over-period analysis compares a metric in one time period to the same metric in a prior period of equal length: this week versus last week, this month versus last month, this quarter versus the same quarter last year. The SQL implementation uses LAG to bring a prior period's value into the same row as the current period's value, enabling the comparison as inline arithmetic.

N057

Grouping by Date Periods

Grouping by date periods means truncating timestamps or dates to a calendar unit and using the truncated value as the GROUP BY key. The result is one row per period (one per day, one per week, one per month) with aggregated measures across all rows that fall within each period.

Tier 5 · Expert

9 concepts · 90 problems

Analytical patterns and query reasoning: running totals, sessionization, reading EXPLAIN.

N058

Multi-CTE Query Architecture

Multi-CTE query architecture is the practice of decomposing a complex analytical query into a sequence of named intermediate steps, each expressed as a CTE, where each step does one clearly scoped thing and feeds the next. The architecture imposes deliberate structure on queries that would otherwise become a single deeply nested expression that is correct but unreadable and difficult to debug.

N059

Join Fanout and Aggregate Correctness

Join fanout is a correctness problem that occurs when a join multiplies rows beyond what the query intends, causing subsequent aggregations to compute over inflated row counts and produce wrong results. The query runs without error, returns plausible-looking numbers, and gives no indication anything is wrong.

N060

Reading EXPLAIN Output

`EXPLAIN` shows the execution plan PostgreSQL's query planner chose for a query before running it. Reading that plan tells an analyst where the database will spend its time, how many rows it expects to process at each step, and whether the structure of the query is causing the planner to make expensive choices.

N061

Query Structure Patterns for Performance

Query structure influences how the PostgreSQL planner can execute a query. Two queries that produce identical results can generate very different execution plans depending on how they are written, because the planner's ability to push filters down, choose join strategies, and avoid unnecessary materializations depends on the shape of the SQL it receives.

N062

Choosing Between Subqueries, CTEs, and Joins

Subqueries, CTEs, and joins are three different ways to compose multiple relational operations in a single query. Choosing between them is a judgment about readability, correctness, performance, and reuse, applied to the specific problem at hand.

N063

NULL Propagation in Complex Queries

NULL propagates through a multi-step query in ways that compound across operations. A NULL introduced in one layer of a complex query can travel through subsequent joins, window functions, and aggregations, arriving in the final output at a location that gives no indication of where it originated. The challenge in complex queries is tracing how NULL moves across multiple operations that each handle it differently, from the layer where it was introduced to the output column where it appears.

N064

Running Totals and Cumulative Metrics

A running total accumulates a metric across an ordered sequence of rows, so that each row carries the sum of all preceding rows plus its own value. Cumulative metrics extend this pattern: cumulative revenue, cumulative user signups, cumulative events. Both are produced by combining period-grouped aggregation with a window function that uses an expanding frame over the ordered result.

N065

Sessionization and Funnel Analysis Patterns

Sessionization groups a stream of timestamped events into discrete sessions by detecting gaps between events that exceed a defined timeout. Funnel analysis measures how many users complete each step in a defined sequence, and how many drop off between steps. Both patterns combine LAG, CTEs, and ordered event data into multi-stage query structures that transform raw event logs into interpretable behavioral summaries.

N066

Analyst Debugging Patterns

Analyst debugging is the practice of systematically locating the layer in a complex query where a result diverges from the expected output. The process applies to two distinct classes of problem: wrong results, where the query returns a plausible but incorrect value, and unexpected NULLs or missing rows, where the output is incomplete in ways that are not immediately traceable to the query's structure.

Start practicing

Learn SQL the way it actually works.

Read a concept, run the examples, and practice until the engine confirms you have it. Start with SELECT, no signup required.

Start free