Two Word Names via Regex — Pattern Matching in SQL

The problem

Brightlane's deduplication process identifies customers with simple two-word names in the form 'Firstname Lastname' — names that fit that exact shape get processed by a fast path.

Write a query to return the ID and name of every customer whose name fits the simple two-word form.

Assumptions:

The customers table has one row per customer with an id and a name.
A qualifying name, after being lowercased, consists of exactly two sequences of letters separated by a single space and contains no other characters: no digits, no punctuation, no hyphens, no extra words, no leading or trailing spaces.

Output:

One row per qualifying customer, with columns id and name.

The shape

The pattern ^[a-z]+ [a-z]+$ describes the entire string from start to end: one or more lowercase letters, a single space, one or more lowercase letters, and nothing else. LOWER(name) normalizes the input before the regex sees it, so the letter class [a-z] can be lowercase-only and still accept names that were stored with mixed case. Anchors and normalization together enforce the exact shape.
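One way to see the pattern at work is to run it against literal strings, with no table involved. This is a minimal sketch that assumes PostgreSQL, where ~ is the regex match operator; the sample strings are made up for illustration.

```sql
-- Assumes PostgreSQL. The sample strings are illustrative, not rows from customers.
SELECT
    LOWER('Alice Nguyen')  ~ '^[a-z]+ [a-z]+$' AS two_words,     -- true
    LOWER('Alice')         ~ '^[a-z]+ [a-z]+$' AS one_word,      -- false: no space, no second word
    LOWER('Alice  Nguyen') ~ '^[a-z]+ [a-z]+$' AS double_space;  -- false: two spaces between words
```

The double-space case fails because the pattern allows exactly one space, and the character after it must be a letter.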

Clause by clause

SELECT id, name returns the customer ID and the original-case name. The lowering is only for the comparison; the output keeps the recorded capitalization.
FROM customers reads the customer table.
WHERE LOWER(name) ~ '^[a-z]+ [a-z]+$' is the filter that does all the work. LOWER(name) produces a lowercased copy of the name for the match; ~ is the case-sensitive POSIX regex match operator (PostgreSQL syntax). Because the input is already lowercased and the pattern contains only lowercase letters, case sensitivity never comes into play, which makes ~ the natural operator for normalized input. The ^ anchors at the start, [a-z]+ matches one or more lowercase letters, then a literal space, then [a-z]+ again, then $ anchors at the end. Any name with a digit, hyphen, apostrophe, extra word, or stray whitespace fails the match because the anchors leave no room for additional characters.
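Assembled, the three clauses above form a single statement. The sketch below assumes PostgreSQL and a customers table with id and name columns, as listed in the assumptions.

```sql
-- Assumes PostgreSQL and a customers(id, name) table as described in the assumptions.
SELECT id, name
FROM customers
WHERE LOWER(name) ~ '^[a-z]+ [a-z]+$';  -- match against a lowercased copy; output keeps the stored case
```

Rows with a NULL name drop out as well, since the comparison then yields NULL and not true.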

Why anchor both ends and not just the start

Without the $, the pattern ^[a-z]+ [a-z]+ would match alice nguyen and also alice nguyen-smith (it would match the alice nguyen prefix and ignore the rest). Without the ^, the pattern [a-z]+ [a-z]+$ would match any name ending in two words, even if it had more before them. The ^ and $ together force the regex to account for every character in the string. Names like James O'Brien, Xander Wright-Adams, or Mary Jane Watson all fail because the apostrophe, hyphen, or third word has nowhere to go inside the pattern.
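The effect of each anchor can be checked side by side. The following is a sketch assuming PostgreSQL; the names are illustrative samples already in lowercase, and the column aliases are hypothetical.

```sql
-- Assumes PostgreSQL. Sample names are illustrative and already lowercased.
SELECT
    n                      AS sample_name,
    n ~ '^[a-z]+ [a-z]+$'  AS both_anchors,  -- whole string must fit the shape
    n ~ '^[a-z]+ [a-z]+'   AS start_only,    -- only a prefix must fit
    n ~ '[a-z]+ [a-z]+$'   AS end_only       -- only a suffix must fit
FROM (VALUES
    ('alice nguyen'),
    ('alice nguyen-smith'),
    ('mary jane watson'),
    ('james o''brien')
) AS samples(n);
```

Only alice nguyen should come back true in all three columns. The start-only pattern also accepts the other three samples (for the last one it stops after james o), and the end-only pattern also accepts mary jane watson by matching jane watson.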

The trap

The LOWER call is doing two jobs at once that are easy to conflate. It lets the letter class stay narrow ([a-z] instead of [A-Za-z]), and it lets the comparison treat ALICE NGUYEN the same as Alice Nguyen. Without the LOWER call, a name stored in all caps, or even in ordinary title case, would fail ^[a-z]+ [a-z]+$ even though it fits the two-word shape. The alternative spelling name ~* '^[a-z]+ [a-z]+$' would also work, because ~* matches case-insensitively. Both are valid; the version in the canonical query separates "normalize the input" from "test the shape," which is the more readable pattern on a real filter.
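The three spellings can be compared directly on the same inputs. This is a sketch assuming PostgreSQL; the sample strings and column aliases are illustrative.

```sql
-- Assumes PostgreSQL. Sample strings are illustrative.
SELECT
    n                             AS sample_name,
    n ~ '^[a-z]+ [a-z]+$'         AS plain_match,       -- case-sensitive, no normalization
    LOWER(n) ~ '^[a-z]+ [a-z]+$'  AS lower_then_match,  -- normalize first, then test the shape
    n ~* '^[a-z]+ [a-z]+$'        AS case_insensitive   -- case-insensitive operator
FROM (VALUES
    ('alice nguyen'),
    ('Alice Nguyen'),
    ('ALICE NGUYEN')
) AS samples(n);
```

The plain match should be true only for the all-lowercase sample, while the other two columns should be true for all three rows.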

You practiced anchoring a regex with ^...$ over a LOWER-normalized input: anchor both ends so the entire string must match the shape, and normalize first so capitalization variations don't trip the letter class.
