Sibling Categories — Self-Joins in SQL

The problem

Brightlane's category team wants to identify all pairs of sibling categories — categories that share the same parent.

Write a query to return the name of each category in every sibling pair.

Assumptions:

The categories table contains every defined category in the catalogue.
Siblings share the same parent_id. A category is not its own sibling, so the result must exclude pairs where both rows refer to the same category.
Each pair appears twice in the result — once with category A on the left and B on the right, once with B on the left and A on the right.

Output:

One row per ordered sibling pair, with columns category_name and sibling_name.

Schema · ecommerce 5 tables

The shape

Two categories are siblings when they share a parent_id, so the join condition pairs rows on that shared attribute (c1.parent_id = c2.parent_id) rather than on the usual primary-key link. The inequality c1.id <> c2.id then strips out the self-pairs, where the same category matches itself.

Clause by clause

SELECT c1.name AS category_name, c2.name AS sibling_name returns one name from each aliased instance. Each output row is an ordered pair of sibling names: Phones / Tablets, Phones / Laptops, and so on.
FROM categories c1 reads categories as the left side of the pair.
JOIN categories c2 ON c1.parent_id = c2.parent_id joins on the shared parent_id. Every Electronics child matches every Electronics child, including itself; every Clothing child matches every Clothing child, including itself.
WHERE c1.id <> c2.id drops the diagonal — the rows where a category matched itself. Phones-and-Phones is gone, but Phones-and-Tablets and Tablets-and-Phones both remain, which is why each pair appears twice as ordered pairs in the result.

Why this and not a join on `id`

Most self-joins so far have used the primary-key form: e.manager_id = m.id or c.parent_id = p.id. That form expresses a directed relationship — one row points to another by storing its ID. Siblings are not a directed relationship. Neither sibling stores a pointer to the other; they only share a third value (parent_id). The join condition has to match on that shared value directly. The general rule: when the relationship is "share an attribute" rather than "reference each other," the join condition is c1.col = c2.col on the shared attribute, not a foreign-key-to-primary-key link.

The trap

Without WHERE c1.id <> c2.id, every category pairs with itself in the result — Phones-and-Phones, Tablets-and-Tablets, every row. The join condition c1.parent_id = c2.parent_id is reflexive: any row's parent_id equals its own parent_id, so each row matches itself before it matches any genuine sibling. The fix is the explicit inequality on the primary key, which excludes the diagonal of self-matches without disturbing the cross-pairs. Anytime a self-join's ON condition would be true when both aliases land on the same row, the WHERE clause needs an inequality to drop those self-pairs.

You practiced a self-join where the join condition is on a non-primary-key column shared by both aliases (parent_id), with an inequality on id to exclude self-pairs. The recurring shape any time "things that share an attribute but aren't the same thing" is the question.

Return the name of each category in every sibling pair

The shape

Clause by clause

Why this and not a join on `id`

The trap

Reading explains SQL. Writing it, over and over with instant feedback, is what makes you fluent.

The shape

Clause by clause

Why this and not a join on id

The trap

Reading explains SQL. Writing it, over and over with instant feedback, is what makes you fluent.

Why this and not a join on `id`