April 14, 2026
Which Rows Survive a SQL Join?
SQL JOINs do more than combine tables. They decide which rows survive, which become NULL, and which disappear entirely. This post explains INNER, LEFT, RIGHT, FULL, CROSS, and SELF JOINs with practical examples and mental models.
A JOIN does not just combine tables. It decides which rows survive.
That is why one keyword can change row counts, introduce NULLs, or silently drop records we expected to keep. If a query returns too few rows, too many rows, or missing dimension values, the join type is usually the first thing to inspect.
Let's use one mental model:
- the
ONclause decides who matches whom - the join type decides what happens to rows that do not match
That second part is where most mistakes happen.
Here is the shortest possible cheat sheet:
INNER JOINkeeps matched rows onlyLEFT JOINkeeps all rows from the left tableRIGHT JOINkeeps all rows from the right tableFULL OUTER JOINkeeps all rows from both sidesCROSS JOINkeeps every possible combinationSELF JOINjoins a table to itself
Example:
Let's start with one small dataset and reuse it for most join types.
customers orders
+----+-------+ +----------+-------------+
| id | name | | order_id | customer_id |
+----+-------+ +----------+-------------+
| 1 | Alice | | 101 | 1 |
| 2 | Bob | | 102 | 3 |
| 3 | Cara | | 103 | 99 |
+----+-------+ +----------+-------------+
Two pairs matter here:
- Alice matches order
101 - Cara matches order
102
Two rows do not match:
- Bob has no order
- order
103points tocustomer_id = 99, which does not exist incustomers
In a strictly constrained transactional database, that orphaned key might be impossible. In analytics pipelines, it is common enough. Late-arriving dimensions, weak source constraints, or partially loaded data can all create this shape.
That is enough to explain every join type.
INNER JOIN
INNER JOIN keeps only rows that match on both sides.
SELECT c.name, o.order_id
FROM customers c
INNER JOIN orders o
ON c.id = o.customer_id;
Result:
+-------+----------+
| name | order_id |
+-------+----------+
| Alice | 101 |
| Cara | 102 |
+-------+----------+
Bob disappears because there is no matching order. Order 103 disappears because there is no matching customer.
This is the most restrictive join. It is correct when we only care about rows that fully match.
LEFT JOIN
LEFT JOIN keeps every row from the left table and fills the right side with NULL when there is no match.
SELECT c.name, o.order_id
FROM customers c
LEFT JOIN orders o
ON c.id = o.customer_id;
Result:
+-------+----------+
| name | order_id |
+-------+----------+
| Alice | 101 |
| Bob | NULL |
| Cara | 102 |
+-------+----------+
Bob survives because customers is on the left side.
This is the join we use when the left table is the grain we want to preserve.
RIGHT JOIN
RIGHT JOIN does the same thing in the other direction. It keeps every row from the right table.
SELECT o.order_id, c.name
FROM customers c
RIGHT JOIN orders o
ON c.id = o.customer_id;
Result:
+----------+-------+
| order_id | name |
+----------+-------+
| 101 | Alice |
| 102 | Cara |
| 103 | NULL |
+----------+-------+
Order 103 survives because orders is on the right side.
Technically this is fine. In practice, many teams prefer to swap table order and write LEFT JOIN because it is easier to read in one consistent direction.
FULL OUTER JOIN
FULL OUTER JOIN keeps all rows from both sides. Matched rows are combined. Unmatched rows survive with NULL on the missing side.
SELECT c.name, o.order_id
FROM customers c
FULL OUTER JOIN orders o
ON c.id = o.customer_id;
Result:
+-------+----------+
| name | order_id |
+-------+----------+
| Alice | 101 |
| Bob | NULL |
| Cara | 102 |
| NULL | 103 |
+-------+----------+
Tip: FULL JOIN and FULL OUTER JOIN mean the same thing.
One production note matters here: not every SQL engine supports FULL OUTER JOIN directly. For example, MySQL does not. In those engines, we usually emulate it with a LEFT JOIN, a RIGHT JOIN, and a UNION.
CROSS JOIN
CROSS JOIN does not use matching keys at all. It returns the Cartesian product.
That means every row on the left is paired with every row on the right.
SELECT p.product_name, r.region
FROM products p
CROSS JOIN regions r;
If products has 5 rows and regions has 10 rows, the result has 5 x 10 = 50 rows.
This is useful when we intentionally want all combinations, such as:
- product x region matrices
- date x category grids
- test datasets
- scaffold tables for missing combinations
This join is powerful, but it is also the fastest way to create accidental row explosion.
SELF JOIN
A SELF JOIN joins a table to itself. We use it when rows inside the same table relate to other rows in that same table.
A common example is employee to manager.
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m
ON e.manager_id = m.id;
Result idea:
+----------+---------+
| employee | manager |
+----------+---------+
| Ana | Sara |
| Leo | Sara |
| Sara | NULL |
+----------+---------+
Sara has no manager, so LEFT JOIN preserves her row and fills the manager column with NULL.
This is why SELF JOIN is common in hierarchies, org charts, and parent-child relationships.
The join type is not enough
Even when the join type is correct, post-join filtering can still change the result.
The classic trap is LEFT JOIN followed by a WHERE filter on the right table.
-- Wrong if we want to keep all customers
SELECT c.name, o.order_id
FROM customers c
LEFT JOIN orders o
ON c.id = o.customer_id
WHERE o.order_id IS NOT NULL;
That WHERE clause removes the NULL rows created by the LEFT JOIN, which makes the result behave like an INNER JOIN.
If we need to preserve left-side rows, filters on right-table columns usually belong in ON, not in WHERE.
- put the filter in
ONwhen we want to keep unmatched left rows and only limit which right-side rows can match - put the filter in
WHEREwhen we want to filter the final joined result, even if that removes unmatched rows
Multiple joins
Now let's move from one join to a more realistic warehouse pattern.
When we write multiple joins, the useful mental model is this: SQL builds a working result, then joins the next table to that result, then joins the next table again.
Logically, that means each join produces an intermediate result set that becomes the input to the following join.
sales
-> join products
-> intermediate result
-> join customers
-> final result
That is the right model for reasoning about row counts and NULL propagation.
One precision note matters: the optimizer is free to change the physical execution plan. It might reorder joins or choose a different strategy internally. But for understanding what the query means, the intermediate-result model is the right place to start.
If sales is our fact table and products and customers are dimensions, we often want to preserve every sales row even when a dimension lookup is missing.
SELECT
s.sales_id,
s.product_id,
s.customer_id,
s.quantity,
p.product_name,
c.customer_name
FROM sales s
LEFT JOIN products p
ON s.product_id = p.product_id
LEFT JOIN customers c
ON s.customer_id = c.customer_id;
Both joins are LEFT JOINs because sales is the grain we want to keep.
That is the real question behind multiple joins: which table defines the grain, and which rows are non-negotiable?
WHERE vs HAVING after a join
Once a join produces rows, filtering still follows normal SQL rules.
WHEREfilters rows before groupingHAVINGfilters grouped results after aggregation
SELECT c.name, COUNT(o.order_id) AS order_count
FROM customers c
LEFT JOIN orders o
ON c.id = o.customer_id
WHERE c.id > 0
GROUP BY c.name
HAVING COUNT(o.order_id) > 1;
If we put COUNT(o.order_id) > 1 in WHERE, the query fails because aggregates are not available yet.
Using a CTE before a join
Sometimes the best join is not a direct join from raw tables. Sometimes we first need to shape the data.
A CTE, short for Common Table Expression, is a temporary named result set defined at the start of a query with WITH.
It is not a permanent table. It only exists for the statement that follows it.
We use CTEs when we want to:
- break a query into readable steps
- name an intermediate result
- reuse the same logic inside a larger query
- separate aggregation logic from lookup logic
A common pattern is to aggregate first, then join the aggregated result.
WITH product_sales AS (
SELECT
product_id,
SUM(quantity) AS total_quantity
FROM sales
GROUP BY product_id
HAVING SUM(quantity) > 50
)
SELECT
p.product_name,
ps.total_quantity
FROM product_sales ps
JOIN products p
ON ps.product_id = p.product_id;
This is where a CTE helps.
Here, product_sales is the named intermediate result. First we compute total quantity per product. Then we join that smaller shaped result to products.
That makes the query easier to read, easier to reuse, and easier to test mentally. We separate the aggregation step from the lookup step instead of burying everything inside one large statement.
Takeaway:
A JOIN is a row-retention decision, not just a table-combination feature.
If we remember only one rule, let it be this:
ONdecides who matches- join type decides who survives
That one mental model explains most join behavior.
Keep this map in mind:
INNER JOINkeeps matched rows onlyLEFT JOINkeeps the left sideRIGHT JOINkeeps the right sideFULL OUTER JOINkeeps both sidesCROSS JOINmultiplies rowsSELF JOINcompares rows inside one table
And keep these two warnings next to it:
- a
WHEREclause after aLEFT JOINcan quietly remove rows we meant to keep - joins should preserve the grain we care about, especially in fact-to-dimension models
Challenge:
Take the sales, products, and customers tables and write three versions of the same query:
- an
INNER JOINversion - a
LEFT JOINversion that preserves all sales rows - a
CTEversion that first aggregates sales by product, then joins toproducts
Compare the row counts and explain which rows disappear in each version. That comparison is where join types stop feeling abstract and start feeling operational.