SQL Logical Execution Order: The Hidden Processing Path

SQL does not execute clauses in the order we type them.

Instead, the engine builds the result in stages. Once we see that sequence, several confusing rules stop feeling arbitrary:

why aliases are not available in WHERE
why aggregates cannot appear in WHERE
why HAVING exists
why WHERE and HAVING change results differently
why SELECT is evaluated late

Understanding this is one of the fastest ways to debug filtering and grouping mistakes.

Let's use one mental model: SQL is an assembly line.

Each clause receives input from the previous stage. A clause cannot use data that has not been produced yet.

Here is the logical order:

FROM - load base table(s)
JOIN - match related rows inside the FROM phase
WHERE - filter rows before grouping
GROUP BY - form groups
HAVING - filter groups after aggregation
SELECT - shape the output
DISTINCT - remove duplicates
ORDER BY - sort rows
LIMIT - keep the top N

Three consequences matter immediately:

WHERE cannot use SELECT aliases because aliases are created at stage 6
WHERE cannot use aggregates because grouping happens at stage 4
HAVING can use aggregates because it runs after grouping

Example:

Let's take one complete query and walk through it in logical order.

SELECT
    customers.name AS customer_name,
    COUNT(*) AS order_count
FROM customers
JOIN orders ON customers.id = orders.customer_id
WHERE orders.status = 'shipped'
GROUP BY customers.name
HAVING COUNT(*) > 5
ORDER BY order_count DESC
LIMIT 10;

This query answers a simple question: which customers have more than 5 shipped orders?

We will read it in execution order, not in writing order.

Stage 1 and 2: FROM and JOIN

For teaching, we split this into two steps. In actual logical processing, the join is resolved inside the FROM phase.

The engine starts here:

FROM customers loads the base table
JOIN orders ON ... combines matching rows

By the end of the FROM phase, including join matching, SQL has a working table. Nothing has been filtered or grouped yet.

Stage 3: WHERE

WHERE filters individual rows before any grouping happens.

In our example, WHERE orders.status = 'shipped' keeps only shipped orders.

That is why it cannot reference:

aliases from SELECT
aggregates such as COUNT, SUM, or AVG

-- Fails: alias does not exist yet
SELECT customer_id, COUNT(*) as order_count
FROM orders
WHERE order_count > 5
GROUP BY customer_id;

-- Fails: aggregate not available yet
SELECT customer_id, COUNT(*) as total
FROM orders
WHERE COUNT(*) > 5
GROUP BY customer_id;

Stage 4: GROUP BY

GROUP BY collapses rows into groups. After that, each group becomes one output row.

In our example, GROUP BY customers.name creates one group per customer.

From this point on, the SELECT list can contain only:

grouped columns
aggregate expressions

-- Valid
SELECT customers.name, COUNT(*) AS order_count
FROM customers
JOIN orders ON customers.id = orders.customer_id
GROUP BY customers.name;

-- Invalid: order_date is neither grouped nor aggregated
SELECT customers.name, orders.order_date, COUNT(*)
FROM customers
JOIN orders ON customers.id = orders.customer_id
GROUP BY customers.name;

Stage 5: HAVING

HAVING filters groups after aggregation. That is why aggregate conditions belong here.

In our example, HAVING COUNT(*) > 5 keeps only customers whose shipped-order count is greater than 5.

-- Correct
SELECT customers.name, COUNT(*) AS order_count
FROM customers
JOIN orders ON customers.id = orders.customer_id
GROUP BY customers.name
HAVING COUNT(*) > 5;

-- Wrong
SELECT customers.name, COUNT(*) AS order_count
FROM customers
JOIN orders ON customers.id = orders.customer_id
WHERE COUNT(*) > 5
GROUP BY customers.name;

Keep the split simple:

WHERE filters rows
HAVING filters groups

Stage 6: SELECT

This is where SQL shapes the final output and creates aliases such as order_count.

So if WHERE runs first and SELECT runs later, WHERE cannot see order_count.

ORDER BY can see that alias because it runs later. In practice, all major SQL engines support this.

SELECT customers.name AS customer_name, COUNT(*) AS order_count
FROM customers
JOIN orders ON customers.id = orders.customer_id
GROUP BY customers.name
ORDER BY order_count DESC;

Stages 7 to 9: DISTINCT, ORDER BY, LIMIT

These final stages are straightforward.

In our example:

DISTINCT removes duplicates from the selected result
ORDER BY sorts the result
LIMIT keeps only the requested number of rows

We do not use DISTINCT here because GROUP BY customers.name already produces one row per customer.

SELECT customers.name AS customer_name, COUNT(*) AS order_count
FROM customers
JOIN orders ON customers.id = orders.customer_id
GROUP BY customers.name
ORDER BY order_count DESC
LIMIT 10;

Let's pressure-test this mental model with three common mistakes.

Mistake 1: Using a SELECT Alias in WHERE

-- Fails
SELECT
    product_name,
    price * quantity as revenue
FROM sales
WHERE revenue > 1000;

-- Correct
SELECT
    product_name,
    price * quantity as revenue
FROM sales
WHERE price * quantity > 1000;

WHERE runs before SELECT, so the alias does not exist yet.

Mistake 2: Using an Aggregate in WHERE

-- Fails
SELECT customer_id, COUNT(*) as order_count
FROM orders
WHERE COUNT(*) > 5
GROUP BY customer_id;

-- Correct
SELECT customer_id, COUNT(*) as order_count
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 5;

COUNT(*) becomes available only after grouping, so the filter belongs in HAVING.

Mistake 3: Breaking a LEFT JOIN with WHERE

-- Wrong: this behaves like an INNER JOIN
SELECT
    customers.name,
    orders.total
FROM customers
LEFT JOIN orders ON customers.id = orders.customer_id
WHERE orders.status = 'shipped';

-- Correct: keep the filter in the join condition
SELECT
    customers.name,
    orders.total
FROM customers
LEFT JOIN orders
    ON customers.id = orders.customer_id
    AND orders.status = 'shipped';

LEFT JOIN preserves unmatched rows first. Then WHERE runs and removes rows where orders.status is NULL. That is how a LEFT JOIN quietly turns into an INNER JOIN.

If we want to preserve unmatched rows, the filter belongs in ON.

Now let's turn the idea into practice.

We have three tables:

customers (id, name, region)
orders (id, customer_id, order_date, status)
order_items (order_id, product_id, quantity, price)

Write a query that returns:

customers from the North region
more than 3 completed orders
total revenue above $5000
customer name, order count, and total revenue
the top 10 rows by total revenue

Before writing the query, answer these four questions:

Which filters belong in WHERE?
Which filter belongs in HAVING?
Can total_revenue be reused as an alias in WHERE or HAVING?
Why must customer_name appear in GROUP BY if it appears in SELECT?

Bonus challenge: instead of the ranked filter query above, write a second query that returns all North-region customers, including those with zero completed orders, with order_count and total_revenue shown as 0 when no matching order exists. That change depends on join type, null handling, and filter placement.

Takeaway:

Keep this pipeline in mind every time a query behaves strangely:

FROM -> load base table(s)
JOIN -> match related rows inside the FROM phase
WHERE -> filter rows
GROUP BY -> form groups
HAVING -> filter groups
SELECT -> shape the output
DISTINCT -> remove duplicates
ORDER BY -> sort rows
LIMIT -> keep the top N

That is the core idea. SQL is declarative, but its logic is not random. Once we understand the execution order, clause placement becomes much easier to reason about. Next time a query result, metric, or dashboard looks wrong, check the order of the query before blaming the data.