SQL NULL Handling: COALESCE, NULLIF, and CASE

NULL is not zero. NULL is not an empty string. NULL means the value is unknown or missing, and SQL treats it differently from every other value in the database.

Because NULL represents the absence of a known value, any comparison or arithmetic involving NULL produces another unknown result. This behavior surprises most developers the first time they encounter it, and it causes incorrect query results that are hard to trace because no error is thrown.

Let's use one mental model.

Imagine a sealed box with a question mark painted on it. You do not know what is inside. If someone asks "is the box heavier than five kilograms?", the correct answer is not yes or no. It is "I don't know." SQL applies that same reasoning to every operation involving NULL. Asking whether amount = NULL does not produce TRUE or FALSE. It produces an unknown result the database treats as neither.

amount = 100     100 = 100      TRUE
amount = NULL    NULL = 100     NULL  (unknown, not FALSE)
amount = NULL    NULL = NULL    NULL  (unknown, not TRUE)
amount IS NULL   special test   TRUE  (the correct way to test)

That asymmetry is why WHERE amount = NULL never matches any row, even rows where amount is NULL. The condition evaluates to NULL for every row, and NULL is not a confirmed TRUE.

NULL handling matters in data engineering for:

filtering records where optional fields were not populated
replacing missing values with defaults before aggregation or reporting
avoiding arithmetic errors when a column contains NULL unexpectedly
classifying rows cleanly without letting NULL fall into the wrong bucket
validating data completeness between source systems and targets

Example:

Let's build one customers table with intentional NULLs and walk through every technique step by step.

customers
+-------------+------------+-------------+--------+--------------+--------+
| customer_id | first_name | middle_name | city   | billing_city | amount |
+-------------+------------+-------------+--------+--------------+--------+
| 1           | Alice      | NULL        | Paris  | Lyon         | 150    |
| 2           | Bob        | James       | NULL   | London       | NULL   |
| 3           | Carol      | NULL        | Berlin | Munich       | 80     |
| 4           | David      | Lee         | NULL   | Madrid       | 500    |
| 5           | Eve        | NULL        | Tokyo  | Osaka        | 0      |
+-------------+------------+-------------+--------+--------------+--------+

Three customers have no middle_name. Two have no city. One has no amount. One has amount = 0, which is a real value, not missing data. That distinction between zero and NULL matters for several techniques below.

Testing for NULL: IS NULL and IS NOT NULL

The only correct way to test whether a value is NULL is with IS NULL or IS NOT NULL.

This query returns no rows, even though three rows have middle_name = NULL:

SELECT customer_id, middle_name
FROM customers
WHERE middle_name = NULL;

NULL = NULL evaluates to NULL, not TRUE. The WHERE clause never matches. The database treats unknown as "not a confirmed match" and excludes every row.

The correct query:

SELECT customer_id, middle_name
FROM customers
WHERE middle_name IS NULL;

Result:

+-------------+-------------+
| customer_id | middle_name |
+-------------+-------------+
| 1           | NULL        |
| 3           | NULL        |
| 5           | NULL        |
+-------------+-------------+

Flip the filter with IS NOT NULL to return only rows where the field is populated:

SELECT customer_id, middle_name
FROM customers
WHERE middle_name IS NOT NULL;

Result:

+-------------+-------------+
| customer_id | middle_name |
+-------------+-------------+
| 2           | James       |
| 4           | Lee         |
+-------------+-------------+

IS NULL and IS NOT NULL are the correct operators for any filter, validation, or data quality check that involves missing values. Once we can detect NULL, the next step is replacing it.

Replacing NULL with COALESCE

COALESCE(expr1, expr2, ...) returns the first non-NULL value in its argument list. If every argument is NULL, it returns NULL.

Use COALESCE when:

displaying data where NULL should show as a readable placeholder like 'N/A' or 'Unknown'
feeding a column into arithmetic that would otherwise produce NULL
providing a safe default value for optional fields before aggregation

This query replaces NULL middle_name values with the string 'N/A':

SELECT
    customer_id,
    first_name,
    COALESCE(middle_name, 'N/A') AS middle_name,
    amount
FROM customers;

Result:

+-------------+------------+-------------+--------+
| customer_id | first_name | middle_name | amount |
+-------------+------------+-------------+--------+
| 1           | Alice      | N/A         | 150    |
| 2           | Bob        | James       | NULL   |
| 3           | Carol      | N/A         | 80     |
| 4           | David      | Lee         | 500    |
| 5           | Eve        | N/A         | 0      |
+-------------+------------+-------------+--------+

Customers 1, 3, and 5 had NULL replaced with 'N/A'. Notice that amount for customer 2 is still NULL in this result. COALESCE was only applied to middle_name — each column must be handled independently. The NULL in amount is untouched until we address it in a later expression.

COALESCE evaluates its arguments from left to right and stops at the first non-NULL value. That makes it useful for cascading fallbacks across multiple columns.

COALESCE with multiple fallbacks

This query resolves the best available city for each customer, checking city first and falling back to billing_city:

SELECT
    customer_id,
    COALESCE(city, billing_city, 'Unknown') AS final_city
FROM customers;

Result:

+-------------+------------+
| customer_id | final_city |
+-------------+------------+
| 1           | Paris      |
| 2           | London     |
| 3           | Berlin     |
| 4           | Madrid     |
| 5           | Tokyo      |
+-------------+------------+

Customers 2 and 4 had no city, so COALESCE moved to the second argument, billing_city, and returned London and Madrid. All other customers had a primary city, so COALESCE returned it immediately. The 'Unknown' third argument never fired here, but it provides a safe floor if both columns are NULL.

This cascading pattern appears constantly in data engineering when resolving the best available value for a field that is populated inconsistently across source systems.

COALESCE replaces NULL with a known value. Sometimes we need the inverse: converting a known value into NULL to signal that it should be treated as missing.

NULLIF: converting a value to NULL

NULLIF(a, b) returns NULL if a equals b, otherwise it returns a. It converts a known placeholder value back into NULL so downstream logic treats it correctly.

Use NULLIF when:

a value like 0, -1, or 'N/A' was loaded as a placeholder where NULL should have been used
a denominator in division might be zero and NULL is preferable to a division-by-zero error
normalizing data that uses sentinel values instead of proper NULL representation

The division-by-zero protection pattern:

-- Instead of: total_sales / total_orders  (crashes when total_orders = 0)
-- Write:       total_sales / NULLIF(total_orders, 0)
-- If total_orders is 0, the denominator becomes NULL and the result is NULL instead of an error

Applied to our dataset to convert zero amounts into NULL:

SELECT
    customer_id,
    NULLIF(amount, 0) AS normalized_amount
FROM customers;

Result:

+-------------+-------------------+
| customer_id | normalized_amount |
+-------------+-------------------+
| 1           | 150               |
| 2           | NULL              |
| 3           | 80                |
| 4           | 500               |
| 5           | NULL              |
+-------------+-------------------+

Customer 2 was already NULL and stays NULL. Customer 5 had amount = 0, which NULLIF converted to NULL. The distinction between "no data" (NULL) and "zero" (0) is meaningful in most business contexts. NULLIF lets us collapse ambiguous zeros into the same bucket as genuinely missing values when the use case requires it.

With individual value handling covered, the next tool is classification: deciding which label or bucket a row belongs to, including rows where the column might be NULL.

Conditional logic with CASE

CASE evaluates a list of conditions and returns a value for the first one that matches. If no condition matches and there is no ELSE, the result is NULL.

Use CASE when:

assigning a category or label to rows based on column values
computing a derived field with different logic for different input ranges
bucketing a continuous value into named groups

This query categorizes customers by amount:

SELECT
    customer_id,
    amount,
    CASE
        WHEN amount < 50  THEN 'Small'
        WHEN amount < 200 THEN 'Medium'
        ELSE 'Large'
    END AS size_category
FROM customers;

Result:

+-------------+--------+---------------+
| customer_id | amount | size_category |
+-------------+--------+---------------+
| 1           | 150    | Medium        |
| 2           | NULL   | Large         |
| 3           | 80     | Medium        |
| 4           | 500    | Large         |
| 5           | 0      | Small         |
+-------------+--------+---------------+

Customer 2 has amount = NULL. The conditions NULL < 50 and NULL < 200 both evaluate to NULL, which is not TRUE, so neither WHEN clause fires. The ELSE catches it and returns 'Large'. That is wrong: a customer with no amount data is not a large spender. NULL silently fell into the wrong bucket with no warning.

The fix requires one explicit check.

NULL inside CASE

Adding WHEN amount IS NULL as the first condition prevents NULL from reaching the numeric comparisons:

SELECT
    customer_id,
    amount,
    CASE
        WHEN amount IS NULL THEN 'No Data'
        WHEN amount < 50   THEN 'Small'
        WHEN amount < 200  THEN 'Medium'
        ELSE 'Large'
    END AS size_category
FROM customers;

Result:

+-------------+--------+---------------+
| customer_id | amount | size_category |
+-------------+--------+---------------+
| 1           | 150    | Medium        |
| 2           | NULL   | No Data       |
| 3           | 80     | Medium        |
| 4           | 500    | Large         |
| 5           | 0      | Small         |
+-------------+--------+---------------+

Customer 2 correctly shows 'No Data'. The rule: whenever a column used inside CASE might be NULL, add an explicit WHEN column IS NULL as the first WHEN. Do not rely on ELSE to handle it cleanly, because ELSE is designed for values that do not match, not for absent values.

CASE handles conditional logic. COALESCE handles fallbacks. Used together they cover a pattern that appears in nearly every reporting pipeline.

Combining COALESCE and CASE

A CASE expression can return NULL intentionally when a row does not match any meaningful condition. COALESCE can wrap that CASE to provide a final default:

SELECT
    customer_id,
    COALESCE(
        CASE
            WHEN amount IS NULL THEN NULL
            WHEN amount = 0     THEN 'No Purchases'
            WHEN amount < 100   THEN 'Low Spender'
            WHEN amount < 500   THEN 'Medium Spender'
            ELSE                     'High Spender'
        END,
        'No Data'
    ) AS customer_tier
FROM customers;

Result:

+-------------+----------------+
| customer_id | customer_tier  |
+-------------+----------------+
| 1           | Medium Spender |
| 2           | No Data        |
| 3           | Low Spender    |
| 4           | High Spender   |
| 5           | No Purchases   |
+-------------+----------------+

The WHEN amount IS NULL THEN NULL line is optional here — if removed, NULL would fall through to the ELSE and produce NULL anyway, which COALESCE would still catch. Writing it explicitly documents the intent and prevents accidental matches if the CASE is edited later.

The CASE expression explicitly returns NULL for rows where amount IS NULL. COALESCE catches that NULL and replaces it with 'No Data'. Customer 5 has amount = 0 and receives 'No Purchases' instead of landing in any arithmetic bucket.

This pattern is common in pipelines where a coded column must be translated to a readable label and any unknown or missing code must produce a safe output rather than a NULL or a misleading default.

COALESCE with aggregations

Most aggregate functions, including SUM, AVG, MIN, and MAX, ignore NULL values. When all rows in a group have NULL, the aggregate returns NULL rather than zero. COALESCE handles this at the output level:

SELECT
    COALESCE(SUM(amount), 0) AS total_amount,
    COALESCE(AVG(amount), 0) AS avg_amount
FROM customers
WHERE customer_id = 2;

Result:

+--------------+------------+
| total_amount | avg_amount |
+--------------+------------+
| 0            | 0          |
+--------------+------------+

Customer 2 has amount = NULL. SUM and AVG over that single NULL row both return NULL. COALESCE replaces them with 0, which is the correct representation for downstream consumers that expect a numeric output.

A related pattern combines COALESCE with window functions. If a time-series table has missing values in a column, we can fill each gap with the most recent known value using LAG:

SELECT
    customer_id,
    amount,
    COALESCE(
        amount,
        LAG(amount) OVER (ORDER BY customer_id)
    ) AS amount_filled
FROM customers;

Result:

+-------------+--------+---------------+
| customer_id | amount | amount_filled |
+-------------+--------+---------------+
| 1           | 150    | 150           |
| 2           | NULL   | 150           |
| 3           | 80     | 80            |
| 4           | 500    | 500           |
| 5           | 0      | 0             |
+-------------+--------+---------------+

Customer 2 has amount = NULL. LAG(amount) looks at the previous row, which is customer 1 with amount = 150. COALESCE picks the first non-NULL value: 150 from the LAG result. Customer 5 has amount = 0, which is a real value. COALESCE returns 0 directly without checking the LAG result, because 0 is non-NULL.

This forward-fill pattern is common in event logs and time-series pipelines where a missing value should inherit the last known measurement rather than being left as a gap.

With the full toolkit covered, the key is knowing which tool fits each situation.

Takeaway:

NULL is an unknown value, not a zero or an empty string. SQL treats NULL differently from all other values: comparisons with NULL return NULL, arithmetic with NULL returns NULL, and only IS NULL and IS NOT NULL can reliably detect it.

The toolkit for handling NULL:

IS NULL / IS NOT NULL: the only correct operators for testing whether a value is NULL
COALESCE(expr1, expr2, ...): returns the first non-NULL value in the list; use for defaults and fallbacks
NULLIF(a, b): returns NULL if a = b; use to convert sentinel values or prevent division by zero
CASE: classify or label rows conditionally; always add an explicit WHEN col IS NULL before numeric comparisons when the column might be NULL

Two rules to keep close:

never use = NULL in a WHERE clause; use IS NULL instead
in a CASE expression, check for NULL first before any numeric or string condition, or ELSE will silently absorb it

Challenge:

Take a transactions table with columns transaction_id, customer_id, amount, discount, and processed_at. Some amount values are NULL. Some discount values are 0. Some processed_at values are NULL.

Write a query that displays amount with NULL replaced by 0 using COALESCE, and discount with zero values converted to NULL using NULLIF.
Write a CASE expression that labels each transaction: 'Pending' if processed_at IS NULL, 'Processed' otherwise.
Write a query that computes net_amount = COALESCE(amount, 0) - COALESCE(discount, 0) and classifies it using CASE: 'Free' if net_amount = 0, 'Low' if under 50, 'High' otherwise.
Combine all three into a single SELECT that returns transaction_id, amount_clean, discount_clean, status, and tier.

For each expression, write one sentence explaining why you chose COALESCE, NULLIF, or CASE for that specific problem rather than one of the other two tools.