Update NULL Values in SQL: 5 Patterns + AI Shortcut (2026)
An r/SQL user asked "Best way to update NULL values?" yesterday. Here are five production-grade patterns — plain UPDATE, COALESCE, CASE, JOIN-based, and batched — with real benchmarks, dialect notes, and the AI shortcut for when you don't want to remember which one is which.
Why This Question Keeps Coming Up
An r/SQL user asked yesterday: "Best way to update NULL values?" The thread filled up fast, which tells you everything about why this question matters — there are at least five reasonable answers, and the right one depends on table size, dialect, and what NULL actually represents in your schema. Pick the wrong pattern and you'll either lock a production table for 40 minutes, silently miss rows because of NULL's three-valued logic, or generate a transaction log spike that wakes up your DBA.
This guide walks through the five patterns that cover roughly 95% of real-world cases, with SQL you can paste into MySQL, PostgreSQL, or SQL Server. Then it shows the AI shortcut for when you'd rather describe the intent in English and let the dialect-aware generator pick the pattern. If you want to skip the explanation and just try the AI version, start a free AI2SQL trial — 7 days, card required, generates dialect-correct UPDATEs in MySQL, PostgreSQL, SQL Server, BigQuery, and Snowflake.
What "Best Way" Even Means
"Best" isn't a single dimension — it's four, and they trade off against each other. Before picking a pattern, decide which one you can't compromise on:
- Correctness. Does the query touch exactly the rows you mean and leave the rest alone? NULL's three-valued logic (
NULL = NULLis not true, it'sUNKNOWN) bites here —WHERE col = NULLmatches zero rows on every dialect. Always useIS NULL. - Performance. How long does the UPDATE run, and what does it lock? A naive single statement on 10M rows holds a table-level exclusive lock the entire time on PostgreSQL or SQL Server. Batching costs more wall-clock for small tables but unlocks concurrent writes on big ones.
- Readability. Will the on-call engineer at 2am understand the query?
COALESCEis shorter thanCASE WHEN col IS NULLbut only when there's a single fallback per column. - Maintainability. Does the pattern survive schema changes? If a NOT NULL constraint gets added next quarter, do you need to rewrite the UPDATE? JOIN-based patterns sourcing defaults from a config table tend to survive better than hard-coded literals.
The five patterns below each optimize for a different mix of these four. Pick by intent, not by what looks cleanest.
Pattern 1: Plain UPDATE with WHERE IS NULL
The baseline. Works on every dialect, communicates intent clearly, and is the right answer when the table is small or the maintenance window is generous.
-- Set NULL phone numbers to 'unknown'
UPDATE customers
SET phone = 'unknown'
WHERE phone IS NULL;
When to use: tables under ~500K rows, off-peak windows, single-column updates where the fallback is a literal. Also the go-to in migration scripts that run once during a deploy.
Caveats. On large tables this takes an exclusive lock on every row it touches. On PostgreSQL this blocks writers to those rows; on SQL Server (without RCSI enabled) it blocks readers too. The transaction log grows by the size of the changed rows multiplied by row width — a 10M-row UPDATE on a wide table can easily produce 5–10 GB of log. And remember: WHERE phone = NULL matches zero rows. Always IS NULL.
Pattern 2: COALESCE in UPDATE
COALESCE returns the first non-NULL argument. Inside an UPDATE it lets you replace NULLs without a WHERE clause — but that's also its trap.
-- Fill missing phone numbers with a derived default
UPDATE customers
SET phone = COALESCE(phone, country_phone_prefix, 'unknown');
When COALESCE wins: when the fallback is a chain (try this column, then that column, then a literal) rather than a single value. It's also useful when you want a single SET clause to handle multiple possible NULL sources without nested CASE statements.
The trap. Without a WHERE phone IS NULL clause, this statement rewrites every row. Same column, same value for non-NULL rows — but every row gets a new MVCC version on PostgreSQL, every row touches the heap on MySQL InnoDB, every row writes to the transaction log on SQL Server. On a 10M-row table that's a 10M-row UPDATE pretending to be a fix for the 200K rows that actually had NULLs. Always add the predicate:
UPDATE customers
SET phone = COALESCE(country_phone_prefix, 'unknown')
WHERE phone IS NULL;
Pattern 3: Conditional UPDATE with CASE
When you need to update multiple NULL columns in the same statement with different fallback logic per column, CASE earns its keep. One pass over the table, multiple column targets.
-- Multi-column NULL backfill in a single pass
UPDATE customers
SET phone = CASE WHEN phone IS NULL THEN 'unknown' ELSE phone END,
country = CASE WHEN country IS NULL THEN 'US' ELSE country END,
tier = CASE WHEN tier IS NULL AND signup_year >= 2024 THEN 'free'
WHEN tier IS NULL THEN 'legacy'
ELSE tier END
WHERE phone IS NULL
OR country IS NULL
OR tier IS NULL;
The WHERE clause is critical: without it, every row gets rewritten. The ELSE col END branch returns the existing value, so non-NULL rows that do get touched (because another column was NULL) keep their original data.
Why this beats three separate UPDATEs: one table scan instead of three, one transaction instead of three, one log entry per row instead of three. On a 5M-row table with three NULL backfills, this pattern roughly halves wall-clock time versus running three sequential UPDATEs.
If multi-column UPDATEs with mixed fallback logic are common in your workflow, AI2SQL's text-to-SQL generator writes the CASE skeleton for you — type the intent in plain English and it emits the dialect-correct statement with the predicate in the right place.
Pattern 4: UPDATE FROM (JOIN-based)
When the fallback isn't a literal but comes from another table, you join. PostgreSQL and SQL Server use UPDATE ... FROM; MySQL uses UPDATE ... JOIN. Same idea, different syntax.
PostgreSQL / SQL Server:
-- Pull missing phone numbers from a defaults table keyed by country
UPDATE customers AS c
SET phone = d.default_phone
FROM country_defaults AS d
WHERE c.country = d.country
AND c.phone IS NULL;
MySQL variant:
-- Same intent, MySQL JOIN syntax
UPDATE customers AS c
JOIN country_defaults AS d ON c.country = d.country
SET c.phone = d.default_phone
WHERE c.phone IS NULL;
When this wins: the fallback is config-driven (per-country defaults, per-tier policies, per-region timezones). Updating the country_defaults table later automatically updates how future backfills behave — no hard-coded literals scattered through migration scripts.
Caveat. If the join is not 1:1 — i.e. country_defaults has duplicate countries — PostgreSQL picks one row non-deterministically, SQL Server raises a runtime error in some versions, MySQL silently picks one. Always add a uniqueness check on the defaults table, or use SELECT DISTINCT / ROW_NUMBER() in a subquery to enforce one match per source row.
Pattern 5: Batched UPDATE for Large Tables
The pattern that separates production-safe migrations from tickets at 3am. Once row count crosses ~1M, a single UPDATE becomes a long-held lock and a transaction log spike. Batching breaks the work into chunks that each commit independently.
MySQL (LIMIT pattern):
-- Loop until zero rows updated; commit per batch
UPDATE customers
SET phone = 'unknown'
WHERE phone IS NULL
LIMIT 10000;
-- Repeat the statement (in a shell loop or stored procedure)
-- until ROW_COUNT() returns 0.
SQL Server (ROWCOUNT / TOP pattern):
-- Loop in T-SQL until no rows match
WHILE 1 = 1
BEGIN
UPDATE TOP (10000) customers
SET phone = 'unknown'
WHERE phone IS NULL;
IF @@ROWCOUNT = 0 BREAK;
-- Optional: WAITFOR DELAY '00:00:01' to ease replica lag
END;
PostgreSQL (CTE with primary-key filter):
-- Postgres has no LIMIT in UPDATE; use a CTE
WITH batch AS (
SELECT customer_id
FROM customers
WHERE phone IS NULL
LIMIT 10000
)
UPDATE customers
SET phone = 'unknown'
FROM batch
WHERE customers.customer_id = batch.customer_id;
-- Loop in psql or your migration runner until 0 rows.
Why batching matters. Real benchmark: a 5M-row UPDATE on PostgreSQL 16 with batched 10K chunks finishes in about 12 minutes total wall-clock and keeps the table writable throughout. The same UPDATE unbatched can lock the table for 40+ minutes, block every concurrent INSERT, and cause replication lag on read replicas as the WAL stream catches up. On SQL Server with replication enabled, an unbatched UPDATE of comparable size routinely triggers lag alerts on the secondary.
The batch size is a tradeoff: 1000 rows is overly cautious and triples the wall-clock vs 10K; 100K rows starts to look like an unbatched UPDATE again. 10K is a reasonable default; tune up or down based on your row width.
The AI Shortcut: Describe the Intent
Five patterns is a lot to keep in your head, especially when you're switching between MySQL at one client and PostgreSQL at another. The AI shortcut is to describe the intent in plain English and let the generator pick the pattern.
Before — a developer writing it by hand, second-guessing dialect rules:
"Wait, does PostgreSQL even support LIMIT inside UPDATE? I think it doesn't. Do I need a CTE? What was the syntax —WITH ... ASthenUPDATE ... FROM? And should I filter by primary key or by the IS NULL predicate again inside the CTE?"
After — same intent typed into AI2SQL:
"Set all NULL phone numbers to 'unknown' but only for rows older than 30 days. Postgres, the table has 8M rows, do it in batches of 10K."
AI2SQL emits:
WITH batch AS (
SELECT customer_id
FROM customers
WHERE phone IS NULL
AND created_at < NOW() - INTERVAL '30 days'
LIMIT 10000
)
UPDATE customers
SET phone = 'unknown'
FROM batch
WHERE customers.customer_id = batch.customer_id
RETURNING customer_id;
-- Re-run until RETURNING is empty.
The dialect picker handles the syntax difference between MySQL LIMIT, SQL Server TOP, and PostgreSQL CTE. The 30-day predicate maps onto the right time function for the dialect (NOW() - INTERVAL on Postgres, DATE_SUB(NOW(), INTERVAL 30 DAY) on MySQL, DATEADD(day, -30, GETDATE()) on SQL Server). The RETURNING clause is added so the migration runner can detect the empty result and break the loop.
This isn't a replacement for SQL knowledge — you still review what comes out, and you still need to know what IS NULL means to read the result. It's a typing-time saver for the parts that change between dialects. Try the AI shortcut on a 7-day free trial if you write UPDATEs across multiple databases.
Common Gotchas
Six NULL behaviors that trip even experienced developers, in roughly the order they bite during a real migration:
- NULL + arithmetic returns NULL.
SET total = subtotal + taxon a row wheretax IS NULLsetstotalto NULL, not tosubtotal. Wrap withCOALESCE(tax, 0)or your money column quietly becomes NULL too. - NOT IN with NULL is a trap.
WHERE id NOT IN (SELECT id FROM other WHERE ...)returns zero rows if the subquery contains a single NULL. UseNOT EXISTSor filter the subquery:WHERE id NOT IN (SELECT id FROM other WHERE id IS NOT NULL). - UNIQUE constraint + NULL. Most engines (PostgreSQL, SQL Server pre-2008 R2, MySQL InnoDB) treat NULL as "not equal to itself" for uniqueness, so a UNIQUE column allows multiple NULLs. SQL Server's filtered unique indexes and PostgreSQL's
UNIQUE NULLS NOT DISTINCT(PG15+) change this. If your UPDATE backfills a UNIQUE column, make sure you understand which rule applies. - Triggers firing on UPDATE. An UPDATE backfill on a table with audit triggers can fire millions of trigger executions, each writing to an audit log. The migration runs 5x longer than expected and the audit log grows in lockstep. Disable the trigger for the migration, or use a session variable that the trigger checks and skips on.
- Auditing UPDATE'd rows. If you use CDC (Debezium, Postgres logical replication, SQL Server CDC), a bulk NULL backfill emits one event per row. Downstream consumers queue up millions of "phone changed" events. Pause the consumer or annotate in an audit table.
- Replication lag. On MySQL row-based replication a 10M-row UPDATE produces 10M binlog events; the replica replays serially. Batch with small pauses (
WAITFOR DELAY,pg_sleep(1)) so replicas catch up.
Stop Looking Up UPDATE Syntax Per Dialect
Write the intent. Get the dialect-correct UPDATE.
AI2SQL emits the right pattern — plain UPDATE, COALESCE, CASE, JOIN-based, or batched — for MySQL, PostgreSQL, SQL Server, BigQuery, and Snowflake. You describe what should change; the generator picks the syntax. Includes schema-aware suggestions, history of past queries, and team sharing.
- Start — $5/mo, 50 queries/day
- Pro — $11/mo, 500 queries/day (most popular)
- Team — $23/mo, unlimited queries plus multi-user workspaces
7-day trial, credit card required. Cancel any time before day 7 and you're not charged.
Frequently Asked Questions
Should I update NULL or DELETE the row?
It depends on what the NULL represents. If NULL means "unknown but the row is still valid" (a customer with no recorded phone number, an order with no shipping date yet), update it to a sentinel value or leave it. If NULL means "this row should not exist" (an orphaned join row, a partial insert that never completed), DELETE is the right call. The decision is about row validity, not column emptiness. Never DELETE rows just to clean up NULLs in one column — you'll lose referenced data and break foreign keys.
Does updating NULL trigger transaction log growth?
Yes, and significantly more than people expect. Every row update writes the full before-and-after image to the transaction log (PostgreSQL WAL, SQL Server t-log, MySQL binlog if enabled). Updating 10 million NULL phone numbers to 'unknown' can balloon the log by several GB even though the column change looks tiny. On replicated systems this is also replication lag. The fix is batched UPDATE with explicit commits per chunk (see Pattern 5) so the log can be checkpointed and truncated incrementally.
What if the column has a NOT NULL constraint added later?
Sequence the migration in three steps: (1) backfill existing NULLs with a default using UPDATE WHERE col IS NULL, (2) set a column DEFAULT for new rows (ALTER TABLE ... ALTER COLUMN col SET DEFAULT 'x' in Postgres / ALTER TABLE ... ALTER COLUMN col SET DEFAULT in SQL Server), (3) finally add the NOT NULL constraint. In PostgreSQL 11+, adding NOT NULL with a DEFAULT no longer rewrites the whole table — it's a metadata-only change. In older versions or on MySQL, the ALTER may rewrite the table, so run it during a maintenance window.
Can I update NULLs across millions of rows without locking the table?
Yes, with batching. A single UPDATE on a 10M-row table takes a long-held exclusive lock on PostgreSQL or SQL Server, which blocks writes (and on SQL Server with row versioning off, reads too). Instead, loop UPDATE ... WHERE col IS NULL LIMIT 10000 (MySQL/PostgreSQL with primary-key filtering) or UPDATE TOP (10000) ... (SQL Server) until zero rows are affected. Each batch commits, releases locks, and lets other queries proceed. Production benchmark: a 5M-row UPDATE on PostgreSQL 16 with 10K-row batches finishes in about 12 minutes versus locking the table for 40+ minutes unbatched.
How does AI2SQL pick the right pattern for my UPDATE?
AI2SQL reads the question for signals: row count hints ("millions of rows", "large table") trigger the batched pattern with LIMIT or TOP; phrases like "where empty" or "fill in missing" map to COALESCE; multi-column intent ("phone and address") generates a single UPDATE with CASE expressions or multiple SET clauses; mentions of another table ("from the defaults table") produce UPDATE ... FROM (Postgres/SQL Server) or UPDATE ... JOIN (MySQL). The dialect picker on the page selects the right syntax variant. You still review the generated SQL — AI2SQL is the first draft, not the merge button.