Skip to content
yutils

How Database Transactions Actually Work

ACID, isolation levels, dirty/phantom reads, write skew, MVCC vs locking, SERIALIZABLE, deadlocks, and the retry-on-serialization-failure pattern explained.

~9 min read

A bank transfer — debit account A, credit account B. If the debit succeeds but the server dies before the credit, money vanishes. A transaction binds the two into "all succeed or all undo." Now run hundreds of these at once against the same rows and a new question appears: who gets to see whose half-finished changes? That's the isolation level. This guide covers what ACID really means, the four read anomalies, the four isolation levels, MVCC vs locking, the cost of SERIALIZABLE, and deadlocks.

What a transaction is — BEGIN / COMMIT / ROLLBACK

BEGIN;                              -- start a transaction
  UPDATE accounts SET balance = balance - 100 WHERE id = 1;
  UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;                             -- both made permanent together

-- If something goes wrong:
BEGIN;
  UPDATE accounts SET balance = balance - 100 WHERE id = 1;
  -- detected: insufficient funds
ROLLBACK;                           -- undo as if nothing happened

A transaction groups several SQL statements into one logical unit. Changes are (mostly) invisible to other transactions until COMMIT, and ROLLBACK throws them all away.

SAVEPOINT — partial rollback

BEGIN;
  INSERT INTO orders (id, user_id) VALUES (1, 42);
  SAVEPOINT after_order;
  INSERT INTO order_items (order_id, sku) VALUES (1, 'BAD');
  -- only this item failed → keep the order, drop the item
  ROLLBACK TO SAVEPOINT after_order;
  INSERT INTO order_items (order_id, sku) VALUES (1, 'GOOD');
COMMIT;

A SAVEPOINT is a marker inside a transaction. You can roll back to it and keep the rest. An ORM's "nested transaction" is usually a SAVEPOINT under the hood.

ACID — what the four letters really mean

  • Atomicity — every operation in the transaction either all commits or all aborts. No partial state survives. The point of the transfer example above.
  • Consistency — a transaction moves the database from one valid state to another valid state. But this is NOT a promise that your business rules are correct. The DB only enforces declared constraints (FK, UNIQUE, CHECK, NOT NULL). A rule like "debit and credit amounts must match" is your code's job. The C in ACID is the most misunderstood letter.
  • Isolation — how much of each other's unfinished work concurrent transactions can see. You tune it with the isolation level. Half this guide is about this.
  • Durability — once COMMIT returns, the change survives a power loss. Most engines guarantee this by fsyncing a WAL (write-ahead log) to disk before returning from COMMIT.

In short — A and D are about "single-transaction reliability," C is about "constraints + your code," and I is about "how concurrency is handled." Most real-world bugs come from misunderstanding I.

Read anomalies — what weak isolation lets through

Abnormal results when two transactions T1 / T2 interleave. Each is shown as a 2-transaction interleaving.

1. Dirty read — reading uncommitted data

T1: BEGIN
T1: UPDATE accounts SET balance = 0 WHERE id = 1;  -- not committed yet
T2: BEGIN
T2: SELECT balance FROM accounts WHERE id = 1;      -- reads 0 (!)
T1: ROLLBACK                                         -- it was never 0
T2: -- decided based on a value that never existed

T2 read T1's unfinished, soon-to-be-rolled-back change. Never happens at Read Committed or above.

2. Non-repeatable read — same row, two different values

T1: BEGIN
T1: SELECT balance FROM accounts WHERE id = 1;  -- 100
T2: BEGIN; UPDATE accounts SET balance = 50 WHERE id = 1; COMMIT;
T1: SELECT balance FROM accounts WHERE id = 1;  -- 50 (same query, new value!)

T1 read the same row twice within one transaction and the value changed. Prevented at Repeatable Read and above.

3. Phantom read — the set of rows changes

T1: BEGIN
T1: SELECT count(*) FROM orders WHERE user_id = 42;  -- 3 rows
T2: BEGIN; INSERT INTO orders (user_id) VALUES (42); COMMIT;
T1: SELECT count(*) FROM orders WHERE user_id = 42;  -- 4 rows (a phantom appeared!)

Existing row values are unchanged, but a new row matching the predicate slipped in. Non-repeatable read is "a row's value changed"; a phantom is "the set of matching rows changed." Prevented at Serializable (and, in practice, by PostgreSQL's Repeatable Read too).

4. Write skew — each is right, together they're wrong

-- Rule: at least 1 doctor must stay on call (currently 2: Alice, Bob)
T1 (Alice): SELECT count(*) FROM doctors WHERE on_call = true;  -- 2, OK
T2 (Bob):   SELECT count(*) FROM doctors WHERE on_call = true;  -- 2, OK
T1: UPDATE doctors SET on_call = false WHERE name = 'Alice';
T2: UPDATE doctors SET on_call = false WHERE name = 'Bob';
T1: COMMIT;  T2: COMMIT;
-- Result: 0 on call. Both reasoned "one remains even if I leave."

Each transaction, viewed alone, obeyed the rule. By reading different rows and writing different rows, together they break a global invariant. Snapshot isolation can't catch this; only Serializable (e.g. PostgreSQL SSI) does. Write skew is the most dangerous anomaly in any isolation discussion — and the most often overlooked.

The four isolation levels

The four levels defined by the SQL standard. Higher = stronger isolation, lower concurrency.

Level              | Dirty Read | Non-repeatable | Phantom | Write Skew
-------------------+------------+----------------+---------+-----------
Read Uncommitted   |  allowed   |  allowed       |  allowed|  allowed
Read Committed     |  prevented |  allowed       |  allowed|  allowed
Repeatable Read    |  prevented |  prevented     |  allowed*|  allowed
Serializable       |  prevented |  prevented     |  prevented|  prevented

* Per the SQL standard, Repeatable Read permits phantoms. But
  PostgreSQL's Repeatable Read (= snapshot isolation) blocks phantoms too.
-- session level
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- per transaction (PostgreSQL)
BEGIN ISOLATION LEVEL REPEATABLE READ;
  ...
COMMIT;

Caution — the same name is implemented differently per engine. PostgreSQL's Repeatable Read is stronger than the standard (blocks phantoms); MySQL InnoDB's Repeatable Read uses gap locks to block phantoms but still allows write skew. Judge by "what does this engine actually prevent," not by the name.

Locking vs MVCC

There are two big strategies for implementing isolation.

Lock-based (2PL — two-phase locking)

Phase 1 (Growing): only acquire locks
Phase 2 (Shrinking): only release locks (usually all at COMMIT)

- reads take a shared lock (S), writes an exclusive lock (X)
- S coexists with S; X is exclusive against everything
- nobody writes while you read; nobody reads while you write
→ correct, but readers and writers block each other → low concurrency

MVCC (Multi-Version Concurrency Control) — PostgreSQL's approach

PostgreSQL does not overwrite rows in place. An UPDATE adds a new version of the row and leaves the old one. Each row version carries two hidden system columns:

xmin = transaction id that created this row version
xmax = transaction id that deleted/superseded it (0 if still live)

Right after UPDATE accounts SET balance = 50 WHERE id = 1;
+--------+---------+------+------+
| id     | balance | xmin | xmax |
+--------+---------+------+------+
| 1      | 100     | 100  | 200  |  ← old version (txn 200 superseded it)
| 1      | 50      | 200  | 0    |  ← new version
+--------+---------+------+------+

Each transaction holds a snapshot taken at its start (the set of transaction ids committed at that moment) and only reads the versions visible to that snapshot. So:

  • readers don't block writers and writers don't block readers. Everyone reads their own version. High concurrency.
  • dirty reads are structurally impossible — an uncommitted version is in nobody's snapshot.
  • the catch — old row versions pile up. Cleaning them is VACUUM: it reclaims dead tuples that no live transaction can see anymore and frees the space for reuse. If VACUUM falls behind, tables bloat and slow down. Autovacuum runs this automatically.

MySQL InnoDB also uses MVCC, but keeps old versions in a separate undo log (PostgreSQL keeps them in the table) and a purge thread cleans them. That's why InnoDB doesn't suffer table bloat as visibly as PostgreSQL.

SERIALIZABLE — two implementations

The strongest isolation. It guarantees "the result is identical to running the transactions one at a time in some serial order," even though they ran concurrently. Two main implementations:

Strict 2PL serializable

  • shared lock on every read, exclusive lock on every write, plus predicate / gap locks on ranges to block phantoms with locks.
  • correct, but heavy lock contention and frequent deadlocks. MySQL InnoDB's SERIALIZABLE is roughly this family (every SELECT locks implicitly).

PostgreSQL SSI — Serializable Snapshot Isolation

SSI doesn't block with locks. It runs optimistically and detects:
1. every transaction runs under snapshot isolation (reads don't block)
2. the DB tracks read/write dependencies (rw-conflicts) between txns
3. if it spots a dangerous cycle that can't map to any serial order
4. it ABORTS one of the transactions → the app retries

Upside: reads don't block writes, keeping MVCC's high concurrency
Cost: on a conflict like write skew, some transaction is force-aborted

So under PostgreSQL SERIALIZABLE it's usually fast, but on a conflict a transaction can fail with ERROR: could not serialize access due to read/write dependencies (SQLSTATE 40001). This is normal, and the application must retry (see the pattern below). Lock-based blocks and waits; SSI lets you proceed and undoes a few — same guarantee, different cost structure.

Deadlocks — each waiting on the other

T1: BEGIN; UPDATE accounts SET ... WHERE id = 1;  -- locks row 1
T2: BEGIN; UPDATE accounts SET ... WHERE id = 2;  -- locks row 2
T1: UPDATE accounts SET ... WHERE id = 2;  -- waits on row 2 (held by T2)
T2: UPDATE accounts SET ... WHERE id = 1;  -- waits on row 1 (held by T1)
-- they wait on each other forever = deadlock

Detection and resolution — the DB periodically inspects the wait-for graph. When it finds a cycle, it picks one transaction as the victim and aborts it:

ERROR: deadlock detected
DETAIL: Process 123 waits for ShareLock on transaction 456;
        Process 456 waits for ShareLock on transaction 123.
-- the victim transaction is rolled back → the app retries

Avoidance — consistent lock ordering. The root cause above is that T1 locked 1 then 2, while T2 locked 2 then 1. If all code always locks rows in the same order — say ascending id — no cycle can form:

-- always lock the smaller id first
def transfer(from_id, to_id, amount):
    a, b = sorted([from_id, to_id])
    SELECT ... FROM accounts WHERE id = a FOR UPDATE;
    SELECT ... FROM accounts WHERE id = b FOR UPDATE;
    ... update balances ...

Also — keep transactions short, lock only the rows you need, and index your foreign keys so locks don't widen unnecessarily. You can't fully eliminate deadlocks, so always keep retry logic.

Practical — defaults and costs

Default isolation level differs per engine

  • PostgreSQL — defaults to Read Committed. Each statement takes a fresh snapshot at its start.
  • MySQL InnoDB — defaults to Repeatable Read. The snapshot is taken at transaction start and held throughout.
  • Oracle / SQL Server — yet other defaults and implementations. Behavior shifts subtly when you port between them.

So "code that worked fine on PostgreSQL" can behave differently on MySQL. Don't write transaction code without knowing the default.

The "just use SERIALIZABLE" trap

It does prevent every anomaly. But it isn't free:

  • lock-based — more contention and deadlocks, lower throughput.
  • SSI — more serialization-failure (40001) aborts, so the app must implement retries. Skip that and the error leaks straight to users.
  • most workloads are fine with Read Committed plus explicit locks (SELECT ... FOR UPDATE) on the critical section, or a higher level only where it's needed.

The retry-on-serialization-failure pattern

def run_serializable(work, max_retries=5):
    for attempt in range(max_retries):
        try:
            BEGIN ISOLATION LEVEL SERIALIZABLE;
            work()            # all reads/writes
            COMMIT;
            return
        except SerializationFailure as e:  # SQLSTATE 40001 or 40P01
            ROLLBACK;
            sleep(backoff(attempt))         # backoff with a little jitter
            continue
    raise TooManyRetries()

Key point — the retried function must be free of side effects (idempotent). Touch the DB only inside the transaction, and defer external effects like sending an email until after COMMIT. Both SQLSTATE 40001 (serialization failure) and 40P01 (deadlock) are retryable.

Related guides

How indexes affect lock scope and the chance of deadlock is covered in how-database-indexes-work (a missing index on a foreign key widens the locks). And how COMMIT's durability guarantee gets shaky once it spreads across nodes — why a replica sees a transaction late — continues in how-replication-actually-works.

References

Summary

  • A transaction = an "all-or-nothing" unit wrapped in BEGIN / COMMIT / ROLLBACK. SAVEPOINT enables partial rollback.
  • ACID — A and D are single-transaction reliability, C is constraints + your code (not a business-rule guarantee), I is concurrency. Bugs usually come from misreading I.
  • Anomalies — dirty / non-repeatable / phantom / write skew. Write skew survives snapshot isolation and only Serializable catches it.
  • Four levels — Read Uncommitted → Committed → Repeatable Read → Serializable. The same name prevents different things per engine.
  • PostgreSQL uses MVCC (xmin/xmax versions + snapshots, hence VACUUM); the classic alternative is lock-based 2PL.
  • SERIALIZABLE — strict 2PL (block and wait) vs PostgreSQL SSI (proceed, abort on conflict). SSI can fail with 40001 → retry.
  • Deadlocks are detected by the DB, which aborts one side. Avoid them with consistent lock ordering (e.g. ascending id).
  • Defaults differ (Postgres = Read Committed, MySQL = Repeatable Read). "Just use SERIALIZABLE" has a cost — implement an idempotent retry-on-40001.
Back to guides