How SHA Hashes Are Built

A cryptographic hash like SHA-256 takes any input and produces a fixed-length "fingerprint" (256 bits = 64 hex chars). The same input always returns the same fingerprint; flip one bit and the fingerprint changes completely. This guide walks through how SHA actually builds that fingerprint, why you can't reverse it, why MD5/SHA-1 are called "broken," and why fast hashes are wrong for password storage.

The four promises of a hash function

Deterministic: same input → same output. Every time. On any system.
Fixed-length output: 1 byte input or 1 GB input, the digest is the same size (SHA-256 = 32 bytes).
Pre-image resistance: given only a digest, finding any input that produces it is computationally infeasible.
Collision resistance: finding two different inputs that hash to the same digest is computationally infeasible.

Try it: put the same string into SHA Hash twice — identical hex. Change one character — the result is completely different (avalanche effect).
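The same experiment can be sketched in Python with the standard-library hashlib module. The input strings and the helper names sha256_hex and bit_difference are illustrative choices, not part of any tool mentioned here.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def bit_difference(hex_a: str, hex_b: str) -> int:
    """Count how many bits differ between two equal-length hex digests."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

first = sha256_hex("hello world")
again = sha256_hex("hello world")
changed = sha256_hex("hello worle")  # 'd' -> 'e' differs by a single bit

print(first == again)                  # deterministic: same input, same digest
print(len(first))                      # fixed length: 64 hex characters
print(bit_difference(first, changed))  # typically close to 128 of 256 bits
```

The exact bit count varies from input to input; what matters is that it hovers around half of the output bits.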

SHA-2 — the Merkle-Damgård construction

SHA-224 / SHA-256 / SHA-384 / SHA-512 all use Merkle-Damgård. The skeleton:

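1. Pad the input: append a 1 bit, then zeros, then the message length in the final 64 (or 128) bits
2. Split into fixed-size blocks (512 bits for SHA-224/256, 1024 bits for SHA-384/512)
3. Set the initial hash state (IV: 8 fixed words, 32-bit for SHA-224/256, 64-bit for SHA-384/512)
4. For each block, apply the compression function:
   state = compress(state, block)
5. The final state is the hash output (truncated for SHA-224 and SHA-384)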

SHA-256's compress() has 64 rounds. Each round mixes bits via rotations, shifts, XORs, modular additions, and bitwise AND/NOT. The operations look simple in isolation, but stacked together they flip about half of the output bits (128 of 256 on average) when a single input bit changes. This is the avalanche effect.
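To make the five steps concrete, here is a teaching sketch of SHA-256 in pure Python, written from the structure described in FIPS 180-4. The function names (pad, compress, icbrt, first_primes) are my own, the constants are derived from prime roots rather than typed in, and the code is neither fast nor hardened. In real programs use hashlib.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF  # keep every word at 32 bits

def first_primes(count):
    """Return the first `count` primes by trial division."""
    primes, n = [], 2
    while len(primes) < count:
        if all(n % p for p in primes):
            primes.append(n)
        n += 1
    return primes

def icbrt(n):
    """Integer cube root (floor) by binary search."""
    lo, hi = 0, 1 << 36
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

PRIMES = first_primes(64)
# IV: first 32 fractional bits of the square roots of the first 8 primes.
IV = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]
# Round constants: first 32 fractional bits of the cube roots of the first 64 primes.
K = [icbrt(p << 96) & MASK for p in PRIMES]

def rotr(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK

def pad(message):
    """Step 1: append 0x80, zero bytes, then the 64-bit big-endian bit length."""
    zeros = (55 - len(message)) % 64
    return message + b"\x80" + b"\x00" * zeros + struct.pack(">Q", len(message) * 8)

def compress(state, block):
    """Step 4: mix one 64-byte block into the 8-word state over 64 rounds."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):  # expand 16 message words to 64
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for t in range(64):
        big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + big_s1 + ch + K[t] + w[t]) & MASK
        big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_s0 + maj) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    # Feed-forward: add the old state so the step cannot simply be run backwards.
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]

def sha256(message):
    state = list(IV)                      # step 3
    padded = pad(message)                 # step 1
    for i in range(0, len(padded), 64):   # steps 2 and 4
        state = compress(state, padded[i:i + 64])
    return struct.pack(">8I", *state)     # step 5

print(sha256(b"abc").hex() == hashlib.sha256(b"abc").hexdigest())
```

Comparing against hashlib on a few inputs, including lengths around the 55/56-byte padding boundary, is a quick way to check a sketch like this.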

SHA-3 — a completely different "sponge"

In 2007 NIST opened a competition for a hash function whose design differs from SHA-2. Keccak won in 2012 and was standardized as SHA-3 (FIPS 202) in 2015. Its core idea is the sponge:

Absorb:
  state (1600 bits) = 0
  for each input block:
    XOR block into part of state
    apply permutation (Keccak-f)

Squeeze:
  output = first slice of state
  if more output needed:
    apply permutation again
    take next slice

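The sponge structurally avoids the length-extension weakness of Merkle-Damgård: the output reveals only part of the 1600-bit state, so an attacker cannot resume hashing from a digest. With SHA-3 even hash(secret || data) is safe against length extension (with SHA-2 it isn't, which is why HMAC exists).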
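Implementing Keccak-f is beyond a short example, but the absorb/squeeze behaviour can be observed from the outside with Python's hashlib. This is a small sketch; the input bytes are arbitrary.

```python
import hashlib

data = b"sponge demo"

# SHA3-256: a fixed 32-byte output squeezed from the state.
print(hashlib.sha3_256(data).hexdigest())

# SHAKE128: the caller chooses how many bytes to squeeze.
short = hashlib.shake_128(data).digest(16)
longer = hashlib.shake_128(data).digest(64)

# Squeezing more output extends the stream; the earlier bytes do not change.
print(longer[:16] == short)

# SHA-256 and SHA3-256 share an output size but are unrelated functions.
print(hashlib.sha256(data).digest() != hashlib.sha3_256(data).digest())
```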

SHA-2 vs SHA-3 vs SHAKE — what to pick

Name	Output bits	Construction	Use
SHA-1	160	Merkle-Damgård	❌ Broken (collisions practical since 2017)
SHA-256	256	Merkle-Damgård	✅ Most common (TLS, Git, Bitcoin)
SHA-512	512	Merkle-Damgård	✅ Can outpace SHA-256 on 64-bit CPUs
SHA3-256	256	Sponge	✅ SHA-2 alternative, length-extension safe
SHAKE128/256	variable	Sponge	✅ Variable-length output (XOF, KDF building block)

For new systems the default recommendation is SHA-256, which has the broadest library and hardware-acceleration support. If you want defense-in-depth against future SHA-2 cryptanalysis, choose SHA3-256. SHAKE is niche (variable-length output / KDF building blocks).
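If you want to confirm the output sizes in the table on your own machine, a short hashlib loop is enough. The algorithm names below are the ones hashlib.new() accepts; SHAKE is left out because its output length is chosen per call.

```python
import hashlib

NAMES = ("sha1", "sha256", "sha512", "sha3_256")

for name in NAMES:
    h = hashlib.new(name)
    print(f"{name:9} digest={h.digest_size * 8:4} bits  block={h.block_size * 8:5} bits")

# Collect the digest sizes so they can be checked programmatically.
digest_bits = {name: hashlib.new(name).digest_size * 8 for name in NAMES}
```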

Why MD5 and SHA-1 are "broken"

"Broken" means an attacker can deliberately produce a collision — two different inputs hashing to the same digest.

MD5 collisions: Wang et al. published a practical algorithm in 2004. Modern laptops produce collisions in seconds. The Flame malware (2012) used an MD5 collision to forge a Microsoft certificate.
SHA-1 collisions: in 2017 Google + CWI released SHAttered, two different PDFs with the same SHA-1 hash. The attack cost ~9.2 × 10^18 SHA-1 computations (roughly 6,500 single-CPU years plus 110 single-GPU years), expensive but feasible on Google's infrastructure.
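To put the SHAttered figure in context: a generic birthday attack on an n-bit hash needs about 2^(n/2) evaluations, so an unbroken 160-bit hash would cost roughly 2^80. The arithmetic below is a quick sketch of how far below that bound the published attack landed.

```python
import math

sha1_bits = 160
generic_cost = 2 ** (sha1_bits // 2)         # birthday bound: about 2^80 hashes
shattered_cost = 9_223_372_036_854_775_808   # the ~9.2 x 10^18 figure, exactly 2^63

print(math.log2(shattered_cost))             # 63.0
print(generic_cost // shattered_cost)        # 131072, i.e. 2^17 times cheaper
```

A hash counts as broken once collisions cost meaningfully less than the birthday bound, even if the attack is still expensive.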

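Never use MD5 / SHA-1 for digital signatures, certificates, or any other security purpose. Non-security uses (checksums against accidental corruption, cache keys) are still fine, provided no attacker controls the inputs and could plant a deliberate collision.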

Why you shouldn't store passwords with SHA

SHA-256 is intentionally fast. Modern GPUs compute billions of SHA-256 hashes per second. An 8-char alphanumeric password (~218 trillion combinations) can be brute-forced in hours to days.
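The estimate is easy to reproduce. The hash rates below are assumed round numbers for illustration, not measurements of any particular GPU.

```python
alphabet = 26 + 26 + 10      # a-z, A-Z, 0-9
keyspace = alphabet ** 8     # 218_340_105_584_896, about 218 trillion

# Hypothetical attacker speeds in SHA-256 hashes per second (assumptions).
for rate in (1e9, 1e10, 1e11):
    hours = keyspace / rate / 3600
    print(f"{rate:.0e} H/s -> {hours:8.1f} hours to try every candidate")
```

On average the attacker finds the password after searching half the keyspace, so the expected time is half of each figure.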

The fix:

Intentionally slow hashes: bcrypt / Argon2 / scrypt. A "cost" parameter (work factor) is tuned so that a single hash takes ~100 ms. Against a fast hash that takes about a microsecond, that slows brute force by ~5 orders of magnitude.
Salt: a per-user random value mixed in. Defeats rainbow tables.
Pepper (optional): a global secret stored separately from the DB. A DB dump alone isn't enough.
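A minimal sketch of the salted, slow-hash approach using scrypt from Python's standard library (hashlib.scrypt needs a Python build linked against OpenSSL 1.1 or newer). The cost parameters and the function names hash_password and verify_password are illustrative assumptions, and the pepper is left out to keep the example short. bcrypt and Argon2 need third-party packages but follow the same store-and-verify pattern.

```python
import hashlib
import hmac
import os

# Illustrative cost settings; tune them for your own hardware and latency budget.
SCRYPT_PARAMS = dict(n=2**14, r=8, p=1, maxmem=64 * 1024 * 1024, dklen=32)

def hash_password(password: str) -> tuple:
    """Return (salt, digest); store both next to the user record."""
    salt = os.urandom(16)  # fresh random salt for every user
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute with the stored salt and compare in constant time."""
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, expected)

salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))                   # False
```

Because the salt is random, hashing the same password twice gives two different stored digests, which is what defeats precomputed tables.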

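Compare hands-on: SHA Hash on a 1 KB input takes on the order of microseconds (well under 1 ms), while Bcrypt Hash at cost 10 takes on the order of 100 ms (roughly 10⁵× slower). Keep in mind that bcrypt only reads the first 72 bytes of its input, so most of a 1 KB input is ignored. See the password-hashing guide for the deeper comparison.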

HMAC — combining a key and a hash safely

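When you need a MAC (message authentication code) from a secret key + data, the naïve SHA256(secret || data) is unsafe because of SHA-2's length extension: the digest is the full internal state, so anyone who sees it can keep hashing and produce a valid tag for data || padding || extra without knowing the secret (only its length). HMAC sidesteps it: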

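HMAC(key, msg) =
  SHA256(
    (K' XOR opad) || SHA256((K' XOR ipad) || msg)
  )

K'   = key zero-padded to the 64-byte block size (hashed first if longer than 64 bytes)
opad = 0x5c repeated 64 times
ipad = 0x36 repeated 64 times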

Two SHA calls plus two XORs. Try it with HMAC Generator; its verifying counterpart is HMAC Verify.
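The construction can be spelled out by hand and checked against Python's hmac module. This is an illustrative sketch following RFC 2104; the key and message are made up, and production code should call hmac.new directly.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 block size in bytes

def hmac_sha256(key: bytes, msg: bytes) -> bytes:
    """HMAC-SHA256 written out step by step (for illustration only)."""
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()    # long keys are hashed first
    key = key.ljust(BLOCK_SIZE, b"\x00")      # then zero-padded to one block
    inner_key = bytes(b ^ 0x36 for b in key)  # key XOR ipad
    outer_key = bytes(b ^ 0x5C for b in key)  # key XOR opad
    inner = hashlib.sha256(inner_key + msg).digest()
    return hashlib.sha256(outer_key + inner).digest()

key, msg = b"demo-key", b"amount=100&to=alice"
tag = hmac_sha256(key, msg)
print(hmac.compare_digest(tag, hmac.new(key, msg, hashlib.sha256).digest()))  # True
```

When verifying a received tag, compare with hmac.compare_digest rather than ==, so the comparison time does not leak how many leading bytes matched.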

Where SHA shows up

File integrity

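A sha256sums.txt file typically sits next to an ISO download. If the SHA-256 you compute locally matches the published value, the file arrived intact. That only rules out tampering if the checksum list itself came from a trusted source (for example over HTTPS or with a signature).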
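A sketch of the local check in Python. The helper names sha256_file and matches_published are hypothetical, and a throwaway temp file stands in for the downloaded ISO.

```python
import hashlib
import hmac
import os
import tempfile

def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so a large ISO never sits fully in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_published(path: str, published_hex: str) -> bool:
    """Compare the local digest with the hex value copied from the checksum list."""
    return hmac.compare_digest(sha256_file(path), published_hex.strip().lower())

# Demo: a small temp file plays the role of the download.
payload = b"pretend this is an ISO image"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
published = hashlib.sha256(payload).hexdigest()  # normally read from sha256sums.txt
result = matches_published(tmp.name, published)
mismatch = matches_published(tmp.name, "0" * 64)
os.remove(tmp.name)
print(result, mismatch)  # True False
```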

Git commit IDs

Git uses SHA-1 of a commit as its ID (migration to SHA-256 in progress). The same commit produces the same ID everywhere.
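Git hashes every object as a short header (type, size, NUL byte) followed by the content. The sketch below applies that scheme to a blob, which is easier to reproduce than a commit; a commit ID is computed the same way with the type "commit" over the commit's text. The function name git_object_id is my own.

```python
import hashlib

def git_object_id(obj_type: str, content: bytes) -> str:
    """SHA-1 over the header '<type> <size>' plus a NUL byte, then the raw content."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

# Same result as: printf 'hello world\n' | git hash-object --stdin
print(git_object_id("blob", b"hello world\n"))
# -> 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

Because the ID depends only on the bytes, two repositories that contain the same object agree on its name without coordinating.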

Bitcoin / blockchains

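Bitcoin hashes each block header with double SHA-256 and accepts the block only if the result, read as a number, is below the proof-of-work target. Informally: "Find a nonce so the resulting hash starts with N zeros."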
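A toy version of the nonce search, assuming a made-up header and a very low difficulty so it finishes quickly. Real Bitcoin headers have a fixed 80-byte layout and compare the hash in a different byte order; this sketch only shows the "hash below target" idea.

```python
import hashlib
import struct

def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice, as used for Bitcoin block headers."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def mine(header: bytes, zero_bits: int) -> tuple:
    """Try nonces until the double hash, read as an integer, is below the target."""
    target = 1 << (256 - zero_bits)
    nonce = 0
    while True:
        digest = double_sha256(header + struct.pack("<I", nonce))
        if int.from_bytes(digest, "big") < target:
            return nonce, digest
        nonce += 1

# 16 zero bits needs about 65,000 attempts on average.
nonce, digest = mine(b"toy block header", zero_bits=16)
print(nonce, digest.hex())  # the hex digest starts with at least four 0 characters
```

Each extra zero bit doubles the expected work, while checking a claimed nonce still takes a single double hash.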

Content-addressable storage

IPFS, Docker image layers, and npm's package integrity fields (written in the Subresource Integrity format) all rely on the same idea: the content's hash is its address.
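As one concrete instance, a Subresource Integrity string is an algorithm name, a hyphen, and the base64-encoded digest. The sketch below builds and checks one; the helper names sri_string and verify_sri and the sample content are assumptions for illustration.

```python
import base64
import hashlib

def sri_string(content: bytes, algorithm: str = "sha512") -> str:
    """Build an integrity string of the form '<algorithm>-<base64 digest>'."""
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"

def verify_sri(content: bytes, integrity: str) -> bool:
    """Recompute the string with the algorithm named in its prefix and compare."""
    algorithm, _, _ = integrity.partition("-")
    return sri_string(content, algorithm) == integrity

package = b"console.log('hi');\n"
integrity = sri_string(package)
print(integrity.startswith("sha512-"), verify_sri(package, integrity))  # True True
```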

References

NIST FIPS 180-4 (SHA-2) — official spec
NIST FIPS 202 (SHA-3 / Keccak) — official spec
SHAttered (SHA-1 collision, 2017) — shattered.io
RFC 2104 (HMAC) — datatracker

Summary

A hash promises four things: deterministic, fixed-length, one-way, collision-resistant.
SHA-2 uses Merkle-Damgård; SHA-3 uses a sponge. Different structures, different weaknesses (e.g. length extension).
MD5 / SHA-1 are broken for crypto (deliberate collisions). Fine for non-crypto checksums.
Never store passwords with plain SHA — use bcrypt / Argon2 / scrypt with a tuned cost factor.
Need a keyed MAC? Use HMAC, not naïve concatenation.
Play with the building blocks: SHA Hash, MD5 Hash, HMAC Generator, Bcrypt Hash.