A cryptographic hash like SHA-256 takes any input and produces a fixed-length "fingerprint" (256 bits = 64 hex chars). The same input always returns the same fingerprint; flip one bit and the fingerprint changes completely. This guide walks through how SHA actually builds that fingerprint, why you can't reverse it, why MD5/SHA-1 are called "broken," and why fast hashes are wrong for password storage.
The four promises of a hash function
- Deterministic: same input → same output. Every time. On any system.
- Fixed-length output: 1-byte input or 1 GB input, the digest is the same size (SHA-256 = 32 bytes).
- Pre-image resistance: given only a digest, it is computationally infeasible to find an input that produces it.
- Collision resistance: it is computationally infeasible to find two different inputs that hash to the same digest. Collisions must exist, since unlimited inputs map onto 2^256 outputs, but finding one should be out of reach.
Try it: put the same string into SHA Hash twice — identical hex. Change one character — the result is completely different (avalanche effect).
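The same experiment can be scripted with Python's standard `hashlib` module. This is a minimal sketch. The helper names `sha256_hex` and `bit_diff` and the sample inputs are illustrative choices, not part of the SHA Hash tool.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as 64 lowercase hex characters."""
    return hashlib.sha256(data).hexdigest()

def bit_diff(a_hex: str, b_hex: str) -> int:
    """Count how many of the 256 output bits differ between two digests."""
    return bin(int(a_hex, 16) ^ int(b_hex, 16)).count("1")

# Deterministic: hashing the same input twice gives the same digest.
assert sha256_hex(b"hello") == sha256_hex(b"hello")

# Fixed length: 1 byte or 1 MB in, always 64 hex chars (32 bytes) out.
assert len(sha256_hex(b"x")) == len(sha256_hex(b"x" * 1_000_000)) == 64

# Avalanche: changing one character flips roughly half of the 256 bits.
print(bit_diff(sha256_hex(b"hello"), sha256_hex(b"hellp")))
```

Expect a number in the neighborhood of 128; the exact count depends on the two inputs.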
SHA-2 — the Merkle-Damgård construction
SHA-224 / SHA-256 / SHA-384 / SHA-512 all use Merkle-Damgård. The skeleton:
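1. Pad the input: append a single 1 bit, then zeros, then the message length in bits (a 64-bit field for SHA-224/256, 128-bit for SHA-384/512), so the total is a whole number of blocks.
2. Split into fixed-size blocks (512 bits for SHA-224/256, 1024 bits for SHA-384/512).
3. Load the initial hash state (IV): 8 fixed words, 32-bit for SHA-224/256 and 64-bit for SHA-384/512.
4. For each block, apply the compression function: `state = compress(state, block)`
5. The final state (truncated for SHA-224 and SHA-384) is the hash output.

SHA-256's compress() runs 64 rounds (SHA-512's runs 80). Each round mixes bits with rotations, shifts, XORs, modular additions, and the bitwise "choose" and "majority" functions built from AND, NOT and XOR. The operations look simple in isolation, but stacked together they make a single flipped input bit flip about half the output bits (≈128 of 256) on average: the avalanche effect.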
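To make the five steps concrete, here is a compact pure-Python SHA-256 that follows them one by one and checks itself against `hashlib`. Treat it as a teaching sketch: it is slow, not constant-time, and not a replacement for a real library. Rather than pasting the 64 round constants, it derives them and the IV from the cube and square roots of the first primes, which is how FIPS 180-4 describes them. Helper names such as `icbrt` and `first_primes` are illustrative.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF  # keep every value in 32 bits

def first_primes(n: int) -> list[int]:
    """First n primes by trial division (plenty fast for n = 64)."""
    primes: list[int] = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes

def icbrt(n: int) -> int:
    """Largest integer x with x**3 <= n (exact binary search)."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

PRIMES = first_primes(64)
# IV: first 32 fractional bits of the square roots of the first 8 primes.
H0 = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]
# Round constants: first 32 fractional bits of the cube roots of the first 64 primes.
K = [icbrt(p << 96) & MASK for p in PRIMES]

def rotr(x: int, n: int) -> int:
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK

def pad(message: bytes) -> bytes:
    """Step 1: append 0x80 (a 1 bit), zeros, then the 64-bit big-endian bit length."""
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    return padded + struct.pack(">Q", len(message) * 8)

def compress(state: list[int], block: bytes) -> list[int]:
    """Step 4: mix one 64-byte block into the 8-word state over 64 rounds."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):  # expand 16 words into a 64-word message schedule
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for t in range(64):
        big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        choose = (e & f) ^ (~e & g)             # bits of e pick from f or g
        t1 = (h + big_s1 + choose + K[t] + w[t]) & MASK
        big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        majority = (a & b) ^ (a & c) ^ (b & c)  # bitwise majority vote
        t2 = (big_s0 + majority) & MASK
        h, g, f, e = g, f, e, (d + t1) & MASK
        d, c, b, a = c, b, a, (t1 + t2) & MASK
    # Feed-forward: add this block's result onto the incoming state.
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]

def sha256(message: bytes) -> str:
    state = list(H0)                                  # step 3: fixed IV
    data = pad(message)                               # step 1
    for i in range(0, len(data), 64):                 # step 2: 512-bit blocks
        state = compress(state, data[i:i + 64])       # step 4
    return "".join(f"{word:08x}" for word in state)   # step 5

assert sha256(b"abc") == hashlib.sha256(b"abc").hexdigest()
```

Deriving the constants from integer square and cube roots avoids both copy-paste typos and floating-point rounding.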
SHA-3 — a completely different "sponge"
In 2007 NIST opened a public competition for a new hash standard. Keccak won in 2012 and was standardized as SHA-3 in FIPS 202 (2015). Its design is completely different from SHA-2; the core idea is the sponge:
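```
Absorb:
    pad the input (SHA-3 domain bits + pad10*1) to a multiple of the rate r
    state (1600 bits) = 0
    for each r-bit input block:
        XOR block into the first r bits of state
        state = Keccak-f[1600](state)
Squeeze:
    output = first r bits of state
    while more output is needed:
        state = Keccak-f[1600](state)
        output = output || first r bits of state
    truncate output to the requested length
```

Only the first r bits (the "rate") ever touch the input or the output; the remaining c = 1600 − r bits (the "capacity", 512 bits for SHA3-256) stay hidden. Because a digest never reveals the full state, an attacker can't resume hashing from it, so the sponge structurally avoids the length-extension weakness of Merkle-Damgård. With SHA-3 even hash(secret || data) is safe (with SHA-2 it isn't, which is why HMAC exists).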
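Python's `hashlib` exposes both flavors of the sponge, which makes the squeeze phase easy to observe. A minimal sketch: the message is arbitrary, and the point is that asking SHAKE for more output only appends bytes, because it just keeps squeezing the same state.

```python
import hashlib

msg = b"hello"

# SHA3-256: a fixed 32-byte output squeezed from the Keccak sponge.
print(hashlib.sha3_256(msg).hexdigest())

# SHAKE128 is an extendable-output function: the caller picks the length.
out16 = hashlib.shake_128(msg).digest(16)   # squeeze 16 bytes
out64 = hashlib.shake_128(msg).digest(64)   # keep squeezing to 64 bytes

# More squeezing only appends output; the first 16 bytes are unchanged.
assert out64[:16] == out16
```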
SHA-2 vs SHA-3 vs SHAKE — what to pick
| Name | Output bits | Construction | Use |
|---|---|---|---|
| SHA-1 | 160 | Merkle-Damgård | ❌ Broken (collisions practical since 2017) |
| SHA-256 | 256 | Merkle-Damgård | ✅ Most common (TLS, Bitcoin, Git's SHA-256 object format) |
| SHA-512 | 512 | Merkle-Damgård | ✅ Can outpace SHA-256 on 64-bit CPUs in software |
| SHA3-256 | 256 | Sponge | ✅ SHA-2 alternative, length-extension safe |
| SHAKE128 / SHAKE256 | variable | Sponge | ✅ Variable-length output (KDF-style uses) |
For new systems the default recommendation is SHA-256: it has the broadest library and hardware-acceleration support. If you want defense in depth against future SHA-2 cryptanalysis, choose SHA3-256. SHAKE is niche (variable-length output, KDF building blocks).
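The "Output bits" column can be checked directly with `hashlib.new`, which accepts each of these algorithm names. A small sketch; the `sizes` dict is just an illustrative container.

```python
import hashlib

# Digest size in bits for the fixed-length hashes listed in the table.
sizes = {name: hashlib.new(name).digest_size * 8
         for name in ("sha1", "sha256", "sha512", "sha3_256")}

for name, bits in sizes.items():
    print(f"{name:<9} {bits} bits")
```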
Why MD5 and SHA-1 are "broken"
"Broken" means an attacker can deliberately produce a collision — two different inputs hashing to the same digest.
- MD5 collisions: Wang et al. published a practical algorithm in 2004. Modern laptops produce collisions in seconds. The Flame malware (2012) used an MD5 collision to forge a Microsoft certificate.
- SHA-1 collisions: Google and CWI Amsterdam released SHAttered in 2017: two different PDFs with the same SHA-1 hash. The attack took ~9.2 × 10^18 SHA-1 computations (about 6,500 CPU-years for the first phase plus 110 GPU-years for the second), expensive but within reach of Google-scale infrastructure.
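Never use MD5 / SHA-1 for cryptographic purposes such as digital signatures or certificates. Non-crypto uses (checksums against accidental corruption, cache keys) are still fine, as long as no attacker can choose the inputs and profit from a deliberate collision.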
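Here is a sketch of a legitimate non-crypto use: deriving a cache key. The helper `cache_key` is hypothetical. It assumes a collision would give nobody an advantage. Python 3.9+ accepts `usedforsecurity=False` to document that intent, which can also let MD5 run on FIPS-restricted OpenSSL builds.

```python
import hashlib
import json

def cache_key(params: dict) -> str:
    """Derive a short, stable cache key from request parameters (non-security use)."""
    # Canonical JSON so that key order doesn't change the hash.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode(), usedforsecurity=False).hexdigest()

print(cache_key({"page": 2, "q": "sha"}))
```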
Why you shouldn't store passwords with SHA
SHA-256 is intentionally fast. Modern GPUs compute billions of SHA-256 hashes per second. An 8-char alphanumeric password (~218 trillion combinations) can be brute-forced in hours to days.
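The arithmetic behind "hours to days" is easy to reproduce. This sketch assumes 10 billion SHA-256 guesses per second, an order-of-magnitude figure for one fast modern GPU rather than a measured benchmark.

```python
# Back-of-envelope brute-force time for an 8-char alphanumeric password.
ALPHABET = 26 + 26 + 10                  # a-z, A-Z, 0-9
LENGTH = 8
GUESSES_PER_SECOND = 10_000_000_000      # ASSUMPTION: ~10 billion SHA-256/s

combinations = ALPHABET ** LENGTH
worst_case_hours = combinations / GUESSES_PER_SECOND / 3600

# prints "218,340,105,584,896 combinations, ~6.1 h worst case"
print(f"{combinations:,} combinations, ~{worst_case_hours:.1f} h worst case")
```

Several GPUs divide that figure further; a slower rig or longer passwords push it toward days.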
The fix:
- Intentionally slow hashes: bcrypt / Argon2 / scrypt. A "cost" parameter (work factor) is tuned so a single hash takes ~100 ms, making each guess roughly 5 orders of magnitude more expensive than one fast SHA-256 computation. Argon2 and scrypt are also memory-hard, which further blunts GPU parallelism.
- Salt: a per-user random value mixed in and stored alongside the hash. Defeats rainbow tables.
- Pepper (optional): a global secret stored separately from the DB, so a DB dump alone isn't enough to start cracking.
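Python's standard library ships scrypt (bcrypt and Argon2 need third-party packages), so salt plus a slow hash can be sketched without dependencies. The parameters below (n=2**14, r=8, p=1, about 16 MiB per hash) are an assumption to tune on your own hardware, the helper names are hypothetical, and a pepper is not shown.

```python
import hashlib
import hmac
import os

# ASSUMPTION: example cost parameters; raise n until one hash takes ~100 ms.
SCRYPT_N, SCRYPT_R, SCRYPT_P = 2**14, 8, 1

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, key); store both. Every call draws a fresh random salt."""
    salt = os.urandom(16)
    key = hashlib.scrypt(password.encode(), salt=salt,
                         n=SCRYPT_N, r=SCRYPT_R, p=SCRYPT_P, dklen=32)
    return salt, key

def verify_password(password: str, salt: bytes, key: bytes) -> bool:
    """Recompute with the stored salt and compare in constant time."""
    candidate = hashlib.scrypt(password.encode(), salt=salt,
                               n=SCRYPT_N, r=SCRYPT_R, p=SCRYPT_P, dklen=32)
    return hmac.compare_digest(candidate, key)
```

In practice the cost parameters are usually stored next to the salt so they can be raised later without breaking existing hashes.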
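Compare hands-on: SHA Hash on a 1 KB input takes on the order of microseconds, while Bcrypt Hash at cost 10 typically takes tens of milliseconds up to ~100 ms, roughly 10⁴–10⁵× slower. (Most bcrypt implementations ignore or reject anything past the first 72 bytes of input, so the full 1 KB doesn't matter there.) See the password-hashing guide for the deeper comparison.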
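A rough version of this comparison in Python follows. Because bcrypt isn't in the standard library, this sketch substitutes scrypt as the slow hash, and its parameters are an assumption. Timings vary a lot by machine, so no output is claimed.

```python
import hashlib
import os
import time
import timeit

data = os.urandom(1024)  # 1 KB input

# Average over many runs: a single SHA-256 call is too fast to time alone.
fast = timeit.timeit(lambda: hashlib.sha256(data).digest(), number=1000) / 1000

start = time.perf_counter()
hashlib.scrypt(data, salt=os.urandom(16), n=2**14, r=8, p=1, dklen=32)
slow = time.perf_counter() - start

print(f"SHA-256: {fast * 1e6:.1f} µs  scrypt: {slow * 1e3:.1f} ms  ratio ~{slow / fast:,.0f}x")
```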
HMAC — combining a key and a hash safely
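When you need a MAC (message authentication code) from a secret key + data, the naïve SHA256(secret || data) is unsafe because of SHA-2's length extension: the digest is the full internal state, so anyone who sees a valid tag can keep hashing from it and forge a valid tag for secret || data || padding || extra without knowing the secret. HMAC sidesteps it: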
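```
K0 = key, zero-padded to the 64-byte SHA-256 block size
     (keys longer than 64 bytes are first replaced by SHA256(key))

HMAC(key, msg) =
    SHA256(
        (K0 XOR opad) || SHA256((K0 XOR ipad) || msg)
    )

opad = 0x5c repeated 64 times
ipad = 0x36 repeated 64 times
```

Two SHA-256 calls plus two XORs. Try it with HMAC Generator; its counterpart for checking a tag is HMAC Verify.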
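The formula translates almost line for line into Python. This sketch (the function `hmac_sha256` and the sample key and message are illustrative) checks itself against the standard library's `hmac` module. It also uses `hmac.compare_digest` for verification, since a plain `==` on tags can leak timing information.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 block size in bytes

def hmac_sha256(key: bytes, msg: bytes) -> bytes:
    """HMAC-SHA256 written out step by step, for illustration only."""
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()   # long keys are hashed first
    key = key.ljust(BLOCK_SIZE, b"\x00")     # then zero-padded to 64 bytes
    ipad_key = bytes(b ^ 0x36 for b in key)
    opad_key = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha256(ipad_key + msg).digest()
    return hashlib.sha256(opad_key + inner).digest()

key, msg = b"secret-key", b"amount=100&to=alice"
tag = hmac_sha256(key, msg)

# Verify against the standard library with a constant-time comparison.
expected = hmac.new(key, msg, hashlib.sha256).digest()
print(hmac.compare_digest(tag, expected))  # True
```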
Where SHA shows up
File integrity
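The sha256sums.txt file published next to an ISO download is the classic example. If the SHA-256 you compute locally matches the published value, the file wasn't corrupted or tampered with in transit, provided the checksum file itself came from a trusted source (for example over HTTPS or with a signature).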
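A local check can be sketched in a few lines. The helper names `sha256_file` and `matches_published` are hypothetical; the file is read in chunks so a multi-gigabyte ISO never sits in memory at once, and the demo writes a throwaway temporary file as its "download".

```python
import hashlib
import tempfile

def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_published(path: str, expected_hex: str) -> bool:
    """Compare against the hex digest copied from the checksum file."""
    return sha256_file(path) == expected_hex.strip().lower()

# Demo with a temporary file standing in for the download.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example ISO contents")
demo_path = tmp.name
print(sha256_file(demo_path))
```

On Python 3.11+, `hashlib.file_digest(f, "sha256")` does the chunked reading for you.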
Git commit IDs
Git names every object (commit, tree, file blob) by the SHA-1 of its contents, so a commit's ID is the SHA-1 of the commit object (migration to SHA-256 in progress). The same commit produces the same ID everywhere.
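Git hashes a short header followed by the object's bytes: the object type, a space, the size in bytes, and a NUL byte. A sketch for blobs (file contents) follows. The helper name `git_object_id` is illustrative; commits use the same scheme with type `commit` over the commit's text (tree, parents, author, message), and SHA-256 repositories swap in SHA-256.

```python
import hashlib

def git_object_id(obj_type: str, content: bytes) -> str:
    """SHA-1 object ID as Git computes it: '<type> <size>', a NUL byte, then content."""
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()

# Should agree with `git hash-object --stdin` for the same bytes.
print(git_object_id("blob", b"hello world\n"))
```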
Bitcoin / blockchains
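Bitcoin hashes each block header with double SHA-256 (SHA-256 applied twice), and proof of work means finding a nonce so that the resulting hash, read as a 256-bit number, is at or below the network's target. Informally: "find a nonce so the resulting hash starts with N zeros."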
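A toy version of the informal "N leading zeros" rule shows why the search is pure trial and error. This is a simplification under stated assumptions: the header bytes are made up, the nonce is an 8-byte counter, and real Bitcoin compares the hash numerically against a target instead of counting hex zeros.

```python
import hashlib

def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice, as Bitcoin does for block headers."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def mine(header: bytes, zero_hex_digits: int) -> tuple[int, str]:
    """Toy proof of work: find a nonce whose double SHA-256 starts with N hex zeros."""
    target_prefix = "0" * zero_hex_digits
    nonce = 0
    while True:
        digest = double_sha256(header + nonce.to_bytes(8, "little")).hex()
        if digest.startswith(target_prefix):
            return nonce, digest
        nonce += 1

nonce, digest = mine(b"toy block header", 4)  # ~65,536 attempts on average
print(nonce, digest)
```

Each extra hex zero multiplies the expected work by 16, while checking a claimed nonce still costs just one double hash.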
Content-addressable storage
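IPFS, Docker image layers, npm lockfile integrity fields and browser Subresource Integrity (SRI) all identify content by its hash. The content's hash is its address, so anyone can check that what they received is exactly what was referenced.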
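The idea fits in a toy store plus a helper that builds an SRI string. Both are illustrative sketches (the `ContentStore` class and `sri` function are hypothetical). Real systems add their own framing: IPFS wraps the hash in a multihash-based CID, and Docker writes digests as `sha256:<hex>`.

```python
import base64
import hashlib

class ContentStore:
    """Toy content-addressable store: each blob's key is its SHA-256."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data  # identical content lands on the same key
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # Re-hash on read so corrupted or swapped content is detected.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data

def sri(data: bytes, algorithm: str = "sha384") -> str:
    """Subresource Integrity value, e.g. for <script integrity="...">."""
    digest = hashlib.new(algorithm, data).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode()}"

store = ContentStore()
addr = store.put(b"layer contents")
print(addr, sri(b"console.log('hi');"))
```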
References
- NIST FIPS 180-4, Secure Hash Standard (SHA-1, SHA-2): official spec
- NIST FIPS 202, SHA-3 Standard (Keccak, SHAKE): official spec
- SHAttered, the first practical SHA-1 collision (2017): shattered.io
- RFC 2104, HMAC: Keyed-Hashing for Message Authentication (IETF)
Summary
- A hash promises four things: deterministic, fixed-length, one-way, collision-resistant.
- SHA-2 uses Merkle-Damgård; SHA-3 uses a sponge. Different structures, different weaknesses (e.g. length extension).
- MD5 / SHA-1 are broken for crypto (deliberate collisions). Fine for non-crypto checksums where no attacker chooses the input.
- Never store passwords with plain SHA — use bcrypt / Argon2 / scrypt with a tuned cost factor.
- Need a keyed MAC? Use HMAC, not naïve concatenation.
- Play with the building blocks: SHA Hash, MD5 Hash, HMAC Generator, Bcrypt Hash.