Cryptographic · broken
MD5
A 128-bit hash function designed by Ronald Rivest in 1992. Once the default checksum of the internet, MD5 is now cryptographically broken: collisions can be found in seconds on a laptop, and chosen-prefix collisions were demonstrated as far back as 2008 (Stevens et al.).
At a glance
| Output | 128 bits (16 bytes, 32 hex chars) |
|---|---|
| Block size | 512 bits |
| Construction | Merkle-Damgård |
| Rounds | 64 (4 × 16) |
| Standard | RFC 1321 (1992); RFC 6151 (2011, deprecation guidance) |
| Collision security | seconds, in practice (Wang & Yu 2004) |
| Preimage security | ~2^123 (Sasaki & Aoki 2009) |
| Length extension | Yes |
| Status | Broken; do not use for new designs |
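The size figures in the table can be checked with Python's standard `hashlib` module. This is a minimal sketch; the helper name `md5_summary` is made up for illustration, and the input `b"abc"` is one of the test vectors listed in RFC 1321.

```python
import hashlib


def md5_summary(data: bytes) -> dict:
    """Return the MD5 hex digest of `data` plus the sizes reported by hashlib."""
    h = hashlib.md5(data)
    return {
        "hex": h.hexdigest(),           # 32 hex characters
        "digest_bytes": h.digest_size,  # 16 bytes = 128 bits
        "block_bytes": h.block_size,    # 64 bytes = 512 bits
    }


if __name__ == "__main__":
    info = md5_summary(b"abc")
    print(info["hex"])  # 900150983cd24fb0d6963f7d28e17f72 (RFC 1321 test vector)
    print(info["digest_bytes"], info["block_bytes"])
```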
Where it still shows up
- File checksums: `md5sum` output remains common, e.g. on mirror sites and in package lockfiles, though only as a transport integrity check.
- Legacy TLS / signatures: long deprecated; mostly only an interop concern with very old systems.
- HMAC-MD5: used in early TLS and IPsec. HMAC does not rely on the collision resistance of the underlying hash, so it remains surprisingly resistant even with a broken one, but it should not be chosen for new designs.
- Non-security uses: cache keys and content addressing where adversarial collisions are irrelevant.
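For the file-checksum case, the following sketch reproduces the hex string that `md5sum` prints, reading the file in chunks so large downloads need not fit in memory. The function name `md5_file` and the chunk size are illustrative choices, not part of any tool mentioned above.

```python
import hashlib


def md5_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing the result against a published checksum detects accidental corruption in transit. It does not detect deliberate tampering, since an attacker who can craft the file can also craft a collision.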
Why it failed
Hans Dobbertin found weaknesses in the compression function in 1996, and a full collision attack by Wang & Yu in 2004 reduced collision finding to about 2^39 work, far below the 2^64 generic birthday bound; later refinements made it achievable in seconds. Marc Stevens and colleagues then demonstrated chosen-prefix collisions of practical impact with a forged X.509 certificate in 2008. In 2012, the Flame malware campaign abused chosen-prefix MD5 collisions to forge a Windows code-signing certificate.
The root cause is differential cryptanalysis: an attacker can carefully choose input differences that propagate through the rounds with unusually high probability, producing two messages that collide. The design simply does not mix bits fast enough across rounds to suppress this.
Length-extension caveat
Like other plain Merkle-Damgård hashes, MD5 is vulnerable to length-extension. Given MD5(K || m) and the length of K || m, an attacker can compute MD5(K || m || pad || m') for any chosen m' without knowing K, because the digest is simply the internal state after the last block. Use HMAC-MD5 if you must use MD5 for authentication; better, use HMAC-SHA-256 and retire MD5.
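The attack can be sketched in Python. Because `hashlib` does not let you resume from an arbitrary internal state, the sketch below includes a small pure-Python MD5 compression function following the RFC 1321 description. The names `compress`, `md5_padding` and `extend`, and the key and messages in the demo, are hypothetical and chosen only for illustration.

```python
import hashlib
import math
import struct

# Per-round left-rotation amounts and sine-derived constants (RFC 1321).
SHIFTS = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
K = [int(abs(math.sin(i + 1)) * 2**32) for i in range(64)]
MASK = 0xFFFFFFFF


def _rotl(x: int, n: int) -> int:
    return ((x << n) | (x >> (32 - n))) & MASK


def compress(state: tuple, block: bytes) -> tuple:
    """Apply the MD5 compression function to one 64-byte block."""
    a, b, c, d = state
    words = struct.unpack("<16I", block)
    for i in range(64):
        if i < 16:
            f, g = (b & c) | (~b & d), i
        elif i < 32:
            f, g = (d & b) | (~d & c), (5 * i + 1) % 16
        elif i < 48:
            f, g = b ^ c ^ d, (3 * i + 5) % 16
        else:
            f, g = c ^ (b | ~d), (7 * i) % 16
        f = (f + a + K[i] + words[g]) & MASK
        a, d, c, b = d, c, b, (b + _rotl(f, SHIFTS[i])) & MASK
    return tuple((s + v) & MASK for s, v in zip(state, (a, b, c, d)))


def md5_padding(message_len: int) -> bytes:
    """Padding MD5 appends to a message of `message_len` bytes."""
    zeros = (55 - message_len) % 64
    return b"\x80" + b"\x00" * zeros + struct.pack("<Q", message_len * 8)


def extend(digest: bytes, original_len: int, suffix: bytes):
    """Forge MD5(original || pad || suffix) from MD5(original) and its length.

    Returns (pad, forged_digest). The original bytes are never needed.
    """
    state = struct.unpack("<4I", digest)  # the digest is the internal state
    pad = md5_padding(original_len)
    total_len = original_len + len(pad) + len(suffix)
    data = suffix + md5_padding(total_len)
    for offset in range(0, len(data), 64):
        state = compress(state, data[offset:offset + 64])
    return pad, struct.pack("<4I", *state)


if __name__ == "__main__":
    key, msg, extra = b"hypothetical-key", b"user=alice", b"&admin=true"
    tag = hashlib.md5(key + msg).digest()  # what the attacker observes
    pad, forged = extend(tag, len(key + msg), extra)
    # The server, which knows the key, would compute the same value:
    print(forged == hashlib.md5(key + msg + pad + extra).digest())
```

The attacker only needs the tag and the combined length of key and message, which is why a bare MD5(K || m) must not be used as a MAC. HMAC wraps the hash in two keyed passes, so the observed tag is not a resumable internal state.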
Try it
The multi-algorithm hasher computes MD5 alongside modern alternatives so you can see how the same input maps differently across families.
References
- RFC 1321: The MD5 Message-Digest Algorithm (Rivest, 1992)
- RFC 6151: Updated Security Considerations for MD5 and HMAC-MD5
- Wang & Yu, “How to Break MD5 and Other Hash Functions” (EUROCRYPT 2005)
- Stevens, Lenstra, de Weger, “Chosen-prefix collisions for MD5 and applications” (2009)
- Flame malware (2012): chosen-prefix MD5 collision in the wild
Quick quiz
Test yourself on MD5
10 multiple-choice questions. Pick an answer for each, then submit to see explanations.
Q1. What is the output size of MD5?
Q2. Who designed MD5?
Q3. What construction does MD5 use?
Q4. Which year's CRYPTO paper made MD5 collisions practical?
Q5. Which malware famously used a chosen-prefix MD5 collision to forge a Microsoft code-signing certificate?
Q6. Is MD5 still safe as a MAC if you use HMAC-MD5?
Q7. How many bytes long is an MD5 digest when encoded as hexadecimal?
Q8. MD5 has what kind of length-extension property?
Q9. Which RFC formally deprecates MD5 for new uses?
Q10. Which use of MD5 is still considered acceptable today?