Question 1

What exactly is a hash function?

Accepted Answer

A hash function is a deterministic function that maps input of any length to a fixed-length output. The same input always produces the same output, and any change to the input should change the output unpredictably. For cryptographic hashes, the formal security properties are preimage resistance, second-preimage resistance, and collision resistance.
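
As a rough illustration of those properties, here is a minimal sketch using SHA-256 from Python's standard hashlib. The helper names (sha256_hex, differing_bits) are made up for this example.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as 64 hex characters."""
    return hashlib.sha256(data).hexdigest()

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Fixed-length output: 3 bytes and 300,000 bytes both hash to 32 bytes.
short = hashlib.sha256(b"abc").digest()
long_ = hashlib.sha256(b"abc" * 100_000).digest()
assert len(short) == len(long_) == 32

# Deterministic: the same input always gives the same digest.
assert sha256_hex(b"abc") == sha256_hex(b"abc")

# A one-character change flips a large share of the 256 output bits.
flipped = differing_bits(hashlib.sha256(b"abc").digest(),
                         hashlib.sha256(b"abd").digest())
print(flipped, "of 256 bits differ")
```

For a well-behaved hash the printed count should land near 128, although the exact figure depends on the inputs.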

Question 2

Cryptographic vs non-cryptographic - what is the difference?

Accepted Answer

A cryptographic hash has a security goal against an adversary: finding a collision should be computationally infeasible. Non-cryptographic hashes (MurmurHash, xxHash, FNV) optimize for speed and distribution and assume no adversary. Use cryptographic hashes for integrity, signatures, and content addressing; non-cryptographic ones for hash tables, Bloom filters, and sketches.
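
To show how small a non-cryptographic hash can be, here is a sketch of 32-bit FNV-1a, one variant of the FNV family mentioned above. The bucket_for helper is a hypothetical example of hash-table use, not part of any library.

```python
FNV32_OFFSET_BASIS = 0x811C9DC5  # standard 32-bit FNV offset basis
FNV32_PRIME = 0x01000193         # standard 32-bit FNV prime

def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: XOR each byte into the state, then multiply by the prime."""
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV32_PRIME) & 0xFFFFFFFF  # keep the state at 32 bits
    return h

def bucket_for(key: str, num_buckets: int) -> int:
    """Hypothetical helper: map a string key to a hash-table bucket index."""
    return fnv1a_32(key.encode("utf-8")) % num_buckets

print(hex(fnv1a_32(b"a")))
print(bucket_for("user:42", 16))
```

There is no key and no security margin here. With only 32 bits of state and a simple mixing loop, anyone who controls the inputs can choose keys that land in the same bucket.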

Question 3

Which hash should I pick today?

Accepted Answer

General content addressing: SHA-256 or BLAKE3. Speed-critical: BLAKE3. Authentication tag: HMAC-SHA-256 or keyed BLAKE2b/BLAKE3. Password storage: Argon2id (scrypt if Argon2 is unavailable). Non-cryptographic in-memory: xxHash3, or SipHash when hash-flooding resistance is needed. SNARK circuits: Poseidon/Poseidon2.

Question 4

Is SHA-256 broken?

Accepted Answer

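No. SHA-256 has no practical collision or preimage attack. The only caveat for new designs is length extension: anyone who knows H(key || message) and the length of key || message can compute the hash of a longer message that starts with the same bytes, without knowing the key. For keyed hashing, use HMAC-SHA-256, SHA-3, or BLAKE2/3 instead.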

Question 5

Why is MD5 still around if it's broken?

Accepted Answer

Real-world software is huge and slow to migrate. MD5 collisions are trivial, so MD5 cannot be used where an attacker might submit inputs. For accidental corruption checks (file checksums on mirrors, package lockfiles) it still works. New systems should skip MD5; legacy systems should plan migration.

Question 6

Is SHA-1 truly dead?

Accepted Answer

For collision resistance, yes - chosen-prefix collisions are practical (the "SHA-1 is a Shambles" attack, 2020). Preimage attacks are still out of reach. HMAC-SHA-1 has no known practical attack and is still deployed in older TLS, OAuth, and OTP systems. New designs should pick SHA-2 or SHA-3.

Question 7

SHA-2 or SHA-3 - which one?

Accepted Answer

Both are standardized. SHA-2 is faster on most CPUs with hardware acceleration. SHA-3 has a different (sponge) construction, so an attack on SHA-2 would not also break SHA-3. SHA-256 is fine for most uses; pick SHA-3 for length-extension immunity without truncation or for post-quantum protocols using SHAKE.
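
A small sketch of the API difference, using Python's standard hashlib. SHA3-256 is a drop-in fixed-length hash, while SHAKE256 lets the caller choose the output length. The message is an arbitrary placeholder.

```python
import hashlib

msg = b"example message"

sha2_digest = hashlib.sha256(msg).digest()     # SHA-2 family, 32 bytes
sha3_digest = hashlib.sha3_256(msg).digest()   # SHA-3 family, also 32 bytes

# SHAKE256 is an extendable-output function: the length is passed to digest().
xof_32 = hashlib.shake_256(msg).digest(32)
xof_64 = hashlib.shake_256(msg).digest(64)

# A shorter SHAKE output is a prefix of a longer one for the same input.
assert xof_64[:32] == xof_32
print(len(sha2_digest), len(sha3_digest), len(xof_32), len(xof_64))
```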

Question 8

What's a salt and why do I need one?

Accepted Answer

A salt is a unique, non-secret random value mixed into a password hash so identical passwords produce different stored digests. Without a salt, an attacker can precompute hashes and match many users at once. Use 16 random bytes per user, stored next to the hash.
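
A minimal sketch of per-user salting, using scrypt from Python's standard hashlib (Argon2 needs a third-party package). The cost parameters and function names are illustrative assumptions, not tuned recommendations.

```python
import hashlib
import hmac
import secrets

# Assumed example parameters; choose real values for your own hardware.
SCRYPT_PARAMS = dict(n=2**14, r=8, p=1, maxmem=64 * 1024 * 1024, dklen=32)

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); store both next to the user record."""
    salt = secrets.token_bytes(16)  # 16 random bytes per user
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute with the stored salt and compare in constant time."""
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, expected)

salt_a, digest_a = hash_password("hunter2")
salt_b, digest_b = hash_password("hunter2")
assert digest_a != digest_b  # same password, different salts, different digests
assert verify_password("hunter2", salt_a, digest_a)
```

Because the two users' digests differ, one precomputed table cannot match both of them.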

Question 9

Salt vs pepper?

Accepted Answer

Salt is unique per user, stored next to the hash, not secret. Pepper is a secret value shared across users, stored outside the database. If only the database is stolen, pepper keeps hashes uncrackable. If both leak, pepper provides no extra protection.
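
The answer above does not say how a pepper gets mixed in. One common approach, assumed here, is to run the password through HMAC keyed with the pepper before the salted password hash. The function names and scrypt parameters in this sketch are illustrative.

```python
import hashlib
import hmac
import secrets

def apply_pepper(password: str, pepper: bytes) -> bytes:
    """Assumed scheme: HMAC-SHA-256 of the password, keyed with the secret pepper."""
    return hmac.new(pepper, password.encode("utf-8"), hashlib.sha256).digest()

def hash_with_salt_and_pepper(password: str, salt: bytes, pepper: bytes) -> bytes:
    """Pepper first, then a salted scrypt hash (example parameters only)."""
    peppered = apply_pepper(password, pepper)
    return hashlib.scrypt(peppered, salt=salt, n=2**14, r=8, p=1,
                          maxmem=64 * 1024 * 1024, dklen=32)

# In practice the pepper would come from a secret store outside the database.
pepper = secrets.token_bytes(32)
salt = secrets.token_bytes(16)      # per user, stored next to the hash
stored = hash_with_salt_and_pepper("hunter2", salt, pepper)

# Without the right pepper, a correct password guess still does not match.
wrong_pepper = secrets.token_bytes(32)
assert hash_with_salt_and_pepper("hunter2", salt, wrong_pepper) != stored
```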

Question 10

What's the difference between HMAC and a hash with a key prepended?

Accepted Answer

For Merkle-Damgard hashes (MD5, SHA-1, and the non-truncated SHA-2 variants such as SHA-256 and SHA-512), H(key || message) is not a secure MAC: it is open to length extension. HMAC nests two hash calls with two derived keys (K XOR ipad and K XOR opad), which makes the construction safe even when the underlying hash is length-extendable.
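
The nested construction can be sketched by hand and checked against Python's standard hmac module. This is for illustration only; real code should call hmac directly. The function name and sample key are made up.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 processes input in 64-byte blocks

def hmac_sha256_manual(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA-256 written out: H((K xor opad) || H((K xor ipad) || message))."""
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()   # long keys are hashed first
    key = key.ljust(BLOCK_SIZE, b"\x00")     # then zero-padded to block size
    inner_key = bytes(b ^ 0x36 for b in key)  # K xor ipad
    outer_key = bytes(b ^ 0x5C for b in key)  # K xor opad
    inner = hashlib.sha256(inner_key + message).digest()
    return hashlib.sha256(outer_key + inner).digest()

key = b"example-key"
msg = b"amount=100&to=alice"
tag = hmac_sha256_manual(key, msg)

# The hand-written version agrees with the standard library.
assert hmac.compare_digest(tag, hmac.new(key, msg, hashlib.sha256).digest())
```

The outer hash covers the inner digest, so an attacker who extends the inner hash still cannot produce a valid tag without the key.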

Question 11

bcrypt or Argon2id?

Accepted Answer

Argon2id when you can. It is the PHC winner, has tunable memory cost (forcing GPU attackers to also commit GPU memory), and is RFC-standardized (RFC 9106). bcrypt remains a defensible legacy choice; do not migrate working bcrypt installations in a panic.

Question 12

Why does Bitcoin hash twice (SHA-256d)?

Accepted Answer

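The usual explanation is that the second pass blocks length-extension attacks, because SHA-256(SHA-256(x)) never exposes the inner hash state. Bitcoin does not compute a MAC over the block header, so this is defense in depth rather than a fix for a concrete attack; it may also slightly harden against multi-collisions. The extra cost is small: the second pass hashes only a 32-byte digest. Certificate Transparency logs take a different approach and use single SHA-256 with distinct prefix bytes for leaves and internal nodes.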
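
The double hash itself is a one-liner. A minimal sketch with Python's standard hashlib; the function name sha256d and the placeholder input are choices made for this example.

```python
import hashlib

def sha256d(data: bytes) -> bytes:
    """Double SHA-256: hash the data, then hash the 32-byte digest again."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

header = b"placeholder block header bytes"
print(sha256d(header).hex())
```

The second pass takes the raw 32-byte digest as input, not its hex string.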

Question 13

What is a Merkle tree?

Accepted Answer

A binary tree where each leaf is the hash of a data block and each internal node is the hash of its two children. The root commits to the entire data set in one digest. Proving one leaf is included needs only the sibling hashes along the path to the root. Used in Git, Bitcoin, certificate transparency, IPFS, BLAKE3.
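
The structure and the inclusion proof can be sketched as follows, assuming SHA-256 and assuming an odd node is paired with a copy of itself (one convention among several). Real systems usually also add distinct prefixes for leaves and internal nodes, which this sketch omits. All function names are illustrative.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(blocks: list[bytes]) -> list[list[bytes]]:
    """Return every tree level, leaves first and the root level last."""
    if not blocks:
        raise ValueError("need at least one block")
    levels = [[h(b) for b in blocks]]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2 == 1:
            cur = cur + [cur[-1]]   # assumed convention: duplicate the odd node
            levels[-1] = cur        # keep the padded level so siblings exist
        levels.append([h(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels

def inclusion_proof(levels: list[list[bytes]], index: int) -> list[tuple[bytes, bool]]:
    """Collect (sibling_hash, sibling_is_left) from the leaf up to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1         # the other node in the same pair
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof

def verify(block: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the path to the root using only the block and sibling hashes."""
    node = h(block)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

blocks = [b"a", b"b", b"c", b"d", b"e"]
levels = build_levels(blocks)
root = levels[-1][0]
proof = inclusion_proof(levels, 4)
assert verify(b"e", proof, root)
assert not verify(b"x", proof, root)
```

With five blocks the proof holds three sibling hashes, so the verifier never needs the other four blocks.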

Question 14

How does Git use SHA-1 - and is that a problem?

Accepted Answer

Every Git object is named by the SHA-1 of "type size\0content". Practical SHA-1 collisions exist, so a malicious party who controls both files could create two different objects with the same ID. Git mitigates this with sha1dc, a collision-detecting SHA-1 implementation. A SHA-256 object format is being rolled out slowly.
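
The naming rule can be reproduced in a few lines of Python with the standard hashlib. The function name git_object_id is made up for this sketch, which covers only the classic SHA-1 object format.

```python
import hashlib

def git_object_id(obj_type: str, content: bytes) -> str:
    """SHA-1 over the header "type size\\0" followed by the raw content."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

# The empty blob has a well-known ID in every SHA-1 Git repository.
print(git_object_id("blob", b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The result should match `git hash-object` run on a file with the same bytes.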

Question 15

What is a rainbow table?

Accepted Answer

A precomputed table that trades disk space for time when cracking password hashes. Defeated by salting (per-user salt makes precomputation useless). Modern password hashes (bcrypt, scrypt, Argon2) also use enough internal work that practical rainbow tables don't exist for them.

Question 16

Can quantum computers break hash functions?

Accepted Answer

Grover's algorithm gives a quadratic speedup for preimage attacks: the cost of finding a preimage for an n-bit hash drops from ~2^n classical work to ~2^(n/2) quantum work. SHA-256 therefore still has ~128-bit quantum preimage security, which is out of reach. Hash functions are not broken by quantum computers in the catastrophic sense that public-key crypto is. For extra post-quantum margin, double the output size (SHA-512 instead of SHA-256).

Question 17

What is a perceptual hash?

Accepted Answer

An image (or audio, video) fingerprint that produces similar hashes for visually similar inputs. aHash, dHash, pHash, wHash are the common variants. Used for near-duplicate detection, reverse image search, content moderation. Not cryptographic - an attacker can construct images that hash to a target value.
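
As one concrete variant, dHash compares each pixel with its right-hand neighbour. The sketch below assumes the image has already been converted to grayscale and downscaled to 8 rows of 9 pixels, a step that normally needs an image library. The function names and sample data are illustrative.

```python
def dhash(pixels: list[list[int]]) -> int:
    """Difference hash: one bit per adjacent pixel pair, set when left > right."""
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits

def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits; a small distance means visually similar."""
    return bin(a ^ b).count("1")

# Synthetic 8x9 "images": a left-to-right gradient, a brighter copy, a mirror.
gradient = [[col * 10 for col in range(9)] for _ in range(8)]
brighter = [[value + 5 for value in row] for row in gradient]
mirrored = [row[::-1] for row in gradient]

print(hamming_distance(dhash(gradient), dhash(brighter)))  # 0
print(hamming_distance(dhash(gradient), dhash(mirrored)))  # 64
```

Uniform brightening leaves every comparison unchanged, so the hash is identical, while mirroring flips all 64 bits. The same simplicity is what lets an attacker steer an image toward a chosen hash.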

Question 18

What is a SNARK-friendly hash?

Accepted Answer

A hash function designed to be cheap inside a zero-knowledge proof circuit, where each field multiplication is an R1CS constraint or custom gate. Poseidon, MiMC, Rescue-Prime, and Pedersen are common choices. Usually slower than SHA-256 outside a circuit but 10-100x cheaper inside one.

Question 19

Is xxHash secure?

Accepted Answer

No, and it does not claim to be. xxHash is a non-cryptographic hash designed for speed and distribution; differentials are findable by SAT solvers in seconds. Use it for hash tables, dedup of trusted inputs, or content checksums against accidental corruption - never for MACs, signed URLs, or content addressing against adversaries.

Question 20

Should I generate salts with Math.random()?

Accepted Answer

No. Use a cryptographically secure random source: crypto.getRandomValues in the browser, crypto.randomBytes in Node, /dev/urandom on Unix, os.urandom in Python. Math.random is predictable.

Question 21

How big should a salt be?

Accepted Answer

16 random bytes is the modern standard (Argon2 RFC 9106, scrypt recommendations, OWASP guidance). 8 bytes is acceptable for legacy compatibility. Anything bigger than 16 bytes is harmless but pointless.