Hash Lab

What we don’t know

Open problems

A curated list of open research questions in hash function design and cryptanalysis. Each one is labelled by approximate difficulty: Approachable for a strong masters/PhD project, Research-grade for a multi-year line of work, Long-game for problems that may take a generation.

If you are looking for an open question to attack, this page is a finger pointing at where the open ground is.

ApproachableResearch-gradeLong game

Cryptanalysis of standardized hashes

These are the largest, most-studied targets. Any progress would reshape the field.

  • Find any SHA-256 collision

    Long game

    No collision attack better than the 2128 generic birthday bound is known. Even reduced-round attacks plateau around half the rounds (~32 of 64) with no clear path to extending. A practical full-SHA-256 collision would be a once-in-a-generation result and would force a global migration roughly comparable to the SHA-1 retirement.

  • Improve preimage bounds on full SHA-3

    Research-grade

    Best published preimage attacks cover ~5 of 24 Keccak-f rounds. Closing the gap (even partially, say to 8-10 rounds) would be a top-tier result and would clarify how much margin SHA-3 actually has.

  • Find a BLAKE3 distinguisher

    Research-grade

    BLAKE3’s round count (7) is aggressive even after accounting for the tree structure. No distinguisher better than generic is known on full BLAKE3, but several papers note that future cryptanalysis could narrow this margin. A real distinguisher would invite a round-count bump.

  • Concrete cost of multicollisions in modern Merkle-Damgård hashes

    Research-grade

    Joux’s 2004 multicollision result shows that finding k-way collisions in an iterated hash costs roughly k · 2n/2, not k·2n/2(k!). Concrete time-memory bounds for SHA-256, SHA-512, BLAKE2b under real adversaries (with GPU/ASIC budgets) are surprisingly open.

SNARK-friendly hash design

Algebraic hashes are barely a decade old. The race between new designs and algebraic-attack cryptanalysis is the most active part of symmetric crypto today.

  • Smaller circuit cost for the same security level

    Research-grade

    Poseidon, Poseidon2, MiMC, Rescue, Griffin, Anemoi keep trimming constraints. The lower bound for “circuit cost per security bit” is unknown. A 2-3× reduction would be world-changing for ZK rollups.

  • Provable bounds against Gröbner-basis attacks

    Long game

    Algebraic attacks (Gröbner bases, FreeLunch, etc.) keep nibbling at SNARK-friendly hashes. Most security arguments are based on best-known attacks plus a margin. Tight upper bounds on polynomial-system solving cost as a function of hash structure are mostly conjectural.

  • Standardize at least one SNARK-friendly hash

    Approachable

    Despite production usage in billions of dollars of ZK rollups, no SNARK-friendly hash is in NIST / IETF / ISO. The standardization effort would require pinning a specific field, parameter set, and target proof system , all of which are still moving.

Memory-hard functions

Argon2id is the current champion but the theory under it is incomplete.

  • Provable memory-hardness against time-memory trade-offs

    Long game

    Argon2id’s security argument is partly empirical. A clean mathematical proof that no polynomial-time tradeoff exists in the data-dependent phase , under any reasonable model of computation , is an open problem.

  • Resistance against custom ASICs at fixed RAM cost

    Research-grade

    Modern GPUs and FPGAs have GB-scale memory. A 1-GiB Argon2id is inconvenient for them but not prohibitive. Designing a memory-hard function whose “dollar cost per password attempt” scales linearly with all reasonable adversary architectures is open.

Post-quantum hash-based signatures

FIPS 205 (SLH-DSA / SPHINCS+) is standardized but slow. Improvements without compromising the security floor are an active area.

  • Reduce SLH-DSA signature sizes

    Research-grade

    SLH-DSA signatures range from 8 KiB (SLH-DSA-128f) to ~50 KiB (256s) , orders of magnitude bigger than Dilithium or classical ECDSA. Compressing them without sacrificing the hash-only security floor is an open question.

  • Faster Winternitz parameter selection

    Approachable

    Hash-based signatures spend their time in Winternitz one-time signatures. Picking the best w for given verify / sign / size trade-offs is well-understood theoretically but the constant-time engineering is non-trivial.

Perceptual & similarity hashing

Adversarial robustness is the elephant in this room.

  • Adversarially-robust perceptual hash for CSAM detection

    Long game

    Every published perceptual hash (PhotoDNA, NeuralHash, pHash, blockhash) has been broken in the adversarial sense. Designing an image fingerprint that is robust to deliberate evasion (re-encoding, small visible perturbations) and to targeted preimage attacks (forcing a benign image to collide with a known bad hash) is open.

  • Provably-private LSH against membership inference

    Research-grade

    LSH indexes leak which documents are similar. For applications with sensitive corpora (medical records, legal documents), LSH indexes that satisfy a privacy notion (differential privacy or strong indistinguishability) without destroying retrieval quality are still rough.

Implementation & engineering

These are the unglamorous open problems whose solutions actually ship in shipped systems.

  • Formally-verified hash implementations beyond HACL*

    Approachable

    HACL* covers a handful of standard hashes (SHA-2, SHA-3, BLAKE2) with full F* proofs. Extending the coverage to BLAKE3, Poseidon, and modern KDFs while keeping the verification effort tractable is an open engineering goal.

  • Constant-time bcrypt without ABI breakage

    Approachable

    Reference bcrypt has data-dependent table lookups. Constant-time bcrypt implementations exist but they differ subtly in cost from the reference, complicating “same hash with cost N” benchmarks across libraries.

  • Side-channel resistance for sponge hashes on small devices

    Research-grade

    SHA-3 / SHAKE on 8-bit microcontrollers leak via cache timing and electromagnetic emissions in known ways. Constant-time and constant-EM implementations exist for AES; comparable engineering for Keccak-f[1600] on resource-constrained hardware is still developing.

Standardization & ecosystem

Less technical, more diplomatic, but still genuinely open.

  • An IETF / NIST track for SNARK-friendly hashes

    Approachable

    Production deployment is enormous; standardization is zero. The governance question of whose Poseidon parameters become the standard , and how they get chosen , is open.

  • Open data on long-term hash function failure modes

    Approachable

    We collectively have a generation’s worth of hash function cryptanalysis. A curated, open dataset of (algorithm, attack, date, cost, references) suitable for trend analysis does not exist. Compiling one would let the field empirically calibrate its expectations.

Missing a problem you think belongs here? Open an issue or a pull request. This page is meant to grow over time, and a healthy version of the field should disagree about what is most important.

See also: the timeline for how we got here, and the references for the primary literature.