Hash Lab

Non-cryptographic

BuzHash

Designed by Robert Uzgalis (1992). A cyclic XOR rolling hash that slides over a window of bytes with constant per-byte cost. Used anywhere “hash of the last N bytes” needs to update in O(1) per byte , deduplicating backup tools, content-defined chunking, and rsync-style protocols.

The construction

Pick a random permutation table T[256] mapping bytes to w-bit values. Maintain a sliding hash h over a window of n bytes:

add byte b:    h = rotl(h, 1) XOR T[b]
remove byte b: h = h XOR rotl(T[b], n)

Both add and remove are O(1) per byte. Total cost over an entire file is O(file_size), regardless of window size.

At a glance

Output32 or 64 bits (variant choice)
Window sizeConfigurable (typically 32-128 bytes)
DesignerRobert Uzgalis (1992)
Throughput5+ GiB/s on byte loops
StatusNon-cryptographic; rolling hash for content-defined chunking

Where it shows up

Why rolling matters for dedup

Fixed-size chunking is brittle: a 1-byte insertion shifts every subsequent chunk boundary, defeating dedup. Content-defined chunking (Rabin fingerprints, BuzHash) picks chunk boundaries based on hash values being below a threshold , so a single byte change only affects the chunks containing the change, leaving the rest identifiable as duplicates.

vs Rabin-Karp / polynomial rolling hashes

Rabin-Karp uses polynomial evaluation over a finite field; BuzHash uses XOR over rotated table values. Both have O(1) rolling cost. BuzHash is slightly faster (no multiplications) and slightly weaker on certain pathological inputs. Most modern chunkers pick one or the other based on team familiarity rather than measurable difference.

References

Quick quiz

Test yourself on buzhash

10 multiple-choice questions. Pick an answer for each, then submit to see explanations.

  1. Q1.Who designed BuzHash?

  2. Q2.What kind of hash is BuzHash?

  3. Q3.What's the per-byte cost of BuzHash's update?

  4. Q4.What backup tool uses BuzHash for content-defined chunking?

  5. Q5.BuzHash's per-byte operation:

  6. Q6.Why does content-defined chunking matter for dedup?

  7. Q7.BuzHash vs Rabin fingerprinting:

  8. Q8.Is BuzHash cryptographic?

  9. Q9.BuzHash output size typically:

  10. Q10.Wikipedia classifies BuzHash as which family?

0 of 10 answered