Non-cryptographic
BuzHash
Designed by Robert Uzgalis (1992). A cyclic XOR rolling hash that slides over a window of bytes with constant per-byte cost. Used anywhere “hash of the last N bytes” needs to update in O(1) per byte , deduplicating backup tools, content-defined chunking, and rsync-style protocols.
The construction
Pick a random permutation table T[256] mapping bytes to w-bit values. Maintain a sliding hash h over a window of n bytes:
add byte b: h = rotl(h, 1) XOR T[b]
remove byte b: h = h XOR rotl(T[b], n)Both add and remove are O(1) per byte. Total cost over an entire file is O(file_size), regardless of window size.
At a glance
| Output | 32 or 64 bits (variant choice) |
|---|---|
| Window size | Configurable (typically 32-128 bytes) |
| Designer | Robert Uzgalis (1992) |
| Throughput | 5+ GiB/s on byte loops |
| Status | Non-cryptographic; rolling hash for content-defined chunking |
Where it shows up
- Borg backup , BuzHash drives content-defined chunking for deduplication.
- restic , same chunking idea, BuzHash inspired.
- Bup, Attic , rolling hash for variable-size chunks.
- rsync-class delta-transfer protocols.
- Plagiarism detection , rolling hash of source-code n-grams.
Why rolling matters for dedup
Fixed-size chunking is brittle: a 1-byte insertion shifts every subsequent chunk boundary, defeating dedup. Content-defined chunking (Rabin fingerprints, BuzHash) picks chunk boundaries based on hash values being below a threshold , so a single byte change only affects the chunks containing the change, leaving the rest identifiable as duplicates.
vs Rabin-Karp / polynomial rolling hashes
Rabin-Karp uses polynomial evaluation over a finite field; BuzHash uses XOR over rotated table values. Both have O(1) rolling cost. BuzHash is slightly faster (no multiplications) and slightly weaker on certain pathological inputs. Most modern chunkers pick one or the other based on team familiarity rather than measurable difference.
References
- Robert Uzgalis, “Hashing concepts and the Java programming language” (1992 manuscript, BuzHash introduction).
- Wikipedia , Rolling hash (BuzHash described as the “cyclic polynomial” family)
- Borg backup , internals (BuzHash-based chunker)
- Jenkins hash
Quick quiz
Test yourself on buzhash
10 multiple-choice questions. Pick an answer for each, then submit to see explanations.
Q1.Who designed BuzHash?
Q2.What kind of hash is BuzHash?
Q3.What's the per-byte cost of BuzHash's update?
Q4.What backup tool uses BuzHash for content-defined chunking?
Q5.BuzHash's per-byte operation:
Q6.Why does content-defined chunking matter for dedup?
Q7.BuzHash vs Rabin fingerprinting:
Q8.Is BuzHash cryptographic?
Q9.BuzHash output size typically:
Q10.Wikipedia classifies BuzHash as which family?