Close

Hashing with generalised PEAC (ep.4)

A project log for PEAC Pisano with End-Around Carry algorithm

Add X to Y and Y to X, says the song. And carry on.

yann-guidon-ygdesYann Guidon / YGDES 02/17/2022 at 06:280 Comments

Bon Jenkin's work shows that a good has can be more efficient with 2 passes:

  1. Scan the data and collect entropy, while mixing it decently
  2. Postpone the real avalanche to a final() procedure that could be more elaborate and thorough.

The problem with Fibonacci-based algorithm is that the last scanned element has the least avalanche so the whole state should be tumbled and mixed and avalanched more seriously. This saves a lot of CPU cycles during the scan itself.

In http://www.burtleburtle.net/bob/c/lookup3.c, Bob uses a series of xors, subtractions and rotations. We don't have such a luxury with only 2 variables, however we can promote avalanche by introducing constant XORs as they will push the avalanche further toward the MSB.

For example 0x77777 (01110111011101110111) will "pop" up 0x00001 to 0x77778.

Then another constant with more bits and shifted relative to the previous one will pop up the data further toward the MSB, such as 0xDFDFD (11011111110111111101)

Then 0xFDFDF will propagate the LSB even further...

Unlike Bob Jenkin's "magic constants" that were derived from trial & errors, through heavy computational efforts, these constants can be "engineered", designed.

For a 16+-bit hash I suppose that at least 6 "finishing" rounds of mixing are required to avalanche nicely. The beauty of this is that these "avalanching patterns" also could be appended to the end of the scanned stream, as well, with the same effect, so no specific code is required during scanning. There would be 2 extra words (constant + size) as a prefix, then 6 constants at the suffix...

C0 Size D0 D1 D2 D3.... C1 C2 C3 C4 C5 C6

That looks like a good plan.

Discussions