Close

Hashing with generalised PEAC (ep.2)

A project log for PEAC Pisano with End-Around Carry algorithm

Add X to Y and Y to X, says the song. And carry on.

yann-guidon-ygdesYann Guidon / YGDES 02/07/2022 at 07:550 Comments

As we have recently discussed, Non-power-of-two PEAC is a generalisation of the original binary PEAC system, and they both look increasingly promising to each other. So let's call the new generalised algo gPEAC :-)

Mixing is not a critical criterion for a checksum/CRC and PEAC's speed reflects this. However hash tables require much stringent properties. I will not repeat Bon Jenkin's work, but only mention that one key's hash should be very different from another key that is very similar. This implies an "avalanche" effect, created by operations that promote mixing and tumbling of the individual bits of the numbers.

Both PEAC and gPEAC must be initialised with a pair of values: one "arbitrary constant", and the size of the buffer to hash.

Then the buffer is processed, as usual.

At the end, a hash needs to increase mixing: this is borrowed from Bob's final() method (introduced in lookup3.c), where the state of the registers is further mixed and tumbled to increase avalanche and smooth the statistics for small variations of input values. This is a compromise that lets the previous scanning part be simpler, and relegate the thorough avalanching to the last moment. At this stage, which could be about 3 to 8 additional rounds of gPEAC, the modulus could even change for each round.

The result of the hash with gPEAC is given by X+(Y×modulus) (multiplication is necessary at this stage because it is not a power of 2) and the range is 2×modulus² because the folding/modulo is performed before the addition and the result has one variable with double the range. I'll get the maths straight soon.

The choice of the modulus depends on

At this early stage, we can only estimate the range and properties of the modulus and infer its application range.

The goal is not to compete with existing high-performance hashes that already exist, but to find a sweeter spot for small, embedded systems, with a few registers holding 16 or 32 bits. The application would be for small lookups, such as variable names for interpreters, or URI in embedded servers, not for large databases. The existing and known "simple hashes" have a lot of flaws and drawbacks...

At this moment, my computer is still crunching the numbers and seeking interesting moduli, and this is a very slow process when we don't know what to look for. The computation has reached 128829 for now, almost a power of 2. Finding much larger moduli is computationally hard with a trivial method, just as for the binary PEAC, so I don't know yet how to find better higher values.

Discussions