Close
0%
0%

PEAC Pisano with End-Around Carry algorithm

Add X to Y and Y to X, says the song. And carry on.

Similar projects worth following
A purely additive, non-cryptographic, fast and minimalistic checksum/PRNG/scrambler.

The sweet spot is a 32-bit checksum with a pair of 16-bit registers,
- very small, using only ADD,
- without the flaws of Adler32 and Fletcher, yet the same size and complexity,
- as good as CRC32, which it may replace when the memory footprint is critical, and can't be 0-crashed
- very easy to implement in digital circuits as well as microcontrollers,
- suitable for the detection of trivial corruption in network packets or files.
- not patented
- "as is", inappropriate for cryptography (though could inspire someone)

If you wonder why, just click and read the log.

For the what, check Definitions as well as PEAC vs LFSR for a comparison.

The original algo/project name was "Fibonacci checksum" but it later appeared that it was not the most accurate choice because Leonardo Pisano (the real name of Fibonacci) is the name associated to the periodicity of the sequence under modulo. All I did was add the tiny detail of "end-around carry", or "1-complement addition", which changed everything.

 
-o-O-0-O-o-
 

As for how I came to this system...

In 2012, Piano Magic released their album "Life Has Not Finished with Me Yet". One song contains a weird repeating pattern...

Glen Johnson's lyrics are often cryptic and more evocative than objective, but any geek's mind would cling on this mantra at the end:

"Add X to Y and Y to X"

This is really weird but... Why ? What's the point in this obscure "song" with no precise theme or clear subject ? And what does it do ? This last question is the most easily answered : just follow the damned algorithm.

C'est parti...

X=1, Y=0
Y+=X => 0+1=1
X+=Y => 1+1=2
Y+=X => 1+2=3
X+=Y => 2+3=5
Y+=X => 3+5=8
X+=Y => 5+8=13
X+=Y => 8+13=21
Y+=X => 13+21=34
X+=Y => 21+34=55
...

No need to go further, most of you should have recognised http://oeis.org/A000045, the famous Fibonacci sequence.

This gave me a compelling idea to modify the old Fletcher & Adler algorithms, keeping their very low footprint and minimalistic computational complexity. Both of these well known algos use a pair of values and have a similar structure. The trick is that rearranging the data dependency graph provides the equivalent of a minimalistic polynomial checksum, because the result is fed back on itself, in a more robust way than Fletcher's algo.

At first glance, this new checksum loop's body becomes something like :

Y += ( X ^ datum[i  ] );
X += ( Y ^ datum[i+1] );
i+=2;

This loop body is totally trivial to unroll. As trivial is the design of the corresponding digital circuit. This early version seemed to contain the whole checksum entropy in the last computed value but now X and Y are part of the signature. And the really important enhancement is the use of the carry!

For superscalar 32 bits CPUs, the following code seems to work well though the missing carry hardware (and/or lack of language support) requires more instructions to emulate:

t = X + Y + C;    Y = X + data;
C = t >> 16;      X = t & 0xFFFF;

A more efficient variation exists which does not use a temporary variable:

C += X + Y;
Y = X + data;
X = C & MASK;
C = C >> WIDTH;

In this worst case, without support of a carry flag, that's 5 basic operations (not counting memory loads) that fit in 4 registers and 3 cycles, to process 2 bytes. Not too bad. I'll let you deal with alignment. But is it really safe or safer?

The following logs will show how the idea evolves and the performance increases, through discussions about carry wrap-around, register widths, scheduling, avalanche, parallelism, orbits, structure, and escaping black holes...

 
-o-O-0-O-o-
 

Logs:
1. The variable
2. Datapath
3. Adler32 weakness
4. Easy circuit
5. Periodicity
6. A promising 32-bit checksum
7. Progress
8. Why
9. Orbital mechanics
10. Orbits, trajectories and that damned carry bit
11. Two hairy orbits
12. How to deal with black holes
13. Anteriority
14. Moonlighting as a PRNG
15. A new name
16. More orbits !
17. First image
18. Structure and extrapolations
19. Even more orbits !
20. Start and stop
21. Some theory
22. Some more theory
23. Even more theory.
24. A little enhancement
25. sdrawkcab gnioG
26. Further theorising
27. What is it ?
28. A bigger enhancement
29. Going both ways
30. Can you add states ?
31. Please, GCC !
32. _|_||_||_|______|______|____|_|___|_________|||_|__
33. Carry on with GCC
34. A good compromise
35. The new non-masking algorithm
36. Add or XOR ?
37. A 32/64-bit version
38. Closing the trajectories
39. Stats about missed crossings
40. Application : Displace...

Read more »

gPEAC.2-131540.txt.gz

List of maximal and perfect orbits, from 2 to 131540.

gzip - 101.30 kB - 04/05/2022 at 02:03

Download

moduli_02.c

like moduli_01.c but can handle larger data, for m>65K

x-csrc - 1.17 kB - 01/29/2022 at 21:36

Download

mods.log

1880 first results with maximal or double-maximal orbit lengths

x-log - 40.00 kB - 01/29/2022 at 20:36

Download

moduli_01.c

check orbit length for non-power-of-two moduli

x-csrc - 1.14 kB - 01/29/2022 at 20:36

Download

fusion_20220109.tgz

contains dump7ck, pscan_10 and fusion, good enough to test up to w26.

x-compressed-tar - 7.13 kB - 01/09/2022 at 13:34

Download

View all 55 files

  • Line scrambling with PEAC

    Yann Guidon / YGDES04/24/2022 at 11:51 0 comments

    One significant application for binary PEAC is for scrambling data over a serial link. LFSRs are the traditional candidates because, well, what else ? But their shortcomings are quickly prominent.

    State of the art with LFSR

    I'll let you browse Wikipedia for the references and definitions but here are the major purposes of a line scrambler :

    • Avoid long runs of 0s or 1s to ensure clock recovery
    • Spread the spectrum of the bitstream to ensure EMC compliance
    • Ensure "DC balance" (equal average number of 0s and 1s) to prevent saturation of magnetics coupling
    • Eventually allow the insertion of signaling symbols (such as in the 8B10B) for line management or flow control
    • Eventually transform binary symbols to other bases (ternary, quinary) for PAM3, PAM4, PAM5 physical encoding...
    • Eventually add a tiny bit of redundancy to provide crude FEC

    LSFR-based systems are used almost everywhere and are well characterised. Their ability to "crash" however is not as well covered but an example was given by Wikipedia where an old transmission protocol used a 7-tap LFSR that was trivial to attack: just transmit all possible sequences of the LFSR output, so that one (or more) packets would be all (or mostly) 0 (or 1) and the receiver can't recover the clock, leading to a loss of synchronisation and a failure of the link.

      The run length statistics may get worse if the data consists of specifically chosen patterns, instead of being random. An earlier scrambler used in Packet over SONET/SDH (RFC 1619 (1994)) had a short polynomial with only 7 bits of internal state which allowed a malicious attacker to create a Denial-of-service attack by transmitting patterns in all 27−1 states, one of which was guaranteed to de-synchronize the clock recovery circuits. This vulnerability was kept secret until the scrambler length was increased to 43 bits (RFC 2615 (1999)) making it impossible for a malicious attacker to jam the system with a short sequence. (64b/66b_encoding#Run_length)

    Modern links use 64B66B (that is : raw copy of the data and add a couple of signaling bits) followed by a 58-tap LFSR. The "clock crash" attack becomes practically impossible but then you have to deal with the possibility of sequences of 57 identical bits. From what I have gathered, another smaller LFSR or encoder is used again to further ensure DC balance. At this point the designers bet on the statistical properties of the data and channel to "push" the probability of an "adverse condition" to the "extremely unlikely" domain but it's still not impossible. Apparently, low encoding overhead seems to be more important to push the bit-per-symbol ratio.

      "The scrambler cannot guarantee that output data will never have a long run-length of 0s or all 1s, or other undesirable properties in communications, but does allow strong statistical bounds to be put on the probability of such events. Practical designs will choose system parameters such that a bit-error due to long run-lengths is vanishingly unlikely." (64b/66b encoding)

    Of course, LFSRs must run at the bit/line speed so a 10Gbps link requires a 10GHz LFSR. Same if you want to reach 40 Gbps and there, the available technology requires a parallel LFSR implementation, meaning that the poly is extremely sparse to allow operation at a low-enough ratio. With the IEEE proposal x58+x19+1, there are 19 layers of XOR to implement before the circuits becomes too complex, but basically this limits the slowdown to 1/19. For a 1GHz logic speed, the bit speed is at most 20Gbps, and the scrambler size (and power draw) is directly linear to the slowdown.

    One thing I was surprised to not find is a detection that the LFSR got stuck or "0-crashed", in order to restart it automatically. It would depend if the scrambler is additive or multiplicative but apparently, handwaving and reducing the risks down to statistics is the preferred engineering method.

    Links:

    https://en.wikipedia.org/wiki/Linear-feedback_shift_register#Scrambling

    https://en.wikipedia.org/wiki/Scrambler#Additive_(synchronous)_scramblers...

    Read more »

  • Just another hypothesis.

    Yann Guidon / YGDES04/12/2022 at 22:15 0 comments

    In a previous log (More questions on gPEAC) I evoked a hunch I had but nothing was explained. I write this log to close this gap.

    I suspect a mechanism that is somewhat similar to what I have seen with prime numbers, since I have studied algorithms (reverse sieves) to generate them. The common hints are

    • a density that follows a logarithmic probability (to be determined for PEAC)
    • Something to do with numbers moduli a prime number (2, 5 are identified)
    • there are exclusion rules that have one exception, which is the very first occurrence.

    From there it is possible to imagine a similar algorithm to detect which moduli are not going to give a perfect orbit.

    I should try the sieve with other prime numbers, as already suggested in another log.

  • Definitions

    Yann Guidon / YGDES04/12/2022 at 17:23 0 comments

    See also log 81. PEAC vs LFSR.

    _____________________________________

    General definition

    As the name says, PEAC is a Pisano sequence with the added twist of "End-Around Carry", where the sum is post-incremented when an overflow (or wrap around) occurred.

    The sequence is performed by simple iterations using 2 variables and only one parameter: the modulus (written m). Depending on this modulus, there can be one, two or more cyclic sequences, or "orbits":

    • When there is one orbit, which is the ideal case, the system is called "perfect" and its period is m² + m - 2.
    • When there are two orbits, which is still "good enough" for most cases, the system is called "maximal" (for historical reasons) and each of the symmetrical orbits has a period of ((m² + m) / 2 ) -1.
    • The other cases have not been named.

    There are a total of m² + m states but we don't usually count the degenerate states ("stuck points") at (0, 0) and (m-1, m-1). These values are avoided in PRNG mode but the scrambler and the checksum configurations are immune.

    The +m accounts for 2 cases where X=Y, so if only X or Y is looked at, its sequence of values has all the possible values from 0 to m-1, with a 1/2 bias for those limit values.

    Note: just like a LFSR, it is a purely linear iterative function that is reversible and can be unwound with a strategy similar to the Berlekamp-Massey algorithm.

    Binary PEAC

    For practical reasons, the first PEAC sub-family that has been studied has a width parameter w such that m = 2w.

    • widths w=3, 4, 10 and 16 are maximal, they have 2 complementary orbits.
    • widths 2 and 26 are perfect (26 still needs a computational proof though, but it is not a priority)

    This binary PEAC has a very straight-forward and efficient implementation in hardware and software, making it a sweet spot for PRNG, checsum and (de)scrambler. However it has poor mixing.

    Generalised PEAC

    Any other modulus is possible, and gPEAC are all the other moduli that are not a power of two. The operations are slightly less convenient for HW implementation but the high-level code is still quite simple and this uncovers some of the mathematical elements required for its analysis.

    With a higher "bit density" (just like with LFSR with "dense polynomials"), they have better mixing properties and might become an element for hashing functions.

    Here are 2 sub-families where we can tune the biases for a specific application (here: binary words). Other applications and constraints may lead to the creation of other sub-families. Note that the modulus should still be carefully chosen and the "spectrum" of valid values is sparse.

    Sqrt(2) moduli

    Sqrt(2) is an irrational value with a "good enough" bit density that can be easily scaled, making it appropriate for basic hashes. The other interesting property is that when the modulus is squared, the product is a power of two, suitable for a binary representation with the least bias. This makes it appropriate for indexing hash tables with dynamic size, where the MSB are masked off (or discarded).

    This type of moduli is under investigation at this time.

    Adjusted moduli

    A "perfect" modulus m has a period of m² + m - 2.  The extra m-2 can be a nuisance so a modulus of the form 2w- 2w-1 could absorb most of the bias and boost the mixing. This is only a speculation for now.

  • More questions on gPEAC

    Yann Guidon / YGDES04/05/2022 at 02:27 0 comments

    I just uploaded the file gPEAC.2-131540.txt.gz which contains the results of a long scanning campaign in early February.

    From 2 to 131540, there are 13646 maximal and/or perfect sizes, among them 3858 are perfect.

    I should plot the density vs value, and the gaps, or histograms thereof...

    One very interesting and unexpected find (already noted at wow.) is that :

    • Maximal (semi-perfect) systems have a modulus ending with the decimal digits 0, 1, 3, 4, 5, 6, or 9.
    • Perfect systems have a modulus ending with the decimal digits 0, 4, 6, or 8.

    Weren't we looking for some sort of pattern ? Well, there could be one and I included it in the latest scanner's code: scanning stops if the last digit is either 2 or 7 (with the exceptions of size=2).

    There is a blind spot however: the scan only cared for 2 cases but we would need the whole data, including the length and count of all the orbits, in order to "predict" if a given modulus is promising (at least we can eliminate some candidates.

    Let's review the actual moduli of the binary PEAC

     1 2 perfect (a noted outlier and exception)
     2 4 maximal
     3 8 maximal
     4 16 maximal
     5 32
     6 64
     7 128
     8 256
     9 512 (impossible)
    10 1024 maximal
    11 2049
    12 4096
    13 8192 (impossible)
    14 16384
    15 32768
    16 65536 maximal
    17 131072 (impossible)
    18 262144
    19 524288
    20 1048576
    21 2097152 (impossible)
    22 4194308
    23 8388608
    24 16777216
    25 33554432 (impossible)
    26 67108864 perfect ?
    

    From there we can know which powers of 2 to skip:

    27 134217728
    28 268435456
    29 536870912 (impossible)
    30 1073741824
    31 2147483648
    32 4294967296
    33 8589934592 (impossible)
    34 17179869184
    etc.
    

    Better : this can help with finding better candidates for the hash moduli.

    sqrt(2^33) = 92681
    sqrt(2^35) = 185363
    sqrt(2^37) = 370727 (impossible)
    sqrt(2^39) = 741455
    sqrt(2^41) = 1482910 maybe perfect
    sqrt(2^43) = 2965820 maybe perfect
    sqrt(2^45) = 5931641
    sqrt(2^47) = 11863283
    sqrt(2^49) = 23726566 maybe perfect
    sqrt(2^51) = 47453132 (impossible)
    sqrt(2^53) = 94906265
    sqrt(2^55) = 189812531
    sqrt(2^57) = 379625062 (impossible)
    sqrt(2^59) = 759250124 maybe perfect
    sqrt(2^61) = 1518500249
    sqrt(2^63) = 3037000499
    sqrt(2^65) = 6074000999
    sqrt(2^67) = 12148001999
    sqrt(2^69) = 24296003999
    sqrt(2^71) = 48592007999
    sqrt(2^73) = 97184015999
    sqrt(2^75) = 194368031998 maybe perfect
    sqrt(2^77) = 388736063996 maybe perfect
    sqrt(2^79) = 777472127993
    sqrt(2^81) = 1554944255987 (impossible)
    etc.
    

    But this is for now purely observational and 2^3=8 is an exception to the semi-perfect rule. I'm sure I miss something.

    So not only should I recompute these numbers completely, but also plot them to BMP and count the number of orbits, like I did for the videos.

    I should also examine the existing list with other bases (not just base 10, but prime bases, so maybe there is another link with Galois Fields ?) and understand why 2 and 7 are exceptions (note: they are 5 apart, mod 10)

    This is really so strange... I suspect some prime-number tricky business here.

  • Funky modulo

    Yann Guidon / YGDES03/23/2022 at 19:06 0 comments

    I have not had time or energy to post here for a month now but I HAD to share something quite amusing, borderline useful but curious enough that I wonder if somebody had already come up with this sort of HAKMEM.

    So far, you should already know that you can compute a number modulo a power of two by masking the MSB off.

    For example if you want to perform mod 128,

    A = B & 127;

    This is because 128=2^7 and the mask is the modulo minus one.

    This is why we computer people LOVE powers of two : it's quick and painless.

    BUT this is not always the case and the generalised PEAC algo made me scratch my head a bit, until I found something amusing.

    The typical method for modulo operation is the modulo operator (%) but it's is known to be "expensive".

    There is also the method that performs a division by multiplying by the reciprocal, but then you have to perform adjustments, and then a sbutraction.

    If you *know* that the original value is less than 2× the modulus, then you can compare, conditional jump and subtract.

    if (A >= mod)
       A -= mod;

    This is the preferred version these days, with the OOO cores, but there are several alternatives, for example if you explicitly precompute the adjusted value. This is where it gets funky.

    For example, you can precompute B:

    B = A - mod;
    if (A >= mod)
       A = B;

    But then you think, oh well, if I compare A with mod, it's equivalent to a subtraction, which I have already performed on the previous line, so why the redundancy ? But then the language gets in the way.

    B = A - mod;
    if (B >= 0)
       A = B;

    That's usually another comparison with 0... Which should be avoided since the previous subtraction might provide a few flags about the result. So the if-then is simply a conditional move, CMOV, as can be found on x86 or ARM.

    But not all CPUs have CMOV. And then you realise that B can have crazy values, depending on the type: signed or unsigned ?

    When A < mod, then B=A-mod < 0 and when using 2s complement with unsigned format, the value jumps through the roof. So the following line should work:

    B = A - mod;
    A = umin(A,B)
    • if B becomes negative, then its unsigned value goes above A, which is chosen.
    • If B is positive, thus a lower value than A (because it is A-mod) then B is chosen.

    Voilà ! You have your funky mod. But there is a little catch: it doesn't work for the gPEAC because the assigned value is off-by-one.

    Sigh

  • Hashing with generalised PEAC (ep.5)

    Yann Guidon / YGDES02/17/2022 at 09:52 0 comments

    Note: this is a work in progress and some assertions can still be inaccurate, just stay tuned...

    .

    As I alluded at the end of the log I just realised...the new "formula" makes the algorithm's behaviour very easy, much more convenient to analyse.

    The core of the computation (in the simple PRNG configuration) is as follows:

    if (X >= modulo)
        X -= modulo-1;
    T = X;  // no unrolling here so a temp reg is required
    X += Y;
    Y = T;
    

    The order of the operations is critical, as it affects the range of the variables.

    For example in the above code, X has a range of 0 to (2×modulo)-3 and Y goes from 0 to modulo-1. In this case, the values of X that exceed modulo correspond to the values where C=1 in the binary PEAC algorithm, and this state must be preserved across iterations. This is why the comparison and "incremented modulo" operation occur first, before the accumulation.

    Already we can see how the "stuck points" appear.

    The null point X=0,Y=0 is obviously going nowhere.

    The opposite of the null point X=mod-1,Y=mod-1 will also give X=mod-1,Y=mod-1 because the same value is both added (by Y) and subtracted (by the modulo). This confirms that X should not reach 2×(mod-1) or it would get stuck. (I think there is a discrepancy here)

    If we're hashing data (D) then the algorithm is modified a bit:

    if (X >= modulo)
        X -= modulo-1;
    if (Y >= modulo)
        Y -= modulo-1;
    T = X;
    X += Y;
    Y = T+D;
    

    The first half ensures that X and Y are in the range 0 to modulo-1.

    The last two lines ensure that the new data is incorporated in the right order, in parallel, to prevent overflow.

    Both halves could be swapped but I doubt this would be perfectly equivalent. Anyway when the mod is a power of two, the behaviour should be identical. It might even run slightly faster on modern OOO CPUs.

    (tbc)

  • Hashing with generalised PEAC (ep.4)

    Yann Guidon / YGDES02/17/2022 at 06:28 0 comments

    Bon Jenkin's work shows that a good has can be more efficient with 2 passes:

    1. Scan the data and collect entropy, while mixing it decently
    2. Postpone the real avalanche to a final() procedure that could be more elaborate and thorough.

    The problem with Fibonacci-based algorithm is that the last scanned element has the least avalanche so the whole state should be tumbled and mixed and avalanched more seriously. This saves a lot of CPU cycles during the scan itself.

    In http://www.burtleburtle.net/bob/c/lookup3.c, Bob uses a series of xors, subtractions and rotations. We don't have such a luxury with only 2 variables, however we can promote avalanche by introducing constant XORs as they will push the avalanche further toward the MSB.

    For example 0x77777 (01110111011101110111) will "pop" up 0x00001 to 0x77778.

    Then another constant with more bits and shifted relative to the previous one will pop up the data further toward the MSB, such as 0xDFDFD (11011111110111111101)

    Then 0xFDFDF will propagate the LSB even further...

    Unlike Bob Jenkin's "magic constants" that were derived from trial & errors, through heavy computational efforts, these constants can be "engineered", designed.

    For a 16+-bit hash I suppose that at least 6 "finishing" rounds of mixing are required to avalanche nicely. The beauty of this is that these "avalanching patterns" also could be appended to the end of the scanned stream, as well, with the same effect, so no specific code is required during scanning. There would be 2 extra words (constant + size) as a prefix, then 6 constants at the suffix...

    C0 Size D0 D1 D2 D3.... C1 C2 C3 C4 C5 C6

    That looks like a good plan.

  • Hashing with generalised PEAC (ep.3)

    Yann Guidon / YGDES02/17/2022 at 05:59 0 comments

    It seems that PEAC is a gift that keeps on giving and gPEAC has only started.

    gPEAC solves the problem of poor mixing of binary PEAC while introducing some more complexity in other places and we're about to talk about that. For example, the resulting signature doesn't perfectly fit into a binary computer world (which would introduce some slight bias) but the mixing sure is much better.

    Now let's look again at the core of the computation:

    if (Y >= modulo)
        Y -= modulo-1;
    Y += X;
    

    We all know that subtraction is equivalent to addition with an opposite addend, and in 2s-complement -X = (~X)+1 so the whole can be rewritten as:

    if (Y >=  modulo)
        Y += ~modulo + 2;
    Y += X;
    

    (and the carry can be discarded)

    WAIT WHAT ?

    Let's take

       Y -=       modulo
    => Y  = Y + (~modulo) + 1

    Now let's take the other half and subtract -1 to the modulo:

        Y -= - 1
    ==> Y += -(-1)
    ==> Y += 1
    

    You know the drill, 1+1=2 (woah) and we get the previous code.

    For example, for the known case of w2, modulo=4 so we subtract 3, which is indeed ~4=11111011 (or -5 in 2s-complement) and -5+2=-3.

    But why would this even matter ?

    This is because we want to promote avalanche as much as possible. We get avalanche through the addition, where carries between bits propagate and mix information. We want reasonably long sequences of consecutive 1s and few 0s to separate them, for example 0x7777=0111011101110111. With this pattern, a low number takes maybe 4 cycles (as many 0s) to avalanche to the MSB.

    Subtraction adds the opposite value, which means that 0s turn to 1s and vice versa. This means that we should look for moduli that have sparse bits, such as 000010000100001 or such. This is a reversal from the previous belief that "dense moduli" should be preferred to promote avalanche. So I must re-evaluate my selection criteria... without forgetting to account for the +2 ;-)

    Thus we must evaluate the binary representation of modulo-1.

  • Hashing with generalised PEAC (ep.2)

    Yann Guidon / YGDES02/07/2022 at 07:55 0 comments

    As we have recently discussed, Non-power-of-two PEAC is a generalisation of the original binary PEAC system, and they both look increasingly promising to each other. So let's call the new generalised algo gPEAC :-)

    • PEAC is fast, with a very short critical datapath, and well suited for hardware implementation, but it is limited to only certain widths and its mixing is quite weak. It is suited for medium to large blocks, from tens to thousands of bytes.
    • gPEAC has better mixing and a much wider set of working parameters. It uses one fewer register and the code looks simpler. However its critical datapath has not one but 3 operations (add, compare and subtract run in parallel, and a final selection step). It should work well for small buffers, such as for hash tables for URI/URL/parslers...

    Mixing is not a critical criterion for a checksum/CRC and PEAC's speed reflects this. However hash tables require much stringent properties. I will not repeat Bon Jenkin's work, but only mention that one key's hash should be very different from another key that is very similar. This implies an "avalanche" effect, created by operations that promote mixing and tumbling of the individual bits of the numbers.

    Both PEAC and gPEAC must be initialised with a pair of values: one "arbitrary constant", and the size of the buffer to hash.

    Then the buffer is processed, as usual.

    At the end, a hash needs to increase mixing: this is borrowed from Bob's final() method (introduced in lookup3.c), where the state of the registers is further mixed and tumbled to increase avalanche and smooth the statistics for small variations of input values. This is a compromise that lets the previous scanning part be simpler, and relegate the thorough avalanching to the last moment. At this stage, which could be about 3 to 8 additional rounds of gPEAC, the modulus could even change for each round.

    The result of the hash with gPEAC is given by X+(Y×modulus) (multiplication is necessary at this stage because it is not a power of 2) and the range is 2×modulus² because the folding/modulo is performed before the addition and the result has one variable with double the range. I'll get the maths straight soon.

    The choice of the modulus depends on

    • the range of hash values that are required (which also depends on the collision probability, see the birthday paradox)
    • the size of the host's registers
    • the size of the hashed words (8-bit bytes ? 16 bits ?)

    At this early stage, we can only estimate the range and properties of the modulus and infer its application range.

    • The modulus must be less than one half the range of the host registers. That's 2³¹ for 32-bit systems, and only 32767 for 16-bit computers. Otherwise the Fibonacci accumulation overflows. This is however a nice feature of gPEAC because it does not explicit the need for a carry value. This simplifies handling in the largest range of computers and languages possible.
    • The square of the modulus must be as close to a power of two as possible. This is required because taking individual bits, or groups of bits, should have a probability as close to 50% as possible. Hash values are often truncated and should not have a noticeable bias.
    • The modulus should be just a bit larger than the size of the hashed words. This is to increase avalanche during the scanning itself and reduce the number of rounds of final mixing. So if 16-bit words are mixed, the ideal modulus is about √2×2¹⁶ = 92681 though this wastes a huge range provided by a 32-bit register.
    • However, with a thorough final mixing, the scanning part could have a larger modulus to increase the entropy (before it is truncated).
    • Knowing φ=1.6 we can estimate how many rounds of final mixing are required to produce at least one wrap-around in the worst case. The rule of thumb is to give an extra 50% to the log2 of the expected value. Example: to reach approx. 16M from the value 1, it takes 24 multiplies by 2, but the value 14930352...
    Read more »

  • I just realised...

    Yann Guidon / YGDES02/06/2022 at 11:08 0 comments

    Two recent logs Hashing with npo2 PEAC and wow. explore a generalised, non-power-of-two version of PEAC and the results are quite interesting, to say the least. But it's not done yet.

    I made a quick-and-dirty scanner that contains this inner loop:

      do {
        C += X + Y;
        if (C >= modulo) { // This branch is going to totally mess with the branch predictor
          Y = C - modulo;
          C = 1;
        }
        else {
          Y = C;
          C = 0;
        }
        if ((Y == 1) && (X==0))
          return len+1;
    
        len+=2;
    
        C += X + Y;
        if (C >= modulo) {
          X = C - modulo;
          C = 1;
        }
        else {
          X = C;
          C = 0;
        }
      } while ((X != 1) || (Y != 0));
      return len;
    

    The if / else part is pretty cumbersome but is a direct application of the original code.

    But it is still "not right".

    It took me too long to realise my mistake and it can be simplified quite significantly:

        Y += X;
        if (Y >= modulo)
            Y -= modulo-1;
    

    And that's all !

    This is also where things become even more interesting: PEAC does NOT perform a normal modulo. In the original code, it was a normal modulo but setting C to 1 is like decrementing the modulus. The new code integrates this directly in the subtraction and thus drops one temporary value, C, but the algorithm is not strictly equivalent anymore: PEAC has a separate flag and tests should be performed at a different stage, since the +1 is applied at the same time as the modulo operation.

    One solution is to consider C merged with X (or Y) between steps of the algorithm. The modulo is then performed before the accumulation.

    // odd step
    if (X >= modulo)
        X -= modulo-1;
    Y += X;
    // even step
    if (Y >= modulo)
        Y -= modulo-1;
    X += Y;
    

    Could it be even simpler? I doubt.

    This changes the criterion for the selection of the modulus, since what matters now is not the comparison/trigger, but the subtrahend. This is the value that should be a "dense prime" to promote/maximise mixing between the bits. This totally changes how I view the values that were scanned.

    The closest maximal modulus to the ideal 2^(16.5)=92681 is 92688, which is even but the subtrahend is 92687=7×13241=0b10110101000001111 (not bad but could be better).

    Below that, the closest even number is 92668, giving the subtrahend 92667=3×17×23×79=0b10110100111111011 which is better but would introduce more bias in the hashed values.

    ______________________________________________

    On top of that, this new, purified version of the algorithm also makes its analysis way easier. The behaviour is much more explicit so hopefully, I'll progress on the theory side, including for the original PEAC... For example it's possible to get a simpler proof of the existence and conditions to reach the sticky states.

    Stay tuned...

View all 126 project logs

Enjoy this project?

Share

Discussions

Yann Guidon / YGDES wrote 04/08/2022 at 11:41 point

https://en.wikipedia.org/wiki/Cyclic_code

It seems that PEAC is a cyclic code (in a loose way because it's not using Galois fields but still interesting)

  Are you sure? yes | no

Ezra Wolf wrote 01/21/2022 at 23:18 point

Wow !

  Are you sure? yes | no

Yann Guidon / YGDES wrote 01/26/2022 at 05:27 point

Hi Ezra, feel free to elaborate :-)

  Are you sure? yes | no

Yann Guidon / YGDES wrote 01/17/2022 at 08:49 point

Correction to some of my statements here and there (and my videos) : φ is not transcendental but an irrational number that is a solution to the quadratic equation x² − x − 1 = 0. Mea culpa, but at least I learned something from one of the Bogdanoff brothers' countless errors and confusions :-D

  Are you sure? yes | no

Yann Guidon / YGDES wrote 01/14/2022 at 16:55 point

Oh my !

https://en.wikichip.org/wiki/x86/adx

Now, THAT's a hack :-)

And I can see how it could increase the speed of some algorithms, even this one...

  Are you sure? yes | no

Tim wrote 01/14/2022 at 19:44 point

On ever other architecture, yes. On x86 is does not add notably to the entropy :)

  Are you sure? yes | no

Yann Guidon / YGDES wrote 01/15/2022 at 12:04 point

  Are you sure? yes | no

Yann Guidon / YGDES wrote 10/23/2021 at 21:33 point

Interesting !

https://research.nccgroup.com/2021/10/15/cracking-random-number-generators-using-machine-learning-part-1-xorshift128/

https://research.nccgroup.com/2021/10/18/cracking-random-number-generators-using-machine-learning-part-2-mersenne-twister/

  Are you sure? yes | no

[deleted]

[this comment has been deleted]

Yann Guidon / YGDES wrote 10/13/2021 at 14:39 point

so far, half a year and 50£.

  Are you sure? yes | no

Yann Guidon / YGDES wrote 09/14/2021 at 04:16 point

George Marsaglia and Arif Zaman got quite a lot figured out in 1991:
https://projecteuclid.org/download/pdf_1/euclid.aoap/1177005878

Why didn't this catch up, but they focus on extremely large sequences with tens of registers of lag, and didn't look enough at just a pair of registers. They also didn't look at the link with checksums.

  Are you sure? yes | no

Yann Guidon / YGDES wrote 09/08/2021 at 07:28 point

Funny !

https://en.wikipedia.org/wiki/Lagged_Fibonacci_generator

Toward a universal random number generator, G.Marsaglia, A.Zaman

  Are you sure? yes | no

Yann Guidon / YGDES wrote 08/05/2021 at 23:24 point

https://www.researchgate.net/profile/Michael_Greenwald/publication/3334567_Performance_of_checksums_and_CRCs_over_real_data/links/53f4b15f0cf2888a749113c5/Performance-of-checksums-and-CRCs-over-real-data.pdf

"In practice, TCP trailer sums outperform even Fletcher header sums."

Ouch.

  Are you sure? yes | no

Yann Guidon / YGDES wrote 08/08/2021 at 00:51 point

"Ones complement (with EAC) Fletcher, however, has a weakness that sometimes offsets its probabilistic advantage: since bytes containing either 255 or 0 are considered identical by the checksum, certain common pathological cases cause total failure of Fletcher-255."

Re-ouch.

  Are you sure? yes | no

Yann Guidon / YGDES wrote 08/05/2021 at 23:09 point

http://users-tima.imag.fr/cdsi/guyot/Cours/Oparithm/english/Modulo.htm and others say that special adders exist to compute mod 2^n-1.

But this is not necessary because the carry is explicitly integrated in the next cycle.

  Are you sure? yes | no

Yann Guidon / YGDES wrote 08/05/2021 at 22:20 point

https://www.cs.auckland.ac.nz/compsci314s1t/resources/Checksums.pdf  is a nice beginner-level overview of error detection basics. Polynomial checks are mentioned, as well as 1s complement / end-around-carry and Fletcher is well explained but no sight of Fibonacci.

  Are you sure? yes | no

Yann Guidon / YGDES wrote 05/24/2021 at 03:52 point

Oh, nice !

https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html#Integer-Overflow-Builtins

  Are you sure? yes | no

Similar Projects

Does this project spark your interest?

Become a member to follow this project and never miss any updates