Close
0%
0%

PEAC Pisano with End-Around Carry algorithm

Add X to Y and Y to X, says the song. And carry on.

Similar projects worth following
Another purely additive, non-cryptographic, fast and minimalistic checksum.

Computes a 32-bit checksum with a pair of 16-bit registers,
- very small, using only ADD,
- without the flaws of Adler32 and Fletcher, yet the same size and complexity,
- almost as good as CRC32, which it may replace when the memory footprint is critical,
- very easy to implement in digital circuits as well as microcontrollers,
- suitable for the detection of trivial corruption in network packets or files.
- not patented
- inappropriate for cryptography or hash tables.

2×32 bits and larger are possible with the non-masking algorithm using w26.

If you wonder why, just click and read the log.

The original algo/project name was "Fibonacci checksum" but it later appeared that it was not the most accurate choice because Leonardo Pisano (the real name of Fibonacci) is the name associated to the periodicity of the sequence under modulo. All I did was add the tiny detail of "end-around carry", or "1-complement addition", which changed everything.

 
-o-O-0-O-o-
 

In 2012, Piano Magic released their album "Life Has Not Finished with Me Yet". One song contains a weird repeating pattern...

Glen Johnson's lyrics are often cryptic and more evocative than objective, but any geek's mind would cling on this mantra at the end:

"Add X to Y and Y to X"

This is really weird but... Why ? What's the point in this obscure "song" with no precise theme or clear subject ? And what does it do ? This last question is the most easily answered : just follow the damned algorithm.

C'est parti...

X=1, Y=0
Y+=X => 0+1=1
X+=Y => 1+1=2
Y+=X => 1+2=3
X+=Y => 2+3=5
Y+=X => 3+5=8
X+=Y => 5+8=13
X+=Y => 8+13=21
Y+=X => 13+21=34
X+=Y => 21+34=55
...

No need to go further, most of you should have recognised http://oeis.org/A000045, the famous Fibonacci sequence.

This gave me a compelling idea to modify the old Fletcher & Adler algorithms, keeping their very low footprint and minimalistic computational complexity. Both of these well known algos use a pair of values and have a similar structure. The trick is that rearranging the data dependency graph provides the equivalent of a minimalistic polynomial checksum, because the result is fed back on itself, in a more robust way than Fletcher's algo.

At first glance, this new checksum loop's body becomes something like :

Y += ( X ^ datum[i  ] );
X += ( Y ^ datum[i+1] );
i+=2;

This loop body is totally trivial to unroll. As trivial is the design of the corresponding digital circuit. This early version seemed to contain the whole checksum entropy in the last computed value but now X and Y are part of the signature. And the really important enhancement is the use of the carry!

For superscalar 32 bits CPUs, the following code seems to work well though the missing carry hardware (and/or lack of language support) requires more instructions to emulate:

t = X + Y + C;    Y = X + data;
C = t >> 16;      X = t & 0xFFFF;

A more efficient variation exists which does not use a temporary variable:

C += X + Y;
Y = X + data;
X = C & MASK;
C = C >> WIDTH;

In this worst case, without support of a carry flag, that's 5 basic operations (not counting memory loads) that fit in 4 registers and 3 cycles, to process 2 bytes. Not too bad. I'll let you deal with alignment. But is it really safe or safer?

The following logs will show how the idea evolves and the performance increases, through discussions about carry wrap-around, register widths, scheduling, avalanche, parallelism, orbits, structure, and escaping black holes...

 
-o-O-0-O-o-
 

Logs:
1. The variable
2. Datapath
3. Adler32 weakness
4. Easy circuit
5. Periodicity
6. A promising 32-bit checksum
7. Progress
8. Why
9. Orbital mechanics
10. Orbits, trajectories and that damned carry bit
11. Two hairy orbits
12. How to deal with black holes
13. Anteriority
14. Moonlighting as a PRNG
15. A new name
16. More orbits !
17. First image
18. Structure and extrapolations
19. Even more orbits !
20. Start and stop
21. Some theory
22. Some more theory
23. Even more theory.
24. A little enhancement
25. sdrawkcab gnioG
26. Further theorising
27. What is it ?
28. A bigger enhancement
29. Going both ways
30. Can you add states ?
31. Please, GCC !
32. _|_||_||_|______|______|____|_|___|_________|||_|__
33. Carry on with GCC
34. A good compromise
35. The new non-masking algorithm
36. Add or XOR ?
37. A 32/64-bit version
38. Closing the trajectories
39. Stats about missed crossings
40. Application : Displace LFSR in BIST
41. I need a big cluster.
42. Another theoretical rambling
43. An individual job and all the necessary processing
44....

Read more »

snippets_v3.c

basic PRNG, checksum, scrambler/descrambler (sans and avec carry) using PEACw16 in C

x-csrc - 4.39 kB - 10/10/2021 at 14:28

Download

sous-titres.txt

subtitles for the French version of the video about trthe PEAC generator

plain - 14.63 kB - 10/06/2021 at 09:30

Download

zoumout2.tgz

files to compute and render the 16 zoom-out videos

x-compressed-tar - 2.82 kB - 09/14/2021 at 04:19

Download

Marsiglia _ add with carry RNG 1177005878.pdf

Some great theory from 1991

Adobe Portable Document Format - 1.76 MB - 09/14/2021 at 04:17

Preview
Download

html_slides.tgz

files for log73.

x-compressed-tar - 1.27 kB - 09/06/2021 at 01:18

Download

View all 40 files

  • And the English version !

    Yann Guidon / YGDES2 days ago 0 comments

    Shiggi from https://www.audiofy.me/ did the narration for the English version of the video.

    Much more needs to be explained but it's a good first shot, right ?

  • Taxonomy

    Yann Guidon / YGDES10/08/2021 at 10:06 0 comments

    The PEAC started as a checksum algorithm and soon the corresponding generator had to be studied. As noted in one of the previous logs, there are also counterparts: the scrambler and its inverse, the descrambler.

    Let's start with the generator:

    That's simple and it is already characterised. The code is easy and has already been covered:

    uint16_t PRNGX,
             PRNGY;
    uint32_t PRNGC;
    
    uint16_t PRNG() {
      PRNGC += (uint32_t)PRNGX; // \_ actually a single adder
      PRNGC += (uint32_t)PRNGY; // /
      PRNGY = PRNGX;
      PRNGX = (uint16_t) PRNGC;
      PRNGC >>= 16;
      return PRNGX;
    }

     Then, add an adder in parallel and you get a checksum: instead of outputting a data stream, it inputs a data stream.

    Then the checksum is contained in X and Y.

    uint16_t CHKSX;
    uint32_t CHKSY,
             CHKSC;
    
    void CHKS(uint16_t m) {
      // extra caution with the extension of the sizes:
      CHKSC += CHKSX;     // 0..10001
      CHKSC += CHKSY;     // 0..2FFFF
      CHKSY  = CHKSX + m; // 0..1FFFE
      CHKSX  = CHKSC;     // 0..FFFF
      CHKSC >>= 16;       // 0, 1 or 2
    }
    

    This version uses a clever trick  to use only one carry with multiple values, which is better of SW and HW will use the above diagram.

    But it is possible to do both, input and output! That's what the scrambler does. It receives a message word that is mixed with X and directly outputs the resulting sum.

    That's it: all you have to do is add an output, or read the Y variable.

    To unscramble this, we have Y = s = m+X, so we recover m with m = s - X, and put s directly into Y.

    So here you have the 4 variations on the theme of PEAC and the code can be found at snippets_v3.c.

    Note that there are 2 incompatible versions of the scrambler (and matching descrambler). One with no carry (the values are truncated and there is no D flag) and the other with the Carry / D flag.

    ///////////////////////////////////////////////////
    ///////// Scrambler sans carry:
    uint16_t SCRX,
             SCRY;
    uint32_t SCRC;
    
    uint16_t SCR(uint16_t m) {
      SCRC += (uint32_t)SCRX; // \_ actually a single adder
      SCRC += (uint32_t)SCRY; // /
      SCRY  = SCRX + m; // mod 65536
      SCRX  = SCRC;     // mod 65536
      SCRC >>= 16;
      return SCRY;
    }
    
    ///////// Descrambler :
    uint16_t DSCRX,
             DSCRY;
    uint32_t DSCRC;
    
    uint16_t DSCR(uint16_t s) {
      DSCRC += (uint32_t)DSCRX; // \_ actually a single adder
      DSCRC += (uint32_t)DSCRY; // /
      uint16_t m = s - DSCRX; // mod 65536
      DSCRY  = s;
      DSCRX  = DSCRC;  // mod 65536
      DSCRC >>= 16;
      return m;
    }
    
    ///////////////////////////////////////////////
    ///////// Scrambler with carry :
    uint16_t SCRBX,
             SCRBY;
    uint32_t SCRBC,
             SCRBD;
    
    uint16_t SCRB(uint16_t m) {
      SCRBC += SCRBX + SCRBY;  // 0..1FFFF
      SCRBD += SCRBX + m;       // 0..1FFFF
      SCRBX  = SCRBC; // 0..FFFF
      SCRBY  = SCRBD; // 0..FFFF
      SCRBC >>= 16;   // 0..1
      SCRBD >>= 16;   // 0..1
      return SCRBY;
    }
    
    ///////// Descrambler v2: borrow :
    uint16_t DSCRBX,
             DSCRBY;
    uint32_t DSCRBC;
     int32_t DSCRBD;
    
    uint16_t DSCRB(uint16_t s) {
      DSCRBC += DSCRBY + DSCRBX;
      DSCRBD +=   s    - DSCRBX;
      DSCRBX  = DSCRBC;
    uint16_t m= DSCRBD;
      DSCRBY  = s;
      DSCRBC >>= 16;
      DSCRBD >>= 16;  //signed shift !
      return m;
    }
    

    The carry-less scrambler is alluring, being slightly shorter and potentially a tiny bit faster (in SW at least) but it suffers from poor avalanche behaviour. The carry-enable scrambler is more solid, and in HW the difference is too small to care. The double-carry scrambler is preferred though the single-carry one is kept for reference, as a necessary step to build the better one. I'll see if or how I can generate a better version that merges a variable.

    Note that these scramblers process 16 bits at once, while LFSRs usually only process one bit per cycle. LFSRs are also susceptible to stalling, unlike these scramblers. The only thing LFSRs can claim at this point is better mixing/avalanche, though that could also be "fixed".

  • PEAC generator: the French video

    Yann Guidon / YGDES10/06/2021 at 09:35 0 comments

    I planned an English-speaking video but the French derivative came first so here it is !

    I also store thesubtitles here.

    I'm waiting for the last piece of the English video to finish it...

  • Application as a scrambler

    Yann Guidon / YGDES09/28/2021 at 07:47 0 comments

    As already noted, there is this duality between the "generator" configuration (which generates a stream of numbers) and the chechsum (wich inputs a stream of numbers to disturb the sequence of the generator).

    In one case, we have a stream input, in the other there is an output stream. And what if there was both ?

    This third configuration is called "scrambler" and it is another common application of LFSR, in particular in Ethernet links and other low-level PHY to spread the bits.

    LFSRs have a drawback though : they are sensitive to the all-zero state combined with zero input. An extra circuit is usually required to detect long strings of 0s and kick the LFSR back.

    The PEAC checksum has been proved to be immune from this. Moreover, it can operate several bits in parallel, contrary to the LFSR that can't generate easily more than 1 bit per cycle.

    One example as a byte stream scrambler would have the scrambler upstream and the serialiser downstream, contrary to the LFSR method that works bit per bit. The PEAC w8 is weak so w10 is used, providing 3 bits for safety (Carry and the 2 MSB). Thus the maximum distance between blank periods is guaranteed. Moreover, the output could be tied to the MSB of the PEAC registers, instead of the 8 lower bits, to provide additional protection against chosen-text attacks and eventually provide extra checksum bits at the end of the stream. In this case, PEAC works both as a scrambler AND checksum.

    Once you start to play with scramblers, you touch the domain of cryptography. Obviously, PEAC can't be used alone in this context. But even as a simple scrambler, the PEAC generator has several benefits over LFSR.

  • More efficient scan of the state space

    Yann Guidon / YGDES09/26/2021 at 05:13 0 comments

    It should have been obvious from the beginning but it is easy to double the speed of scanning the state space if only the crossings are considered. It comes from the well known fact that all the orbits come in pairs !

    Each iteration must detect both X==0 and X==-1, with 0 going to the "primary orbit" and -1 (or "max") going to the "secondary orbit".
    The problem is that now, the iteration has 2 tests and 2 conditional jumps (though rarely used). That makes it a bit slower, though the speedup is a solid 2 when branch prediction is working well.

  • A different increment

    Yann Guidon / YGDES09/23/2021 at 21:24 0 comments

    So far the harvest of "maximal" or "ideal" sizes has only one very lucky find : w16. Few others have a suitable behaviour. Finding the next ones will be very difficult. Then it struck me that it was only a one-dimensional search. There is no constant but the increment acts like one, right ? So what if we changed it ?

    The carry can easily be "multiplied" but this requires a 3-inputs adder with a double carry output. It's still worth studying in SW but there would be no benefit: adding one instruction has a significant performance impact that could make PEAC useless.

    Another worthy idea to test is to simply invert the carry's value. This is mostly insignificant in an electronic circuit because "bubble pushing" makes the inverter disappear in other circuits. In SW, this still adds an operation (XOR 1 ?).

    Marsaglia proposes other types of operators, such as "subtract with carry". I have no idea what this changes. However this adds 2 options to try, with X-Y and Y-X, and even more options for C+= and C-=.

    I'm curious to see how the corresponding statespaces look though I have other priorities so far.

  • Marsaglia was (almost) there

    Yann Guidon / YGDES09/16/2021 at 06:46 0 comments

    I am re-reading the 1991 paper from Marsaglia & Zaman "A new class of random numbers generators" and it is very good.

    There are some important details that are very useful and also many significant differences, but the overlap is worth it.

    Let's start with the differences :

    • The authors focus on extremely long sequences, suitable for Monte-Carlo simulations, so the sequence lengths are in the order of 10^100 to 10^500. OTOH  I'm still limited to 2^52 or so...
    • They focus on "Lagged Fibonacci" with 20 to 50 intermediate registers to reach these extreme lengths, while I focus on only 2 registers (the special case with lag of 1 and 2).
    • The focus on extremely long sequences makes them miss the case for CRC/checksums.
    • What I call PEAC, is called "add-with-carry generators".

    That explains why they don't focus on the same configurations as I do.

    However they expose a lot of very pertinent ideas.

    • Part 2. starts with the Fibonacci sequence modulo 10 (without calling it Pisano, which might be a later/recent naming convention nuance) and this is almost exactly what is done with PEAC, except for the modulus. A lot of things match the PEAC generator until they change the lags for much longer sequences (sadly).
    • They bring more types of iteration types : subtract-with-borrow and more, including one with the potential to fill the whole statespace. I'm not interested because that would increase the complexity of the checksum algo.
    • They show why this system works : each digit is like a new development for the approximation of a rational number. And for PEAC, the ratio is Phi !
    • They prove why the sequences have 2^(2n) + 2^n - 2 states (because the "lags" are 2 and 1)

    Overall they bring a serious mathematical background though I'm not sure it would be useful for my case and I wouldn't know how to use it anyway. That would be useful for characterising w32 or even w64 but they seem to treat lag=2 as an exception, somehow.......

    I feel somehow vindicated and it contains many similarities with the script I'm working on for a soon-to-be-released video.

    I'm still in uncharted territory but if Marsaglia went near, 30 years earlier, it is very encouraging.

  • Explaining the mathematical basis of PEAC, the easy way.

    Yann Guidon / YGDES09/05/2021 at 23:24 0 comments

    I'm trying to make more videos to show the maths behind the algorithm. It's not easy but I'm progressing. The script of the video is easy to write but can easily go awry. The trick seems to be to prepare all the support material and then comment on them.
    So I've been programming some HTML/JS pages to use later.

    It seems that the easiest path to explain PEAC is by first going back to Fibonacci, then Pisano and modify it to get PEAC. Then the link with Fletcher is easy to make.

    1) http://ygdes.com/~whygee/PEAC/fibo.html shows the Fibonacci sequence. There is nothing new there, it is very well know already, but I set the stage for the next slides with two things:

    • I show the sequence of numbers in a X-Y graph with increasingly sparse points on a diagonal line with a constant slope.
    • I show the convergence of the ratio of consecutive numbers toward φ=1.6180339887498948482045868343656381177203091798057628621...

    This is a reminder that φ is a transcendental number, like π. In fact, by its definition, it's the "most transcendental" because its continuous fraction is the simplest. Transcendentals are very interesting because they can be the source of (almost) infinite sequences of finite values.

    From there, there is hope that a pseudo-random number generator (PRNG) could be derived from the Fibonacci sequence.

    2)  http://ygdes.com/~whygee/PEAC/pisano.html is a modification of the above page, with the integer values bound in the 0..15 range: this is the Pisano sequence modulo 16. This value was chosen because we computer guys love powers of 2.

    By representing the sequence of values in X-Y form, we see the dots filling only a small proportion of the whole statespace. Here, X is the current value and Y is the last value. Except near the origin, most dots look equally spaced. Larger moduli will not give better fill results because it is well known that for our cases (powers of 2), the sequence loops after 3/2×modulus. Indeed, the program stops after 24 cycles for a modulus of 16. This is far from being a good pseudo-random number generator... Other moduli can be tested but none creates a sequence near modulus squared. See sequence A000045 in the OEIS, and the related Wikipedia page.

    But we're almost there !

    3) Now let's modify the algorithm a little. http://ygdes.com/~whygee/PEAC/pisanoEAC.html adds a little twist:

    Every time the sum wraps around, the next computation adds 1 to the sum.

    This single trick bumps the sequence length to 135 values, which is a bit more than 16²/2=128.

    The sequence fills only half of the state space, the other half is filled by an identical/complementary sequence. Although it is not ideal, it is "more than good enough" for many applications.

    The real problem is to use moduli where the state space contains only 2 orbits. For the powers of two, only a few have been proved so far:
    2², 2³, 2⁴, 2¹⁰, 2¹⁶.  The video below shows some of these "good" sizes, with each color corresponding to an individual sequence, or "orbit":

    Other sizes (non-power of 2) are not practical, and sizes beyond 2²⁵ get extremely hard to compute. However the width of 16 bits is very convenient and is the basis for the PEAC checksum algorithm. More sizes are being explored.

    When a suitable width is found, it can be used as the basis for a checksum. A "good" checksum is a PRNG that is perturbed by external data. This perturbation has been studied (the addition is the safest method) and the stability has been proved, so PEAC could be an enhanced replacement to the classic Fletcher checksum algorithms, which have a very similar structure and complexity, but worse behaviour in many corner cases.

  • PEAC is NOT Pisano !

    Yann Guidon / YGDES08/29/2021 at 23:11 0 comments

    The Pisano period is defined by taking the Fibonacci sequence, modulo some integer.

    One could think that PEAC is different because the modulo is not applied in a classical way. End-Around-Carry adds an offset of one to the modulo result. But it goes much further than that.

    If you compute EAC on a 8-bit byte, you get the equivalent of modulo 255 (offset by 1).

    Now, if you collapse the statespace and remove the diagonal (that is not visited), you end up with 257 values !

    so PEAC works modulo 1+2^n !

    The above w4 is 16 wide (0 to 15) and 17 (0 to 16) high.

    This is explained by the formula to draw this diagram: X is horizontal (mod 16), but vertically, this is Y (mod 16) +C (mod 2).

    C is not collapsed immediately, which increases the size of the statespace. This separation of C makes a real difference and explains why this object is not easy to describe with the common mathematical tools.

    From there is sounds harder to believe that PEAC can be broken down into already existing cyclic groups, meaning that PEAC seems to "stand on its own" as a separate kind of field (?).

    I also notice that 2^n +1 looks like a Mersenne number (except Mersenne is 2^n -1). In the above example, we have w=4=>17 which is prime, and w8=>257 which is prime too, but w4 is ideal while w8 is not. Another counterexample is w10=>1025 which is obviously not prime. Prime numbers don't have predictive value for the quality of the statespace...

  • Scanning with multiple processors

    Yann Guidon / YGDES08/29/2021 at 22:29 0 comments

    Today I made a new prototype to explore and display the behaviour of the program that scans the PEAC statespace from many starting points. Have a look at http://ygdes.com/~whygee/a2_02.html

    Here are a couple of videos to show the behaviour when the number of parallel jobs is modified. In the worst case, with only two jobs, it takes a lot of time (all the steps to scan the whole orbit).


    When more jobs are possible, it becomes interesting... Here, each colour represents a job and not an orbit.

    Some arcs are extremely short, others -rare- are very long, so they waste most of the time.

View all 80 project logs

Enjoy this project?

Share

Discussions

[deleted]

[this comment has been deleted]

Yann Guidon / YGDES wrote 2 days ago point

so far, half a year and 50£.

  Are you sure? yes | no

Yann Guidon / YGDES wrote 09/14/2021 at 04:16 point

George Marsaglia and Arif Zaman got quite a lot figured out in 1991:
https://projecteuclid.org/download/pdf_1/euclid.aoap/1177005878

Why didn't this catch up, but they focus on extremely large sequences with tens of registers of lag, and didn't look enough at just a pair of registers. They also didn't look at the link with checksums.

  Are you sure? yes | no

Yann Guidon / YGDES wrote 09/08/2021 at 07:28 point

Funny !

https://en.wikipedia.org/wiki/Lagged_Fibonacci_generator

Toward a universal random number generator, G.Marsaglia, A.Zaman

  Are you sure? yes | no

Yann Guidon / YGDES wrote 08/05/2021 at 23:24 point

https://www.researchgate.net/profile/Michael_Greenwald/publication/3334567_Performance_of_checksums_and_CRCs_over_real_data/links/53f4b15f0cf2888a749113c5/Performance-of-checksums-and-CRCs-over-real-data.pdf

"In practice, TCP trailer sums outperform even Fletcher header sums."

Ouch.

  Are you sure? yes | no

Yann Guidon / YGDES wrote 08/08/2021 at 00:51 point

"Ones complement (with EAC) Fletcher, however, has a weakness that sometimes offsets its probabilistic advantage: since bytes containing either 255 or 0 are considered identical by the checksum, certain common pathological cases cause total failure of Fletcher-255."

Re-ouch.

  Are you sure? yes | no

Yann Guidon / YGDES wrote 08/05/2021 at 23:09 point

http://users-tima.imag.fr/cdsi/guyot/Cours/Oparithm/english/Modulo.htm and others say that special adders exist to compute mod 2^n-1.

But this is not necessary because the carry is explicitly integrated in the next cycle.

  Are you sure? yes | no

Yann Guidon / YGDES wrote 08/05/2021 at 22:20 point

https://www.cs.auckland.ac.nz/compsci314s1t/resources/Checksums.pdf  is a nice beginner-level overview of error detection basics. Polynomial checks are mentioned, as well as 1s complement / end-around-carry and Fletcher is well explained but no sight of Fibonacci.

  Are you sure? yes | no

Yann Guidon / YGDES wrote 05/24/2021 at 03:52 point

Oh, nice !

https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html#Integer-Overflow-Builtins

  Are you sure? yes | no

Similar Projects

Does this project spark your interest?

Become a member to follow this project and never miss any updates