PEAC Pisano with End-Around Carry algorithm

French article in GNU/Linux Magazine France

If you wonder why, just click and read the log.

For the what, check Definitions and Draft as well as PEAC vs LFSR for a comparison.

The original algo/project name was "Fibonacci checksum" but it later appeared that it was not the most accurate choice because Leonardo Pisano (the real name of Fibonacci) is the name associated to the periodicity of the sequence under modulo. All I did was add the tiny detail of "end-around carry", or "1-complement addition", which changed everything.

-o-O-0-O-o-

As for how I came to this system...

In 2012, Piano Magic released their album "Life Has Not Finished with Me Yet". One song contains a weird repeating pattern...

Glen Johnson's lyrics are often cryptic and more evocative than objective, but any geek's mind would cling on this mantra at the end:

"Add X to Y and Y to X"

This is really weird but... Why ? What's the point in this obscure "song" with no precise theme or clear subject ? And what does it do ? This last question is the most easily answered : just follow the damned algorithm.

C'est parti...

X=1, Y=0
Y+=X => 0+1=1
X+=Y => 1+1=2
Y+=X => 1+2=3
X+=Y => 2+3=5
Y+=X => 3+5=8
X+=Y => 5+8=13
X+=Y => 8+13=21
Y+=X => 13+21=34
X+=Y => 21+34=55
...

No need to go further, most of you should have recognised http://oeis.org/A000045, the famous Fibonacci sequence.

This gave me a compelling idea to modify the old Fletcher & Adler algorithms, keeping their very low footprint and minimalistic computational complexity. Both of these well known algos use a pair of values and have a similar structure. The trick is that rearranging the data dependency graph provides the equivalent of a minimalistic polynomial checksum, because the result is fed back on itself, in a more robust way than Fletcher's algo.

At first glance, this new checksum loop's body becomes something like :

Y += ( X + datum[i  ] );
X += ( Y + datum[i+1] );
i+=2;

This loop body is totally trivial to unroll. As trivial is the design of the corresponding digital circuit. This early version seemed to contain the whole checksum entropy in the last computed value but now X and Y are part of the signature. And the really important enhancement is the use of the carry!

For superscalar 32 bits CPUs, the following code seems to work well though the missing carry hardware (and/or lack of language support) requires more instructions to emulate:

t = X + Y + C;    Y = X + data;
C = t >> 16;      X = t & 0xFFFF;

A more efficient variation exists which does not use a temporary variable:

C += X + Y;
Y = X + data;
X = C & MASK;
C = C >> WIDTH;

In this worst case, without support of a carry flag, that's 5 basic operations (not counting memory loads) that fit in 4 registers and 3 cycles, to process 2 bytes. Not too bad. I'll let you deal with alignment. But is it really safe or safer?

The following logs will show how the idea evolves and the performance increases, through discussions about carry wrap-around, register widths, scheduling, avalanche, parallelism, orbits, structure, and escaping black holes...

-o-O-0-O-o-

Logs:
1. The variable
2. Datapath
3. Adler32 weakness
4. Easy circuit
5. Periodicity
6. A promising 32-bit checksum
7. Progress
8. Why
9. Orbital mechanics
10. Orbits, trajectories and that damned carry bit
11. Two hairy orbits
12. How to deal with black holes
13. Anteriority
14. Moonlighting as a PRNG
15. A new name
16. More orbits !
17. First image
18. Structure and extrapolations
19. Even more orbits !
20. Start and stop
21. Some theory
22. Some more theory
23. Even more theory.
24. A little enhancement
25. sdrawkcab gnioG
26. Further theorising
27. What is it ?
28. A bigger enhancement
29. Going both ways
30. Can you add states ?
31. Please, GCC !
32. _|_||_||_|______|______|____|_|___|_________|||_|__
33. Carry on with GCC
34. A good compromise
35. The new non-masking algorithm
36. Add or XOR ?
37. A 32/64-bit version
38. Closing the trajectories
39....

Hash enhancements
Yann Guidon / YGDES • 09/25/2023 at 02:43 • 0 comments
The modulus m=741454 is suited for a scrambler or a checksum with 39 bits of entropy. This is great because its structure (a square root of an odd power of two) greatly increases the avalanche, unlike w16 or even w26. This is a better hash with the internal scrambling, but there is still a risk when input data (such as short character strings or packet headers ?) have low entropy. This effect killed Adler and Fletcher's algos in the IETF standards.

741454d = 1011.0101.0000.0100.1110b has 8 bits set among the 20, which is not perfect but it's still much better than w16. In the application that I focus on, I want en even better avalanche, which only occurs when the accumulators exceed m. So not only do we need to mingle with the intermediary results more, but also more often by making accumulators overflow a lot.

The context of interest is hashing pairs of bytes (16 bits) at once, like w16, but the 4 extra bits of the modulus create a sort of "buffer" that greatly delay the event of an avalanche : at best 11 cycles when the input data is FFFFh. So we must force the rollover at least once every two cycles.

There are three simple ways to achieve this by adjusting the 16 bits of input :
- Multiplication : multiply by 11 (65535 × 11 = 720885, close enough). Even multiplying by 10 only would have the same effect : there would be a rollover only then the input data allow it, and good luck if you have a stream of very small values (like 0 or 1)
- Shift : left by 3 is like multiplying by 8. This could be enhanced by another addition to get a ×9 and a bit more avalanche but it's mostly the same drawbacks as raw multiplication, just faster.
- Addition with a constant : here we have something !
741454d = B504Eh

Adding a constant to the input word great benefits :
- this lets us directly control the frequency of the rollover, hence the mixing/avalanche
- this also increases the inter-bits avalanche before it gets into the main X-Y registers,
- this is less dependent of the entropy of the data source
So the offset must be chosen close to m, but not too close. The hard limit is 741454 - 65536 = 675918 to prevent an overflow.

Just as in "pulse density modulation", this value will determine the quantity of rollover, hence the avalanche, it shouldn't be too high though or else it would become meaningless. 675918/2=337959 would be a first approximation but its bit pattern is too close to the modulus (rember : A+A=2A and PEAC does a LOT of additions) so we must reconstruct a custom bit pattern. It must be quite arbitrary and dense to favor avalanche and reduce interferences. It should also help spread bits from ASCII bytes.
The leader nibble would be 0111 = 7 to promote overflow, but not too much (the bias is at about 62%).
The next nibbles need to take into account the spread of ASCII values : the 5 or 6 LSB have quite a high entropy but the 2 MSB have low entropy. Low entropy means longer strings of 1s in the offset's pattern, while low entropy calls for short sequences of 01. So for each character, there is 11010101.
The final pattern is : 0111.11010101.11010101 = 7D5D5h wich can avalanche from the lower character to the higher one. Nice. Even small values will have a chance to affect higher bits faster.
OK this adds another operation with a magic constant to the mix but this also helps avalanche arbitrary data much faster and reduce the number of post-scan tumbling, which would add too much overhead for small strings. Also, let's not forget that the last operation that constructs the 39-bit hash is a multiplication, which adds its own overhead.
With the duo m=741454 o=513493 we might have a pretty serious and fast mixer...
French article(s)
Yann Guidon / YGDES • 06/08/2023 at 18:50 • 0 comments

In case you missed it, there is now a French article in GNU/Linux Magazine France that is now free to read online.

It is an overview of the field of checksums and explains where the idea of PEAC comes from, with several interesting and not-too-mathy notions that should help you choose and design your own system. I'm quite proud of it, it's a nice blend of theory and practice with significant applications.
........................................................................................................
Update 202309 : there is also a whole article about the general PEAC theory in the same magazine !
PEAC : l’arrière-petit-neveu de Galois et Fibonacci

GNU/Linux Magazine n°265 (septembre 2023)
PEAC for line encoding
Yann Guidon / YGDES • 05/21/2023 at 18:08 • 0 comments

So far I have considered PEAC as a pretty nice scrambler, but now I am looking at how to use it to harden a link, almost like a FEC system, on top of the other features.

I look at PEAC10 and PEAC39/2, the first being the simplest but can only process byte-wide streams by adding 2 bits per byte (25% overhead), while PEAC39/2 (m=741454) has a much better scrambling with still the same overhead (16 bits get 4 added check bits).

m=741454 has a nice property : any 20-bit word that is >= 741454 must be invalid, so it's like a "free parity bit" that requires only marginal effort to check (unlike parity for most computers) though it only works as 2/3rds of a bit in practice, for certain limited cases. If this is not detected on this very word, the further words will detect the mismatch with the extra bits (unlike plain parity bits which only work word by word).

I'm also interested by the ability of such a 16->20 recoder to always have at least one transition, that is ; at least one bit is 0 and one bit is 1. This would save hardware that inserts start and stop bits, or frame markers, that help with clock resynchronisation.

This recoder then performs many operations at once : checksum, scrambler, forward error detection and maybe even help during correction... It does not provide 8b-10b encoding spectrum-spreading (transition reduction) though. Furthermore, there is no requirement for a fully binary or square root modulus : a lower value than 741454 can be chosen to add only 2 bits of overhead, for example, with easy-to-spot values for faster resynchronisation.

Still this looks like a nice all-in-one algorithm for ultra-lightweight encapsulation for slow radio protocols that can't afford Reed-Solomon (think : microcontrollers), and more robust than Hamming SECDED (which only works on individual words).
999999 is maximal
Yann Guidon / YGDES • 05/17/2023 at 13:08 • 0 comments
So, 1 million is not an interesting orbit but 1 million minus one is, which is still pretty funny.

The logs are there : gPEAC_scans_1M.tbz if you need the raw data or build scripts. I stopped the scan at 1000615 for now, and I will continue slowly in the future but the returns diminish dramatically so the effort will be low on this front.

The histogram didn't change much, with the maximum still at 119 so there is some headroom for now.

The new sieve has 1132 entries, going up to 23251, which hints at a mostly linear progression : reaching 2K entries will require scanning up to about 2 million orbits, for example. But the selectivity increase is 7% only: the previous version
```
108628 sieve tests OK (21.29961%)
```
(with 500K orbits) now gives
```
101793 sieve tests OK (19.95941%)
```
Yet the scans show that the density is in the 7% ballpark around 1M, or 1/3 of the candidates that passed the sieve. Not many candidates give long orbits though. Well, it's a sieve, not a perfect filter.

You can get the new code there : gPEAC_1M.tbz
Enjoy.

Now I have to find why the sieve misses so many numbers.
Oh and 2147483648 % 7559=1984 so w31 is out. w30 still passes the sieve.
Delta coding and some stats
Yann Guidon / YGDES • 05/16/2023 at 03:42 • 0 comments
With the new harvest, I estimate that the perfect and maximal orbits under m=1M is around 100K.

This starts to make quite a large .h file to include in the sieve generator. Not that it's such a concern but scalability, you know.... And embedding such a huge array in the constants area of a program is not such a great idea, particularly if it is meant to evolve fast. However, playing with .h files is good because they are easy to process with UNIX scripts in bash, and this saves the pain of writing C code to import the text file, with error management etc.

But the list of orbits might get even larger in the future and I believe it's a missed opportunity to make things a bit better so, after staring at the numbers, I came to the conclusion that differential coding, or "delta coding", is an easy and interesting thing to do. So I now only encode the difference between consecutive orbits.

Using partial/incomplete data, I came up with this quick and dirty script to produce the following histogram :
```
S=0
sed   's/ -//'   gPEAC.000002-$MAX.txt | while read -r line ; do
  echo "$(( line - S ))"
  S="$line"
done > gPEAC.000002-$MAX.diff.txt

sort -g gPEAC.000002-907575.diff.txt |uniq -c |less
   6202 1
   5127 2
   5055 3
   6549 4
   5979 5
   4706 6
   3787 7
   3292 8
   3660 9
   3889 10
   3380 11
   2289 12
   1967 13
   2017 14
   2699 15
   1706 16
   1314 17
   1508 18
   1341 19
   1505 20
   1005 21
   1002 22
    747 23
    845 24
    942 25
    702 26
    441 27
    567 28
    613 29
    541 30
    390 31
    351 32
    370 33
    296 34
    335 35
    283 36
    232 37
    196 38
    173 39
    265 40
    157 41
    110 42
    112 43
    157 44
    136 45
     91 46
     64 47
    103 48
     83 49
     74 50
     75 51
     41 52
     33 53
     55 54
     60 55
     34 56
     40 57
     28 58
     31 59
     31 60
     21 61
     19 62
     20 63
     15 64
     17 65
     17 66
     15 67
     16 68
      9 69
      9 70
      5 71
     13 72
      3 73
      7 74
     11 75
      7 76
      5 77
      4 78
      5 79
      6 80
      3 81
      2 82
      1 83
      3 84
      1 86
      3 87
      1 88
      1 89
      2 90
      1 91
      1 92
      2 95
      1 96
      1 99
      1 100
      1 101
      2 102
      1 103
      2 104
      1 105
      1 114
      1 119
```
So far the largest gap is 119, which fits in a uint8_t instead of the previous uint32_t. The .h source file shrinks from 628041 to 275813 bytes, not a 4× gain as the binary format but still good and compresses well : 198495--->70512 !

There is still much room for improvement, for the numbers below 16, but I won't get into this... yet... if ever ?

When the scan reaches a point where the gap goes beyond 255, I'll have to reconsider my approach but so far, it's a pretty little hack and it gives a bit more insight on the spread of the perfect and maximal orbits. I observe that their density decreases quite slowly... Though no formula yet.
Getting to 1 million
Yann Guidon / YGDES • 05/16/2023 at 02:16 • 0 comments
Scanning has reached 900K and the last steps are excruciatingly slow and far from practical.

I have scripted as much as I could, thanks to basic UNIX/Linux tools, but this barely qualifies as a band-aid so of course, I'm thinking of reviving the idea of #CHOMP and finally giving it some substance. This means I need to refresh the #micro HTTP server in C.

The scanning currently mobilises 3 computers and has been plagued by several interruptions, with painful manual restarts, and constant babysitting to keep all the cores busy. Without the previous optimisations, the scan would have lasted a whole month, I believe. But then, manually checking all the fragments of log files gets ridiculous. So it's time to get #CHOMP into reality.

The idea has been around for a while now (Jan. 2022) so some ideas have already percolated.
- It uses the #micro HTTP server in C without the #HTTaP protocol because it must deal with several "clients" in fair (round-robin) fashion.
- it should work in JSON as well as binary format for convenience, and I must develop a client code
- each client can deal with one to thousands of "workers", and it could be a HTML/JS page or a C program with pthreads. This way, any type of computer or resource can participate in the computation effort. After all the goal is to use SIMD and CUDA for example, so a future client could provide 100K workers one day.
- The protocol is simple, graciously fails, with timeouts for the clients, in case one loses power/connection.
- The server manages the logs, gathers the data, allocates the ranges, makes sure the workers are not cheating, reallocates jobs when a client vanishes...
- This will be useful not only for scanning gPEAC smoothly, but also to scan PEACs exhaustively (among other trivially parallel tasks)
Meanwhile, I'll have reached 1 million soon, and might continue scanning in the background, just for the sake of it and because it's free time, and the computer will not be used for other purposes anyway.
A missed opportunity
Yann Guidon / YGDES • 05/14/2023 at 01:39 • 0 comments

The scan of the gPEAC space is nearing 900000 so the goal to reach 1 million in about one week (as boldly stated before) is more or less achieved. Yet, by looking at the behaviour of the system, beyond the obvious flaws of the manual synchronisation of multiple computers, there is another thing that could have been done easily and provided great hindsight later : I should have displayed the length of the primary orbits. I suspect now that this could be very useful for later : I stop computation just beyond the size of a maximal orbit, but maybe it could be shorter. I need the numbers generated but not collected to create a hypothesis concerning the orbit lengths beyond the perfect and maximal orbits.

Fortunately, it is quite easy to regenerate this list of primary orbits lengths. We already have the list of the P&M orbits, and we'll have a better sieve, Re-scanning the first million range will be fast because we can already avoid the huge orbits, as well as the sieved out ones.
The good numbers
Yann Guidon / YGDES • 05/11/2023 at 20:39 • 0 comments

(updated 20230517)
The last logs (153. 500000 is maximal, 154. w32 "Just in case", 155. More candidates and 741454) have shaken a few misconceptions and hopes, and this log summarises the current situation.

Binary PEAC

The old well known widths have not changed : 2, 3, 4, 10, 16 and 26 bits. The perfection of w26 is interesting to examine but not a critical issue.

w32 is now out of the race.
w31 failed too because 2147483648 % 7559=1984.
w30 still passes the sieve.
Even larger widths (51 52 54 55 59 60 and 63 passed the current sieve_1M) might not be testable in the near future.

gPEAC

2^(29/2) = 23170 is known perfect and fits in a pair of 15-bit registers. it's suitable for small systems, small data, with its 500M orbit.

2^(39/2)-1 = 741454 is known perfect, using 19-bit registers. It's a direct upgrade for PEAC16 with its longer orbit (549.754.775.568) and great mixing (10110101000001001110). It's appropriate for 16-bit scramblers, hash tables, checksums, and the 3 extra bits (compared to PEAC16) add even more robustness.

Other candidates have not been found. Many have failed a quick scan/check in seconds or minutes, up to 77.

More candidates and 741454

Yann Guidon / YGDES • 05/10/2023 at 18:15 • 0 comments

After the fiasco of w32 (see the last log 154. w32 "Just in case"), there are still a lot of moduli to explore, in particular with the square root family.

$ for i in $(seq 25 2 127); do echo -n '2^('$i"/2) = "; n=$(echo 'sqrt(2^'$i')' |bc); ./sieve_out $n f ; done |grep OK |tee sqrt2.txt
2^(29/2) = 23170 :  OK (known perfect)
2^(33/2) = 92681 :  OK (not P or M)
2^(45/2) = 5931641 :  OK   (15231323470 in 20s)
2^(49/2) = 23726566 :  OK  (6545928300254 only, in 3h)
2^(59/2) = 759250124 :  OK
2^(77/2) = 388736063996 :  OK
2^(93/2) = 99516432383215 :  OK
2^(97/2) = 398065729532860 :  OK
2^(107/2) = 12738103345051545 :  OK
2^(113/2) = 101904826760412361 :  OK
2^(121/2) = 1630477228166597776 :  OK
2^(125/2) = 6521908912666391106 :  OK

Not a great choice though. Let's try with an offset:

for i in $(seq 25 2 127); do
  echo -n '2^('$i"/2)-1 = "
  n=$(echo 'sqrt(2^'$i')-1' |bc)
  ./sieve_out $n f
done |grep OK |tee sqrt2-1.txt
2^(27/2)-1 = 11584 :  OK (not Perfect or maximal)
2^(33/2)-1 = 92680 :  OK  (not Perfect or maximal)
2^(37/2)-1 = 370726 :  OK  (not perfect or maximal)
2^(39/2)-1 = 741454 :  OK proved perfect !
2^(49/2)-1 = 23726565 :  OK (977239131552 only)
2^(65/2)-1 = 6074000998 :  OK   5189739603 only :-(
2^(69/2)-1 = 24296003998 :  OK   4549760445 only :-(
2^(71/2)-1 = 48592007998 :  OK  220895715381 only
2^(77/2)-1 = 388736063995 :  OK (222304372407 only)
2^(83/2)-1 = 3109888511974 :  OK
2^(87/2)-1 = 12439554047900 :  OK
2^(93/2)-1 = 99516432383214 :  OK
2^(111/2)-1 = 50952413380206179 :  OK
2^(125/2)-1 = 6521908912666391105 :  OK
2^(127/2)-1 = 13043817825332782211 :  OK
[yg@localhost gPEAC_500K]$ grep 370726 gPEAC.000002-509965.txt 
(nothing)
[yg@localhost gPEAC_500K]$ grep 92680 gPEAC.000002-509965.txt
3392680
[yg@localhost gPEAC_500K]$ grep 11584 gPEAC.000002-509965.txt 
115841

I'm scanning 741454 right now....

[yg@localhost gPEAC_500K]$ nice -n 90 /usr/bin/time ./gPEAC_scan 741454 741454 f
741454(274877387784)=274877391679 P
1 candidates among 1 numbers (100.00000%)
1 hits among 1 candidates (100.00000%)
489.76user 0.10system 8:12.40elapsed 99%CPU

YESSSS At least good news :-)
741454 (10110101000001001110) is thus suitable for a 39-bit hash using a pair of 20-bit registers. Its orbit has a length of 549754775568 (111111111111111111100000010100000010000 in binary)
It is suitable for hashing 2 bytes at once with quite a lot of headroom against collisions and aggression. It's a direct upgrade to PEAC16x2 !

w32 "Just in case"
Yann Guidon / YGDES • 05/09/2023 at 11:51 • 0 comments
The recent sieving effort has not yet slashed the hope that w32 be perfect or maximal. The sieve can only tell if a modulus is not either of them but the required amount of sieving makes the sieve almost as large as the number to be even 1/2 accurate.

But since the possibility is not ruled out so far, the chances are still good that it might be. And considering the amounts of time involved in this research, it could be worth it to have at least a "try": it would be good to start mapping its structure, even partially. In 2 years, I have not even considered scanning the primary orbit, it could be short and that would be the end of it !
```
[yg@localhost gPEAC_500K]$ /usr/bin/time ./gPEAC_scan 4294967296 4294967296
4294967296 M
1 candidates among 1 numbers (100.00000%)
1 hits among 1 candidates (100.00000%)
17.18user 0.00system 0:17.21elapsed 99%CPU
[yg@localhost gPEAC_500K]$ 
```
OK... sigh... The scanner says it's maximal but only lasts 17 seconds ? Time for more investigation.

.

.

Aaaaaand it seems that it is caused by an overflow in the computation of the orbit length... No wonder it stops early. Or maybe that's just the case that w32's primary orbit's length is 13.603.546.275 only :-/ So this case is closed after 2 years. Let's now look at the other possible orbits (those that the sieve has not ruled out):
- w30 hasn't stopped in 27 hours. But it's less interesting than w31.
- w31 didn't stop after 24 hours. That would be interesting to investigate, right ? The sign bit could be used instead of the carry bit, as a lesser replacement for w32: it would not be able to hash 32-bit numbers though :-/
- w34 stopped in 2m31s after 118.781.994.407 iterations.
- w35 took only 20s with 15.959.149.451 iterations
- w40 looks barely stronger with 8803509994715 iterations in 15800s/4h39m
How is it possible to be this unfortunate ?

Time to reconsider the goals. Let's see if w31 is really worthy more efforts.

Fortunately, there is a hit with sqrt(2^39)

View all 163 project logs

gPEAC_1M.tbz Sieve generator, sieve, and scanner for the gPEAC orbits, "1 million" edition. x-bzip-compressed-tar - 81.95 kB - 05/17/2023 at 14:51		Download
gPEAC_scans_1M.tbz All the logs of the scan up to 1000615, and updated scripts. Raw data for those playing at home and wanting to reproduce everything (or in case of mistake) x-bzip-compressed-tar - 2.37 MB - 05/17/2023 at 12:43		Download
gPEAC_500K.tbz Sieve generator with the 501K range (47038 known orbits), sieve program, orbit scanner with fancy xterm launcher... x-bzip-compressed-tar - 283.89 kB - 05/09/2023 at 02:11		Download
gPEAC_scan_logs_501K.tbz all the maximal and perfect orbits of gPEAC, from 2 to 501000. .h file included. duplicate 89651 fixed. x-bzip-compressed-tar - 1.13 MB - 05/09/2023 at 01:20		Download
gPEAC_scan.tgz The new faster scanner ! see log 150. x-compressed-tar - 51.74 kB - 05/06/2023 at 07:54		Download

PEAC Pisano with End-Around Carry algorithm

Description

Details

Files

gPEAC_1M.tbz

gPEAC_scans_1M.tbz

gPEAC_500K.tbz

gPEAC_scan_logs_501K.tbz

gPEAC_scan.tgz

Project Logs

Collapse

Hash enhancements

French article(s)

PEAC : l’arrière-petit-neveu de Galois et Fibonacci

PEAC for line encoding

999999 is maximal

Delta coding and some stats

Getting to 1 million

A missed opportunity

The good numbers

Binary PEAC

gPEAC

More candidates and 741454

w32 "Just in case"

Discussions

Similar Projects

σα code

3RGB image lossless compression format

Game of Life bit-parallel algorithm

µδ code

PEAC Pisano with End-Around Carry algorithm

Become a Hackaday.io member

Just one more thing

Description

Details

Files

Project Logs Collapse

Binary PEAC

gPEAC

Enjoy this project?

Discussions

Become a Hackaday.io Member

Similar Projects

Does this project spark your interest?

Report project as inappropriate

Send message

Remove Member

Project Logs

Collapse