If you wonder why, just click and read the log.
The original algo/project name was "Fibonacci checksum", but this later turned out not to be the most accurate choice: Pisano (Fibonacci's real name) is the name associated with the periodicity of the sequence under modulo.
In 2012, Piano Magic released their album "Life Has Not Finished with Me Yet". One song contains a weird repeating pattern...
Glen Johnson's lyrics are often cryptic and more evocative than objective, but any geek's mind would cling to this mantra at the end:
Add X to Y and Y to X
This is really weird but... Why? What's the point of this obscure "song" with no precise theme or clear subject? And what does it do? This last question is the most easily answered: just follow the damned algorithm.
X=1, Y=0
Y+=X => 0+1=1
X+=Y => 1+1=2
Y+=X => 1+2=3
X+=Y => 2+3=5
Y+=X => 3+5=8
X+=Y => 5+8=13
Y+=X => 8+13=21
X+=Y => 13+21=34
Y+=X => 21+34=55
...
No need to go further, most of you should have recognised http://oeis.org/A000045, the famous Fibonacci sequence.
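The trace above is easy to replay in code. Here is a minimal sketch; the function name `alternate_add` and the choice to return the most recently updated variable are my own assumptions for illustration, not part of the original algorithm.

```c
#include <stdint.h>

/* Hypothetical helper (name and return convention are assumptions):
   apply `steps` alternating additions starting from X=1, Y=0 and
   return whichever variable was updated last. */
uint64_t alternate_add(unsigned steps)
{
    uint64_t x = 1, y = 0;
    for (unsigned i = 0; i < steps; i++) {
        if (i % 2 == 0)
            y += x;   /* "Add X to Y" */
        else
            x += y;   /* "and Y to X" */
    }
    /* Odd step counts end on an update of Y, even ones on X. */
    return (steps % 2 == 1) ? y : x;
}
```

Calling `alternate_add(n)` for n = 1, 2, 3... walks through the same values as the trace, one Fibonacci number per step.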
This gave me a compelling idea to modify the old Fletcher & Adler algorithms, keeping their very low footprint and minimal computational complexity. Both of these well-known algos use a pair of values and have a similar structure. The trick is that rearranging the data dependency graph provides the equivalent of a minimalistic polynomial checksum, because the result is fed back on itself, in a more robust way than Fletcher's algo.
At first glance, this new checksum loop's body becomes something like:
Y += ( X ^ datum[i  ] );
X += ( Y ^ datum[i+1] );
i += 2;
This loop body is totally trivial to unroll, and the corresponding digital circuit is just as trivial to design. This early version seemed to contain the whole checksum entropy in the last computed value but now X and Y are part of the signature. And the really important enhancement is the use of the carry!
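As a rough sketch, the early carry-less loop body above could be wrapped into a complete C function like this. Several things here are assumptions that the text does not specify: the 16-bit word size, the seed X=1, Y=0 (borrowed from the Fibonacci trace), the handling of an odd trailing word, the function name `early_checksum`, and the way X and Y are packed into one 32-bit result.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of the early (carry-less) loop. Word size, seed, tail handling
   and output packing are assumptions made for illustration only. */
uint32_t early_checksum(const uint16_t *datum, size_t n_words)
{
    uint16_t x = 1, y = 0;   /* assumed seed */
    size_t i = 0;

    /* Main loop: two words per iteration; sums wrap modulo 2^16. */
    for (; i + 1 < n_words; i += 2) {
        y += (uint16_t)(x ^ datum[i]);
        x += (uint16_t)(y ^ datum[i + 1]);
    }
    /* Assumed tail rule: a leftover word only goes through the Y step. */
    if (i < n_words)
        y += (uint16_t)(x ^ datum[i]);

    /* Both X and Y are part of the signature. */
    return ((uint32_t)x << 16) | y;
}
```

With an all-zero input the XORs vanish and the function simply runs the Fibonacci recurrence modulo 2^16, which is a handy sanity check.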
For superscalar 32-bit CPUs, the following code seems to work well, though the missing carry hardware (and/or lack of language support) requires more instructions to emulate:
t = X + Y + C;
Y = X ^ data;
C = t >> 16;
X = t & 0xFFFF;
In this worst case, without carry-flag support, that's 5 basic operations (not counting memory loads) that fit in 4 registers and 3 cycles, to process 2 bytes. Not too bad. I'll let you deal with alignment. But is it really safe or safer?
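To make the four statements above concrete, here is a minimal sketch that wraps them in a loop over 16-bit words. The seed (X=1, Y=0, C=0), the function name `carry_checksum16` and the packing of X and Y into a 32-bit result are assumptions of mine; alignment and endianness are deliberately left out, and the final carry bit is simply dropped here.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of the emulated-carry loop on a 32-bit CPU.
   Seed and output packing are assumptions for illustration only. */
uint32_t carry_checksum16(const uint16_t *data, size_t n_words)
{
    uint32_t x = 1, y = 0, c = 0;   /* assumed seed */

    for (size_t i = 0; i < n_words; i++) {
        uint32_t t = x + y + c;     /* 17-bit sum: 16 bits plus carry out */
        y = x ^ data[i];            /* mix the old X with the next word */
        c = t >> 16;                /* emulated carry flag (0 or 1) */
        x = t & 0xFFFF;             /* keep the low 16 bits */
    }
    /* X and Y both fit in 16 bits; C is not part of this result. */
    return (x << 16) | y;
}
```

Feeding it the three words 0xFFFF, 0, 0 drives the sum to 0x10000 on the third step, so the carry path is actually exercised: X wraps to 0 and C becomes 1.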
The following logs will show how the idea evolves and the performance increases, through discussions about carry wrap-around, register widths, scheduling, avalanche, parallelism, orbits, structure, black holes...
1. The variable
3. Adler32 weakness
4. Easy circuit
6. A promising 32-bit checksum
9. Orbital mechanics
10. Orbits, trajectories and that damned carry bit
11. Two hairy orbits
12. How to deal with black holes
14. Moonlighting as a PRNG
15. A new name
16. More orbits !
17. First image
18. Structure and extrapolations
19. Even more orbits !
20. Start and stop
21. Some theory
22. Some more theory
23. Even more theory.
24. A little enhancement
Ironically, this structure reminds me a LOT of a 2-tap LFSR. So a more "powerful" version would use a Mersenne LFSR-inspired structure, with additional registers: Z, W, V... In several 2005 articles, I have explored "Mersenne Twisters" with 3 and 4 taps but even that would be overkill for such a basic application where throughput is the only point. The only effect would be to delay the feedback and reduce the growth factor from φ≈1.618 down to maybe 1.3. The most important point is that ALL input bits must have an effect that lasts until the end of the checksum loop, or else errors can easily creep in (and this is where Adler is better than Fletcher).
- Endian (who uses Big Endian anyway today?)
- Alignment (align...