It almost went according to plan, as you can see in the code below:
// The original algo (optimised)
C += X + Y;
Y = X + M_;
X = C & PISANO_MASK;
C = C >> PISANO_WIDTH;

// The new algo (crude)
t = X + Y + C + M_;
C = ((t ^ X ^ Y) >> PISANO_WIDTH) & 1;
Y = X;
X = t;
Of course I wanted to test the equivalence first, so the new code is not optimised yet, and it's only a matter of time until it gets squished. The only disappointment is that there is one more value to XOR than I wanted, but the cost is minor, particularly since one of the XORs is redundant and can be optimised out (just use one more temp var).
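Here is a minimal sketch of such an equivalence test, not the actual test code. It assumes PISANO_WIDTH is 26, 32-bit variables, and a message word M_ held at zero (the "block of zeroes" case), so it says nothing about non-zero input. The function name variants_agree is hypothetical. It runs both variants side by side and compares the carry and the low 26 bits of X and Y after every step:

```c
#include <stdint.h>

#define PISANO_WIDTH 26
#define PISANO_MASK  ((1u << PISANO_WIDTH) - 1)

/* Hypothetical helper: returns 1 if both variants agree on C and on the
   low PISANO_WIDTH bits of X and Y for `steps` iterations, else 0.
   Assumption: M_ is fixed at 0. */
int variants_agree(uint32_t x0, uint32_t y0, uint32_t c0, unsigned steps)
{
    /* State of the original (masked) variant */
    uint32_t X1 = x0 & PISANO_MASK, Y1 = y0 & PISANO_MASK, C1 = c0 & 1;
    /* State of the new (unmasked) variant, same starting point */
    uint32_t X2 = X1, Y2 = Y1, C2 = C1, t;
    const uint32_t M_ = 0;

    for (unsigned i = 0; i < steps; i++) {
        /* The original algo */
        C1 += X1 + Y1;
        Y1 = X1 + M_;
        X1 = C1 & PISANO_MASK;
        C1 = C1 >> PISANO_WIDTH;

        /* The new algo: the bits above PISANO_WIDTH are left as garbage */
        t = X2 + Y2 + C2 + M_;
        C2 = ((t ^ X2 ^ Y2) >> PISANO_WIDTH) & 1;
        Y2 = X2;
        X2 = t;

        /* Only the low bits and the carry are expected to match */
        if (((X1 ^ X2) & PISANO_MASK) || ((Y1 ^ Y2) & PISANO_MASK) || C1 != C2)
            return 0;
    }
    return 1;
}
```

The reason it holds: bit PISANO_WIDTH of t is the XOR of the same bit of X and Y with the carry coming out of the low 26 bits, so XORing X and Y back out leaves only that carry.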
But the good news is:
It works as intended on (almost) the first try!
With only 26 bits that are "known good", the variables can be 32 or 64 bits wide, and the algo doesn't spill precious data/entropy into the void of a missed carry-out flag. With two 32-bit registers, you can safely checksum blocks containing 2^52 zeroes.
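To illustrate the point about register width, here is a rough sketch that runs the new algo in 32-bit and 64-bit variables at the same time and checks that the carry and the low 26 bits stay identical. It assumes a width of 26 and no message input (M_ = 0). The function name widths_agree is made up for this example:

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if the 32-bit and 64-bit runs of the
   new algo agree on C and on the low 26 bits of X and Y, else 0.
   Assumption: M_ = 0, so it is left out of the sum. */
int widths_agree(uint32_t x0, uint32_t y0, unsigned steps)
{
    const unsigned width = 26;
    const uint64_t mask = (1ull << width) - 1;

    uint32_t X32 = x0, Y32 = y0, C32 = 0, t32;
    uint64_t X64 = x0, Y64 = y0, C64 = 0, t64;

    for (unsigned i = 0; i < steps; i++) {
        /* 32-bit registers: the sum wraps modulo 2^32 */
        t32 = X32 + Y32 + C32;
        C32 = ((t32 ^ X32 ^ Y32) >> width) & 1;
        Y32 = X32;
        X32 = t32;

        /* 64-bit registers: same code, more garbage in the high bits */
        t64 = X64 + Y64 + C64;
        C64 = ((t64 ^ X64 ^ Y64) >> width) & 1;
        Y64 = X64;
        X64 = t64;

        if (((X32 ^ X64) & mask) || ((Y32 ^ Y64) & mask) || C32 != C64)
            return 0;
    }
    return 1;
}
```

The carry is read from bit 26 rather than from the CPU's carry flag, so whatever sits above that bit never feeds back into the useful part of the state.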
Now I have to reorganise and optimise this.