The last log, 64. The new version of PEAC16x2, showed a "reference version" with EAC applied to both registers at every cycle, which looks quite inefficient and breaks the promise of "high performance". That slow version, however, serves as a baseline to establish the equivalence of hardware and software implementations. It is equivalent to the previous, faster version, which is harder to implement with classic circuits.
OTOH, thanks to modular properties, we know that ((x mod m) + y) mod m = (x + y) mod m, so many "renormalisation" operations can be postponed. This is one of the known optimisations for the Adler and Fletcher algos. It is also shown in Pisano16x2_v5.c, which contains 3 versions of the same algorithm.
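As a minimal sketch of that identity at work (this is not the code from Pisano16x2_v5.c), the two functions below sum 16-bit words modulo 65535, once with a reduction after every addition and once with a reduction every BLOCK words. The names sum_eager, sum_deferred, MODULUS and BLOCK are made up for this illustration, and a plain % stands in for the end-around carry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODULUS 65535u  /* 2^16 - 1, assumed here for illustration */
#define BLOCK   4096u   /* words accumulated between two reductions */

/* Eager version: reduce after every single addition. */
uint32_t sum_eager(const uint16_t *data, size_t n) {
    assert(data != NULL || n == 0);
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc = (acc + data[i]) % MODULUS;
    return acc;
}

/* Deferred version: let the 32-bit accumulator grow, reduce once per block.
 * Worst case before a reduction: 65534 + 4096 * 65535 < 2^32, so no wrap. */
uint32_t sum_deferred(const uint16_t *data, size_t n) {
    assert(data != NULL || n == 0);
    uint32_t acc = 0;
    size_t i = 0;
    while (i < n) {
        size_t end = (n - i > BLOCK) ? i + BLOCK : n;
        for (; i < end; i++)
            acc += data[i];   /* no reduction inside the block */
        acc %= MODULUS;       /* one reduction per block */
    }
    return acc;
}
```

Both functions return the same residue for any input; the deferred one only pays for one modulo per block, at the cost of needing a wider accumulator.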
And during the unrolling, something new happened: the value of one of the registers explodes because it is never renormalised. The result remains good as long as the computation is short, but it clearly does not behave as expected.
That's when I realised that the original algorithm, step by step, would renormalise X and Y in turn, ensuring that the results would not overflow. Now, with the steps duplicated, the variable swaps cancel out and only one variable is renormalised.
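The effect can be reproduced on a simplified model. This is a sketch under assumptions, not the PEAC16x2 code: a bare Fibonacci-like recurrence with no data input, a plain % standing in for EAC, and 64-bit variables so the growth can be observed without wrapping. The names fib_reference, fib_unrolled2 and MODULUS are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MODULUS 65535u  /* 2^16 - 1, assumed; % stands in for EAC */

/* Baseline: one step per iteration. The new sum is reduced, then the
 * roles are swapped, so every value is reduced before it is reused. */
void fib_reference(uint64_t *x, uint64_t *y, unsigned steps) {
    assert(x && y);
    for (unsigned i = 0; i < steps; i++) {
        uint64_t t = (*x + *y) % MODULUS;
        *y = *x;
        *x = t;
    }
}

/* Unrolled by two: the swaps are removed by renaming, and a single
 * reduction is kept at the end of the loop body. It always lands on x. */
void fib_unrolled2(uint64_t *x, uint64_t *y, unsigned pairs) {
    assert(x && y);
    for (unsigned i = 0; i < pairs; i++) {
        *y += *x;        /* step 1: the new sum lands in y */
        *x += *y;        /* step 2: the new sum lands in x */
        *x %= MODULUS;   /* only x is ever renormalised */
    }
}
```

Starting both from x = 1, y = 0, the x values agree after the same number of steps and y stays congruent modulo 65535, but in the unrolled version y keeps growing by up to 65534 per iteration. In a 32-bit register it would eventually wrap and corrupt the result, which matches "good as long as the computation is short".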
- The basic solution is to renormalise both variables at the end of each loop, though the performance would be bad, reducing the benefits of unrolling.
- The next alluring solution is to unroll by an odd number, like 3, 5 or 7 steps per loop. 3, 7 and 15 are Mersenne numbers which are covered by http://homepage.divms.uiowa.edu/~jones/bcd/mod.shtml but the modulo computations are still chunky...
- A power of two is certainly desirable, so the remaining solution is to be careful with the variable renaming. Some graph colouring is now necessary. But is there a solution?
Aaaand... there is no solution, because despite using 3 variables in the loop, only 2 perform the actual computations. The only way to handle this is with an odd number of steps per loop. I'll have to use tricks from https://homepage.divms.uiowa.edu/%7Ejones/bcd/divide.html...
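With 3 steps per loop, a block of n steps has to be split into n / 3 full loops plus n mod 3 leftover steps. One way to do that without a hardware divider is reciprocal multiplication, sketched below for a 32-bit count. The function name split_by_3 is hypothetical, and a compiler will usually generate something similar by itself for a constant divisor.

```c
#include <assert.h>
#include <stdint.h>

/* Split n into full groups of 3 and a remainder, without a division.
 * 0xAAAAAAAB is (2^33 + 1) / 3, so (n * 0xAAAAAAAB) >> 33 equals n / 3
 * for every 32-bit n. The product needs a 64-bit intermediate. */
void split_by_3(uint32_t n, uint32_t *groups, uint32_t *rest) {
    assert(groups && rest);
    uint32_t q = (uint32_t)(((uint64_t)n * 0xAAAAAAABu) >> 33);
    *groups = q;
    *rest = n - 3u * q;   /* 0, 1 or 2 leftover steps */
}
```

The leftover steps would then be handled by a short epilogue after the unrolled loop.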