Modular equivalence

A project log for PEAC Pisano with End-Around Carry algorithm

Add X to Y and Y to X, says the song. And carry on.

Yann Guidon / YGDES 08/09/2021 at 09:43

The last log, 64. The new version of PEAC16x2, showed a "reference version" with EAC applied to both registers at every cycle, which looks quite inefficient and breaks the promise of "high performance". However, that slow version serves as a baseline for establishing the equivalence of hardware and software implementations: the slow "baseline" version is equivalent to the previous, faster version, which is harder to implement with classic circuits.
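For readers unfamiliar with end-around carry, here is a minimal C sketch of the generic 16-bit version (ones' complement addition, as used by the Internet checksum). It is an analogy, not the PEAC16x2 code. Under these assumptions, it shows that folding the carry back in after every addition and folding it only once at the end give the same 16-bit result, as long as the 32-bit accumulator cannot overflow. The function names `eac_sum_eager` and `eac_sum_deferred` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Eager: fold the carry back in after every 16-bit addition
   (ones' complement addition). The running sum never exceeds 0xFFFF. */
static uint16_t eac_sum_eager(const uint16_t *v, size_t n) {
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++) {
        s += v[i];                      /* at most 0xFFFF + 0xFFFF */
        s = (s & 0xFFFFu) + (s >> 16);  /* end-around carry */
    }
    return (uint16_t)s;
}

/* Deferred: accumulate plain 32-bit sums and fold the carries only at the end.
   Safe while the accumulator cannot overflow, i.e. for at most 65537 words. */
static uint16_t eac_sum_deferred(const uint16_t *v, size_t n) {
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    while (s >> 16)                     /* fold until it fits in 16 bits */
        s = (s & 0xFFFFu) + (s >> 16);
    return (uint16_t)s;
}
```

For example, 0xFFFF + 0x0001 gives 0x10000, which folds to 0x0001 both ways, because 2^16 ≡ 1 (mod 65535).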

OTOH, thanks to modular properties, we know that ((x mod m) + y) mod m = (x + y) mod m, so many "renormalisation" operations can be postponed. This is one of the known optimisations for the Adler and Fletcher checksums. Pisano16x2_v5.c also illustrates this with 3 versions of the same algorithm.
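As an illustration of this optimisation (a sketch, not the code from Pisano16x2_v5.c), here is a Fletcher-16-style checksum in C, written once with both sums renormalised after every byte and once with the reductions postponed to the end of each block. `BLOCK` = 4096 is an assumed value, chosen so that the unreduced 32-bit sums cannot overflow; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK 4096  /* bytes between reductions; keeps b below 2^32 */

/* Reference: renormalise both sums after every byte. */
static uint16_t fletcher16_eager(const uint8_t *d, size_t n) {
    uint32_t a = 0, b = 0;
    for (size_t i = 0; i < n; i++) {
        a = (a + d[i]) % 255;
        b = (b + a) % 255;
    }
    return (uint16_t)((b << 8) | a);
}

/* Deferred: since ((x mod m) + y) mod m == (x + y) mod m, the reductions
   can be postponed to the end of each block without changing the result. */
static uint16_t fletcher16_deferred(const uint8_t *d, size_t n) {
    uint32_t a = 0, b = 0;
    while (n > 0) {
        size_t len = n < BLOCK ? n : BLOCK;
        n -= len;
        while (len--) {
            a += *d++;  /* no renormalisation inside the block */
            b += a;
        }
        a %= 255;       /* renormalise both sums once per block */
        b %= 255;
    }
    return (uint16_t)((b << 8) | a);
}
```

Both functions return 0xC8F0 for "abcde". The block size bounds the growth: with a and b below 255 at the start of a block, b stays below about 2.15 × 10^9 after 4096 bytes, under the 2^32 limit.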

And during the unrolling, something new happened: the value of one of the registers explodes because it misses its renormalisation. The result remains correct as long as the computation is short, but it clearly does not behave as expected.

That's when I realised that the original algorithm, executed step by step, renormalises X and Y in turn, which ensures that neither overflows. Now, with the steps duplicated, the variable swaps cancel out and only one variable gets renormalised.
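The symptom can be reproduced with a simplified toy model in C. This is an assumption, not the PEAC loop: there is no variable swap, and the modulus M = 65535 is hypothetical. The reference loop renormalises X and Y in turn; the second loop drops the renormalisation of X, which is what happens when only one variable is renormalised.

```c
#include <assert.h>
#include <stdint.h>

#define M 65535u  /* hypothetical modulus for the toy model */

typedef struct { uint32_t x, y; } pair;

/* Reference loop: X and Y are renormalised in turn, so both stay below M. */
static pair run_reference(uint32_t x, uint32_t y, int iters) {
    for (int i = 0; i < iters; i++) {
        x = (x + y) % M;
        y = (y + x) % M;
    }
    return (pair){ x, y };
}

/* Faulty loop: only Y is renormalised. X still holds the right residue
   mod M, but its magnitude grows by up to M - 1 per iteration until the
   32-bit register wraps around. */
static pair run_one_renorm(uint32_t x, uint32_t y, int iters) {
    for (int i = 0; i < iters; i++) {
        x = x + y;          /* missing renormalisation of X */
        y = (y + x) % M;
    }
    return (pair){ x, y };
}
```

Starting from x = y = 1, after 12 iterations the second loop already holds x = 75025, above M, while x mod M still matches the reference. X keeps growing, so the 32-bit register eventually wraps around, and from then on the two loops are no longer guaranteed to agree.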

Aaaand... there is no solution, because despite using 3 variables in the loop, only 2 perform the actual computations. The only way to handle this is with an odd number of steps per loop. I'll have to use tricks from