Some more theory

A project log for PEAC Pisano with End-Around Carry algorithm

Add X to Y and Y to X, says the song. And carry on.

Yann Guidon / YGDESYann Guidon / YGDES 05/08/2021 at 17:310 Comments

The last log is only scratching the surface and the redaction was interrupted so I resume here.

Let's start with some more OEIS :

I don't know if they are really related but somewhere, something should match and these are some of the least implausible. I have not found a matching underlying theory yet so I'm fishing for whatever can be found. The results of the next brute-force searches will tell us more and refine the fishing expedition.

So a lot of responsibility remains on the shoulders of the brute-force search. And the "theory" could help reduce the strain. For example, the "backwards equation" can double the speed of the the exhaustive search for orbits !

Consider this: if the orbit has a forward and a backwards direction, then both ways can be explored simultaneously and meet at the middle. This is not really a win when taken from the perspective of the total amount of computations. However the steps can be run in parallel !

The computation is still very serial in nature (particularly because each step of both computations must be compared) but that's where Out Of Order Execution shines, because it can run several instructions in parallel and reorganise the execution flow at will. Indeed the pipeline will be better used than in the purely serial code that is currently running on my old dual-core i7. So I should focus on the backwards equation. The diagram below shows the principle, where both paths are one half shorter:

But then, one more idea comes: What if we could determine the coordinate of the middle of the orbit in advance ? From there, it would be easier to only compute one half and meet at the quarter of the orbit, which would further half the computation effort.

Then, from there, what keeps me from going further and bissecting the orbit further and further and... After all the orbit length is close to a power of two so it would be a good match. But this would work only if an ideal orbit exist, so what makes these orbits ideal in the first place ?

There is another way to see it, not with the "top-down" approach but "bottom-up". Let's start with the simplest, smallest system and see how it evolves.


The case with width=0 only contains 2 stuck states so there is nothing much to learn there.


With width=1, we have 8 states with X and Y oscillating between 0 and 1.


Wait, what ? We have a 4-long orbit ? Anyway the binary sequence looks legit. If we exclude (0,0,0) and (1,1,1) the other orbit should be 2-long, right ?


No, it falls on the "diagonal" and there is only one orbit, and the two trajectories enter the loop at a different place.


For width=2, there are 32 states, 2 orbits and X is modulo 4 (spans 0..3), 2 symmetrical orbits of length=9 and 2×6 trajectories.

Let's look at both orbits in parallel:


Neat ! Now let's rotate the 2nd orbit a little to start with (2,3,1):


If you invert the carry ( 0<=>1 ) and the X and Y ( 0<=>3, 1<=>2 ) you find the same sequence as the first orbit ! This was not possible or true with width=1. QED.


What else can we learn from this ? Another noticeable feature is that going backwards in the orbit is possible, despite the carry. The last log 21. Some theory has deduced the antecedent from the current coordinate by reversing the successor formula:

X' = X + Y (mod 2^w)
Y' = X
X = Y'
Y = X' - Y' (mod 2^w)

The carry introduces a little step that is reverted with this modified formula:

Y' = X
t  = X + Y +C
X' = t mod 2^w
C' = t  /  2^w
X = Y'
t = X' + (C×2^w)
Y = t - Y' (mod 2^w)
C = ...

Finding the new C is a bit tricky but we already have a heuristic: the carry is usually dependent on the coordinates, with C=0 if X>Y and C=1 if X<Y.

C = sign(X-Y)

The trouble is with the diagonals so let's list them:

(0,0,0)=>(0,0,0) (forbidden)
(3,3,1)=>(3,3,1) (forbidden)

It appears (even in the forbidden states) that the value of C is conserved and Y=3 when C=1 and Y=0 when C=0. X is unchanged as well !

This is not really a paradox or surprise, considering that (1,1,0) is not (1,1,1) because the latter is in fact (5,1) with the MSB of X in C. So when we have (z,z,1) this is equivalent to (4+z,z) but, unlike a classic version, there is no negative range so if we subtract z from 4+z, we're left with 4 which is the modulus and can only be represented as (z,3,1). OTOH, for (z,z,0), the rest of z-z=0 can be represented with no help from the carry.

The algorithm is straight-forward but there is the requirement to avoid branches so a computation-only formula is needed. Yet, the CPU cores of the last 20 years have an efficient branch predictor and as the width increases, the number of states is squared so the case of (z,z,1) gets pretty rare, enough to avoid a heavy computation that would affect the duration of computation of all the other cases. One branch prediction failure every million of step is a relatively small cost. And ideally, the compiler would transform an if structure into appropriate instructions, such as CMOV etc.

Hence the new formula:

X = Y'
t = X' + (C×2^w)
s = t - Y'
Y = s mod 2^w
C = s >> w (mod 2)  (sign)
if s==2^w then  (rarely taken)
  C = 1
  Y = 2^w -1

This could be simplified a bit:

t = X' + (C×2^w)
s = t - Y'
if s==2^w then   (rarely taken)
  s=2^(w+1) -1
Y = s mod 2^w
C = s >> w (mod 2)  (sign)
X = Y'     (pushed to the bottom to let the others execute first)

Some benchmarking will decide...