A good compromise

A project log for PEAC Pisano with End-Around Carry algorithm

Add X to Y and Y to X, says the song. And carry on.

Yann Guidon / YGDESYann Guidon / YGDES 06/17/2021 at 01:530 Comments

At this moment, the search for w26 is only at the 73% mark so it's not officially "good" but very probably so. I doubt I can ever confirm or refute w32 so, this far, w26 is the best I can have confidence in. And w32 is very unlikely "good" anyway. So what can I do with w26 ? And how ?

I think it's not as useless as I originally thought, on the contrary, and even more so since my last encounters with GCC. I want something that is quite simple, robust, and which works anywhere without ridiculous compiler wrangling and weird gymnastics.

Remember that even though w26 works on 26 bits, its period is double that ! So even with a pair of 32-bit registers, I get a 2⁵⁶ period without effort. This should be enough, right ? And with such a size, you'd be wiser to chunk your data, such as 16M blocks just to be safe.

So let's see how to design a checksum that works well for 32 and 64 bits, using 32-bit wide registers.

So far we have split the 32 bits into 3 zones :

There is some headroom, and should work well in most cases. But we have to solve one last issue : we can't get the carry bit.

In fact we can't get it directly because the previous methods would either mask the "guard" MSB or shift the computation such that the carry bit appears on the carry flag. But this bit is not lost. The iterator virtually operates on the bits 0 to 25 so we could detect a carry when the bit #26 changes state. So the carry is computed by

 (((X^Y)>>26) & 1)

which is a decent code sequence on any decent 32-bit CPU.

In hardware, it's even more simple, particularly if you can access intermediary results from the adder's internal nodes.

The corresponding source code adds some latency, there is a string of 3 dependent instructions which reduces the speed a bit, but it's not horrible either. Furthermore we can use ADDs everywhere, and not worry about missing a carry in the MSB anymore in case we add the carry separately. Overall, it's a good design gain and the whole thing, whether you use 32 bits (the last result) or 64 bits (the last result concatenated with the precedent) has a lot of headroom.

From there, we can make this macro:

Z = Z + W + dat16 + (((Z^W) >> 26) & 1)

Then substitute Z and W for X and Y alternatively.

PISANO_MACRO(Y, X, data[0])
PISANO_MACRO(X, Y, data[1])

There is still room for "enhancement" with the x86 breed, using the RCx (Rotate through Carry) opcodes. For example RCL 6 will put the bit 26 into the carry bit, and you just need to ADC to save one  ADD opcode. So the sequence would be

ADD Y, dat
RCL T, 6

I wish I could avoid the first MOV... but I have to admit its inevitableness and test the hell out of it....