A project log for Pisano-carry Checksum algorithm

Add X to Y and Y to X, says the song. And carry on.

I just discovered that Alder32 has a weakness, which is nicely covered by RFC3309.

From the RFC:

From an email from Jonathan Stone, who analyzed the Adler-32 as part of his doctoral thesis:

"Briefly, the problem is that, for very short packets, Adler32 is guaranteed to give poor coverage of the available bits. Don't take my word for it, ask Mark Adler. :-)

Adler-32 uses two 16-bit counters, s1 and s2. s1 is the sum of the input, taken as 8-bit bytes. s2 is a running sum of each value of s1. Both s1 and s2 are computed mod-65521 (the largest prime less than 2^16). Consider a packet of 128 bytes. The *most* that each byte can be is 255. There are only 128 bytes of input, so the greatest value which the s1 accumulator can have is 255 * 128 = 32640. So, for 128-byte packets, s1 never wraps. That is critical. Why?

The key is to consider the distribution of the s1 values, over some distribution of the values of the individual input bytes in each packet. Because s1 never wraps, s1 is simply the sum of the individual input bytes. (Even Doug's trick of adding 0x5555 doesn't help here, and an even larger value doesn't really help: we can get at most one mod-65521 reduction.)

Given the further assumption that the input bytes are drawn independently from some distribution (they probably aren't: for file system data, it's even worse than that!), the Central Limit Theorem tells us that that s1 will tend to have a normal distribution. That's bad: it tells us that the value of s1 will have hot-spots at around 128 times the mean of the input distribution: around 16k, assuming a uniform distribution. That's bad. We want the accumulator to wrap as many times as possible, so that the resulting sum has as close to a uniform distribution as possible. (I call this "fairness".)

Luckily, the Fibonacci Checksum covered here is not affected as much because

1. The computations are over wider words, more data is likely to be spread over the registers
2. Both halves of the registers are combined by XOR, without carry generation, to further "mix" the result and reduce the width from 64 to 32 bits (or 32 to 16 bits if you wish)
3. If you have short packets, a shorter checksum is more appropriate.
4. The Fibonacci growth of φ=1.6something makes this checksum a cousin of polynomial checksums, with a slow growth yet still faster than Adler32:
1. 32-bits words are filled in about 45 cycles (if we suppose the first word = 0x00000001 and the rest cleared)
2. 16-bits words are filled in 24 cycles (idem, starting from 1 followed by 0s)

However : be cautious with payloads that are short and/or contain very similar or consecutive values, such as protocol headers (such as with TCP). A mitigation would place the most variable data (such as "sequence number") at the very beginning to provide as much avalanche and wraparound as possible).

## Discussions 