The case for additive mixing might be closed.

While reading more about the flaws of checksums, the theme of bad results with small and/or highly biased values became clear so I turned my efforts to this subject.

I think I now have the smoking gun for ditching XOR as a mixing function, however convenient it might be, at least for the compacted 16-bit version.

I have run some tests to compare the collision behaviours between + and ^, and here is what I got:

[yg@localhost FiboSum]$ for i in $(seq 1 100)
 do
   echo $i
   gcc -Wall test_Pisano16x2.c -DMIX_OP=^ -DSHOWDIFFONLY \
        -DBUFFSIZE=20 -DINITVAL=$i -o test_pisano &&
   ./test_pisano
  done |grep ':'
 3 [   3]:  5B18 57A20376 - 5B18 57A10377 ^ 0000 00030001 :  0  3
 3 [   6]:  5B1F 57A70378 - 5B1F 57A5037A ^ 0000 00020002 :  0  2
 3 [  12]:  5B29 57AD037C - 5B29 57A90380 ^ 0000 000400FC :  0  7
 3 [  15]:  5B2C 57AE037E - 5B2C 57A90383 ^ 0000 000700FD :  0 10
12 [  60]:  14FA AADF6A1B - 14FA AADA6A20 ^ 0000 0005003B :  0  7
15 [  90]:  0657 7A488C0F - 0657 7A4A8C0D ^ 0000 00020002 :  0  2
 3 [  24]:  5B1D 57B90364 - 5B1D 57B1036C ^ 0000 00080008 :  0  2
 3 [  27]:  5B20 57BA0366 - 5B20 57B1036F ^ 0000 000B0009 :  0  5
 3 [  30]:  5B27 57BF0368 - 5B27 57B50372 ^ 0000 000A001A :  0  5
 3 [  48]:  5B25 57910394 - 5B25 57A10384 ^ 0000 00300010 :  0  3
 3 [  51]:  5B28 57920396 - 5B28 57A10387 ^ 0000 00330011 :  0  6
 3 [  54]:  5B2F 57970398 - 5B2F 57A5038A ^ 0000 00320012 :  0  5
 3 [  60]:  5B39 579D039C - 5B39 57A90390 ^ 0000 0034000C :  0  5
 3 [  63]:  5B3C 579E039E - 5B3C 57A90393 ^ 0000 0037000D :  0  8
 3 [  93]:  5B8E 57FC0392 - 5B8E 57FD0391 ^ 0000 00010003 :  0  3
 3 [  96]:  5AF5 57C10334 - 5AF5 57A10354 ^ 0000 00600060 :  0  4
.....

There is an obvious collision with multiples of 3 for the 16-bit version, while the 32-bit wide result is clearly good.

The solution, if you really want to keep XOR, is to mix the 16-bit result differently,

return (uint16_t)(C+Y+X);

is replaced by

return (uint16_t)(C+(Y<<8 | Y>>8)+X);

This is one rotation of Y to prevent the fateful alignment, but it could have harmful effect somewhere else.

More tests show that ADD is better, even with the rotation trick:

   4858  6 août  09:27 test_collision_+.log
  19851  6 août  09:32 test_collision_x.log
   5316  6 août  09:39 test_collision_xr8.log

The xor_rot8 version seems to have 8% more collisions, for buffers of 4Ki words (8Ki bytes).

Meanwhile, the ADD version has found only 64 collisions for (4096*1000)=4M tests, or one error for every 64000 tests, which is close to the theory of 65536. Yes, 16-bit checksums are lousy.

Have a look at case_against_xor.tgz for some code.

Discussions

Thomas wrote 08/07/2021 at 06:14

I need some help there: an ideal CRC-16 would have one error for every 2^16 tests, would it? So the XOR is like a really bad CRC and the ADD Pisano is better than the latter?

Are you sure? yes | no

Yann Guidon / YGDES wrote 08/07/2021 at 15:33

Hi Thomas !

"an ideal CRC-16 would have one error for every 2^16 tests"

Yes and no, the subtlety is in "for every X".

The trick with CRC-16 and all CRCs in fact is that depending on the choice of the polynomial, it can ensure that it will detect "all" errors of a certain kind, for example:

* all odd numbers of bit flips

* all burst of flips up to the size of the polynomial,

* all even numbers of flips "up to a certain message length"

and so on. That's why CRCs are so loved : you have a mathematical construction that promises you a result in chosen cases.

When you apply the pigeonhole principle however, and test a CRC on random data, you'll find 2^-16 errors anyway, and these are "pushed" in longer random patterns. Because CRC is more sensitive to a class of errors, it is mechanically less so on other classes, but it's all good because the latter are deemed much less interesting or probable.

This all hinges on the operator XOR and its properties, the CRCs are "not data dependent" and the error detection efficiency is not affected by the data.

OTOH PEAC is built around a different style of arithmetics and uses ADD with a modulo 2^n-1. I have not studied the types of errors that it catches but I expect they would be different.

"XOR is like a really bad CRC and the ADD Pisano is better than the latter?"

I am not sure what you mean by that, in which context.

Anyway the latest insight made me think further and I just came up with another evolution of the algorithm which might also explain why the ADD version is much better, while providing more execution efficiency. That program is going to fly fast :-)

Are you sure? yes | no

Thomas wrote 08/07/2021 at 15:49

Thanks, that helps :-)
I was referring to the statement "The xor_rot8 version seems to have 8% more collisions, for buffers of 4Ki words (8Ki bytes)." which, I understand, is worse than ADD.

Are you sure? yes | no

Yann Guidon / YGDES wrote 08/07/2021 at 17:24

@Thomas yes it is.

The fine lines are that I have not covered all the cases, and did it in a particularly crude way so surprises might still hide in dark corners.

What is at stake though is one of the big assumptions of checksums and CRCs, that messages have random data, hence high/maximal entropy. Low entropy and high correlations are practical constraints that are less easily considered during the theoretical approach, numerical simulations and practical tests must always be conducted.

And I have only scratched the surface here. Yet this simple test, with 1/2h of runtime, was enough to convince me to ditch XOR.

Are you sure? yes | no

Yann Guidon / YGDES wrote 08/07/2021 at 15:45

"the ADD Pisano is better than the latter?"

The culprit here is the compaction/combination from 33 bits down to 16, the trivial compactions will have some sort of collision or another. If you XOR the X and Y, you'll miss a certain class of errors, and if you ADD you'll miss another class... More complicated combinations with rotations or bitreverse will still miss 2^-16 cases anyway.

It seems that the ADD mixing is more efficient than XOR mixing, probably due to a better avalanche that does not interfere with the late combination. We could come up with more sophisticated compactions but this wouldn't hide the weakness of XOR mixing.

16-bit checksums or even CRC16 are not powerful anyway. They should be used only on short messages with low probability of errors, where any overhead is to be avoided. 32 bits are considerably better, even with a dumb algorithm.

Are you sure? yes | no

Thomas wrote 08/07/2021 at 16:07

Thanks for the eplanation :-)
16bit checksums used to be the go-to method 20 years ago (or even fewer bits, e.g. 8bit in 1-wire or Intel Hex or 15bit in CAN). In practical applications the error rate is important.

Are you sure? yes | no

Yann Guidon / YGDES wrote 08/07/2021 at 17:34

@Thomas yes, we fall in the "good enough" and "still better than nothing" darker areas of engineering ;-)

Which is why I chose to provide the 16-bit version anyway, since it costs almost nothing development-wise for me, and would have been requested anyway.

Furthermore, it could help identify pathological cases in the 32-bit version (and it did).

Now, with the properties of the 16x2 checksum becoming clearer, it is obvious for me that it should be used instead of the compacted version, whenever the overhead constraints allow it.

It is very easy to generate and spot flaws in 16-bit checkums, much less so for 32 bits but it's a first step in the right direction.

A more thorough face-off in the 32 bits realm is planned, probably using the small cluster of Raspberry Pis I try to assemble. If one Pi core at 1GHz can generate one miss every 4 second, then a quad-core RPi would generate one miss per second. The cluster would have 8 Pis, so that's 8 per second on average, but a precise curve will need tens of thousands of points, at least, so I expect hours or days of computations to compare PEAC with others.

Are you sure? yes | no

Cheap tricks for better results

Additions and consequences

Discussions

Become a Hackaday.io Member