The new "non-masking" algorithm exploits a few small tricks (explained in "34. A good compromise") to provide arbitrarily long checksums while using "only" a generator running on 26 bits (with a period of 2^51, or 2,251,799,813,685,248). With 32-bit registers, it provides a baseline checksum of 32 bits, and the extended result can bring the full checksum to 64 bits.
This is appropriate, for example, for the (non-cryptographic) signature of files.
The input data is mixed in chunks of only 16 bits, because the generator has only 26 bits. Mixing larger chunks would not make as much sense in corner cases, such as when reading long sequences of constant data.
The two new XOR gates increase the computation cost slightly, but it is still far below that of a comparable CRC32, with or without tables. The source code is also inherently more portable than the earlier versions, which rely on support for the hardware carry signal.
// The original lovely but masking algo:
C += X + Y;   // ADC
Y = X + M_;
X = C & PISANO_MASK;
C = C >> PISANO_WIDTH;

// The new non-masking algo:
Cu += Xu + Yu;   // ADC
p = Xu ^ Yu;
Yu = Xu + M_;
Xu = Cu;
Cu = ((p ^ Cu) >> PISANO_WIDTH) & 1; // ROL 6
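To see how the two variants relate, here is a minimal sketch that wraps each one in a function and runs them side by side. Several things are assumptions of this sketch, not taken from the original code: PISANO_WIDTH is set to 26 and PISANO_MASK to the matching 26-bit mask (inferred from the text), the registers are uint32_t, the struct and function names are hypothetical, and the starting state (X=1, Y=0, C=0) is arbitrary. The comparison is run with M_ = 0 only, that is, on the bare generator without any data mixed in.

```c
#include <stdint.h>

/* Assumed values: the text says the generator runs on 26 bits. */
#define PISANO_WIDTH 26u
#define PISANO_MASK  ((1u << PISANO_WIDTH) - 1u)

/* Hypothetical containers for the two sets of registers. */
typedef struct { uint32_t X, Y, C; }    masking_state;
typedef struct { uint32_t Xu, Yu, Cu; } nonmasking_state;

/* One step of the original masking algorithm. */
static void masking_step(masking_state *s, uint32_t M_)
{
    s->C += s->X + s->Y;           /* ADC */
    s->Y  = s->X + M_;
    s->X  = s->C & PISANO_MASK;    /* keep the low 26 bits */
    s->C  = s->C >> PISANO_WIDTH;  /* what overflowed becomes the carry */
}

/* One step of the non-masking algorithm: the registers keep all 32 bits. */
static void nonmasking_step(nonmasking_state *s, uint32_t M_)
{
    s->Cu += s->Xu + s->Yu;        /* ADC, wraps modulo 2^32 */
    uint32_t p = s->Xu ^ s->Yu;    /* sum without any carries */
    s->Yu = s->Xu + M_;
    s->Xu = s->Cu;
    /* p ^ sum leaves the carry that entered each bit position;
       bit PISANO_WIDTH is the carry out of the low 26 bits. */
    s->Cu = ((p ^ s->Cu) >> PISANO_WIDTH) & 1u;
}

/* Runs both variants with M_ = 0 and counts the steps after which the
   low 26 bits of the non-masking registers differ from the masking ones. */
static unsigned count_mismatches(unsigned steps)
{
    masking_state    a = { 1u, 0u, 0u };
    nonmasking_state b = { 1u, 0u, 0u };
    unsigned bad = 0;
    for (unsigned i = 0; i < steps; i++) {
        masking_step(&a, 0u);
        nonmasking_step(&b, 0u);
        if (a.X != (b.Xu & PISANO_MASK) ||
            a.Y != (b.Yu & PISANO_MASK) ||
            a.C != b.Cu)
            bad++;
    }
    return bad;
}
```

With M_ = 0, count_mismatches() should return 0 for any number of steps: the low 26 bits of Xu and Yu follow the masking generator, and the bits above them are the extra material that the masking version throws away. Once data is mixed in, the two variants are not expected to stay in lockstep, because Y = X + M_ is not masked in the original.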
The "masking" algo is amazingly short and beautifully efficient, but it limits the number of useful/used bits to 26×2. This means that only 0.05% of the available coding space is used. Not wasting these bits is worth a few extra instructions, I think, unless a better (higher) width than 26 is found.
The Xu = Cu line will be removed by more advanced, unrolled versions of the code where variable renaming optimisations are possible.
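One possible reading of this remark, sketched under my own assumptions: when two steps are unrolled, the sum of the first step can be left in a temporary that simply plays the role of Xu in the second step, so no copy is needed. The function and struct names, the 26-bit PISANO_WIDTH value and the test data generator below are hypothetical and only serve the illustration.

```c
#include <stdint.h>

#define PISANO_WIDTH 26u   /* assumed from the text */

/* Hypothetical container for the three registers. */
typedef struct { uint32_t Xu, Yu, Cu; } peac_state;

/* Reference: one step, written as in the snippet, including Xu = Cu. */
static void step_reference(peac_state *s, uint32_t M_)
{
    s->Cu += s->Xu + s->Yu;
    uint32_t p = s->Xu ^ s->Yu;
    s->Yu = s->Xu + M_;
    s->Xu = s->Cu;
    s->Cu = ((p ^ s->Cu) >> PISANO_WIDTH) & 1u;
}

/* Two steps unrolled: X and T swap roles, so the copy disappears. */
static void step2_unrolled(peac_state *s, uint32_t M0, uint32_t M1)
{
    uint32_t X = s->Xu, Y = s->Yu, C = s->Cu, T, p;

    /* first step: the sum lands in T, which acts as X from here on */
    T = C + X + Y;
    p = X ^ Y;
    Y = X + M0;
    C = ((p ^ T) >> PISANO_WIDTH) & 1u;

    /* second step: the sum lands back in X */
    X = C + T + Y;
    p = T ^ Y;
    Y = T + M1;
    C = ((p ^ X) >> PISANO_WIDTH) & 1u;

    s->Xu = X; s->Yu = Y; s->Cu = C;
}

/* Feeds the same pseudo-random 16-bit words to both versions and
   returns 1 if the registers agree after every pair of steps. */
static int unrolled_matches_reference(unsigned pairs)
{
    peac_state a = { 1u, 0u, 0u }, b = { 1u, 0u, 0u };
    uint32_t lcg = 0x1234u;                    /* arbitrary test data source */
    for (unsigned i = 0; i < pairs; i++) {
        lcg = lcg * 1103515245u + 12345u;
        uint32_t m0 = (lcg >> 8) & 0xFFFFu;    /* 16-bit chunk */
        lcg = lcg * 1103515245u + 12345u;
        uint32_t m1 = (lcg >> 8) & 0xFFFFu;
        step_reference(&a, m0);
        step_reference(&a, m1);
        step2_unrolled(&b, m0, m1);
        if (a.Xu != b.Xu || a.Yu != b.Yu || a.Cu != b.Cu)
            return 0;
    }
    return 1;
}
```

A compiler may well perform this renaming by itself on the straightforward loop; writing it out by hand mostly shows that the copy is an artifact of the one-step formulation rather than a real cost.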