v4 is slowly taking shape. It tries to replicate the success of sharing one addition between two consecutive words: as with the indices A and B, R and FG now come out of a single 32-bit operation. C is only 17 bits wide, so there is little chance of a carry-out.
The initial carry-in comes from (S&1), and that's bad. The little LFSR8 gets squeezed to extract 14 bits (1 for the carry-in, 8 for the XOR with C, 5 for the rotation count), and its LSB is reused 3 times, which is not good® !
Can you spot the other changes?
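To make the "squeezing" easier to see, here is a minimal standalone sketch of the LFSR step used in the round function below, with the three fields taken from one state S spelled out. The names `lfsr8_step`, `lfsr8_period`, `carry_in`, `xor_mask` and `rot_count` are hypothetical helpers for illustration only; they are not part of the project's code.

```c
#include <stdint.h>

/* One step of an 8-bit Galois LFSR: shift right, and fold in the
   mask 0x8e when the bit that was shifted out is 1. */
uint8_t lfsr8_step(uint8_t s) {
    uint8_t lsb = s & 1;
    s >>= 1;
    if (lsb)
        s ^= 0x8e;
    return s;
}

/* Hypothetical helper: number of steps needed to come back to the seed. */
unsigned lfsr8_period(uint8_t seed) {
    uint8_t s = seed;
    unsigned n = 0;
    do {
        s = lfsr8_step(s);
        n++;
    } while (s != seed && n < 256);
    return n;
}

/* Hypothetical helpers naming the three fields taken from one state S.
   1 + 8 + 5 = 14 bits, and bit 0 of S appears in all three. */
unsigned carry_in(uint8_t s)  { return s & 1u; }   /* 1 bit  */
unsigned xor_mask(uint8_t s)  { return s; }        /* 8 bits */
unsigned rot_count(uint8_t s) { return s & 31u; }  /* 5 bits */
```

Calling `lfsr8_period(1)` walks the whole cycle of non-zero states. The three helpers only show that the same 8 bits are consumed three times in different widths, so the fields are far from independent.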
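uint16_t Scramble_round(uint16_t D) {
  uint8_t S = LFSR8;               // keep the current LFSR state for this round
  LFSR8 >>= 1;                     // Galois LFSR step
  if (S & 1)
    LFSR8 ^= 0x8e;
  uint32_t C = D + FG + (S & 1);   // 17 bits at most, carry-in from the LSB of S
  uint8_t A = (uint8_t)(C ^ S);    // first index: low byte of C, XORed with S
  uint8_t B = (uint8_t)(C >> 8);   // second index: next byte of C
  uint16_t X = LUT16[A];
  uint16_t Y = LUT16[B];
  uint32_t Z = X | ((uint32_t)Y << 16);  // cast avoids shifting into the sign bit of int
  Z = (Z >> (S & 31)) | (Z << ((-S) & 31));  // rotate right by S mod 32
  uint32_t R = Z + C;              // 32-bit + 17-bit addition
  FG = (uint16_t)(R >> 16);        // high half feeds the next round
  LUT16[A] = (uint16_t)Z;          // write the rotated word back to the LUT
  LUT16[B] = (uint16_t)(Z >> 16);
  return (uint16_t)R;              // low half is the output
}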
The code looks shorter and simpler. That is not necessarily better, but faster is good, and the speed comes from using fewer byte-wide operations.
I have removed the once-defining byte-dephasing structure. The rotation already does quite a lot by itself, but now the only thing that "protects" the result R from the whole 16-bit word written back to the LUT is a (not so simple) addition.
I have an idea: apply a non-linear boolean operation that preserves the total number of set bits. It would "swap" certain bits between the 2 halves of the Z word:
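As a side note on why the rotation "does quite a lot": since the 32-bit word packs two LUT entries, any non-zero rotation count moves bits from one entry into the other before writeback. The sketch below isolates the two steps. `pack32` and `rotr32` are hypothetical names, and the rotation is written in a slightly different (but equivalent) form that keeps both shift counts in 0..31.

```c
#include <stdint.h>

/* Build the 32-bit word from two 16-bit entries. The cast keeps the
   shift in unsigned arithmetic, so bit 15 of y never reaches a sign bit. */
uint32_t pack32(uint16_t x, uint16_t y) {
    return (uint32_t)x | ((uint32_t)y << 16);
}

/* Rotate right by n modulo 32. Both shift counts stay in 0..31,
   so a count of 0 simply returns z unchanged. */
uint32_t rotr32(uint32_t z, unsigned n) {
    n &= 31u;
    return (z >> n) | (z << ((32u - n) & 31u));
}
```

For example, `rotr32(pack32(0x0001, 0x8000), 16)` exchanges the two halves, and any other non-zero count mixes bits of both entries in each half.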
uint16_t diff = X & ~Y;  // bits set in X but clear in Y
Y |= diff;               // set them in Y...
X &= ~diff;              // ...and clear them in X
- On the LUT datapath, before writeback, it relies on A != B. The A == B case happens about 0.4% of the time (about 1 in 256) and would introduce an Enigma-like flaw.
- On the FG / R output path, it would create an inappropriate imbalance of bit values.
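The bit-moving operation can be checked in isolation with a small sketch like the one below. `move_bits` and `popcount16` are hypothetical names; the body of `move_bits` is the three-line operation above, applied through pointers.

```c
#include <stdint.h>

/* Count the set bits of a 16-bit word (portable, no compiler builtin). */
unsigned popcount16(uint16_t v) {
    unsigned n = 0;
    while (v) {
        v &= (uint16_t)(v - 1);  /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Move every bit that is set in *x but clear in *y from *x to *y.
   The total number of set bits in the pair is unchanged. */
void move_bits(uint16_t *x, uint16_t *y) {
    uint16_t diff = (uint16_t)(*x & ~*y);
    *y |= diff;
    *x &= (uint16_t)~diff;
}
```

The end result is the same as X = X & Y and Y = X | Y computed from the original values. The total popcount is preserved, but X always ends up with the fewer set bits and Y with the more, which is the imbalance that makes it awkward on the output path.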
Well, it's a start...