Leaks.
v2 still leaks internal states.
Well, data must come from somewhere anyway, right ?
And when there is no external entropy source, we're left with this lousy LFSR8 to run the show in the worst case.
Anyway, this system is not encryption because there is no key, just an evolving internal state.
Now, as a standalone PRNG, it's getting better but the internal state could still be somehow deduced. The original intent is for the algo to be a scrambler so it expects "some sort of entropy" at the input, amplifying it along the way with a huge RC4-like talble. So it's "just a filter" that isolates one PRNG from the outside. Yet I still need an internal 8-bit "entropy source"...
It appears that the link with cryptography is strong though, as trying to make this system "more random-like" means trying to "break it" like a cryptographer would do. So here I am, bikeshedding it to oblivion, not even caring what the result will be or what it would lead to.
So my concern is : the result is a direct reading of the table, merging 2 independent half-reads. It's fine as long as the table is considered "unkown and unknowable". But considering that it takes several hundreds of cycles for the table to be well shuffles/scrambled, it's also as many values that can be directly extracted and correlated, hence the output could be one step closer to be predicted.
So the output should not be a direct read of the LUT. It's sad since it's good to have an algorithm with very low latency, as it cascades well.
- One way forward is to extract the 16-bit result/output from the 32-bit output of the barrel shifter. But this only moves the problem somewhere else since this value will be written to the LUT as well.
- Another method would be to XOR the output of the B LUT, since the A LUT is xored at the address input. Recovery from 0-entropy would be faster but the LUT would stop being a "bit permutation" area, which would upset the strict balance of 1s and 0s throughout the table, and also require an even larger 16-bit LSFR ...
- Of course the algorithm could be run several times before delivering its final results. Loop over the LUT 2, 3, 4 times and XOR the results together. XOR or ADD is possible because it does not have to obey to the bit count preservation rule.
...
Overall the whole algo (like RC4) is mostly "diffusion" and there is little "confusion" happening, except for the address calculations. The barrell shifter is a small-scale diffusor while the LUT is a large-scale diffusor. Confusion is meant to happen in the LUT thanks to a well-brewed table, but only from input to output.
Moreover, there are 32 bits to scramble and the barrell shifter only uses 32 combinations out of the factorial(32) ways to shuffle the bits around. We got a 8-bit LFSR and even then only 5 bits are used.
So far the current solution is to take 16 bits out of the barrel shifter and add the pair of addresses (before the XOR).

There is one more adder yet the diagram looks simpler.
R is a function of the current and last pair of lookups. Why an addition ? It could be a XOR but I'd like more avalanche.
No data enters the LUT, which is still a pure permutation of bits. A better shuffling would be welcome.
Source is there: ATBSS.v3.tgz
Mixing (worst case) is not complete after 2k iterations but ok at 4k.
Discussions
Become a Hackaday.io Member
Create an account to leave a comment. Already have an account? Log In.