My efforts in "Pushing more bubbles, now the carry-lookahead adder" were promising, but a bug somewhere made it all in vain. So I restarted from scratch instead of digging too deep into my own code. That's how I came up with the results of "Bitslice", and now I have a big advantage: I can choose an arbitrary polarity for the inputs and outputs of the CLA logic, and I have more freedom to choose the gates.
Also, this time I should alter the design more progressively to catch errors earlier: test it bit by bit and build the exhaustive test at the same time :-)
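To make the "bit by bit" idea concrete, here is a minimal sketch of such an exhaustive test in Python. It is not the circuit simulator file: `cla8` is a hypothetical behavioral reference model (plain generate/propagate carry recurrence) standing in for the real netlist, and `check_bits` is an illustrative helper name. The point is only the test structure: validate 1 bit, then 2, and so on up to the full 8 bits plus the carry output.

```python
def cla8(a, b, cin):
    """Behavioral 8-bit adder model (assumed reference, not the gate netlist).

    Returns (sum, carry_out) using per-bit generate/propagate signals.
    """
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(8)]  # propagate
    g = [((a >> i) & (b >> i)) & 1 for i in range(8)]  # generate
    c = [cin]
    for i in range(8):
        c.append(g[i] | (p[i] & c[i]))                 # carry into bit i+1
    s = sum((p[i] ^ c[i]) << i for i in range(8))
    return s, c[8]


def check_bits(width):
    """Exhaustively test the low `width` bits; returns the vector count."""
    mask = (1 << width) - 1
    count = 0
    for a in range(1 << width):
        for b in range(1 << width):
            for cin in (0, 1):
                s, cout = cla8(a, b, cin)
                expected = a + b + cin
                assert (s & mask) == (expected & mask), (width, a, b, cin)
                if width == 8:
                    # The carry output is only meaningful for the full width.
                    assert cout == expected >> 8, (a, b, cin)
                count += 1
    return count


if __name__ == "__main__":
    # Grow the tested width one bit at a time so an error shows up early.
    for width in range(1, 9):
        print(width, "bits OK,", check_bits(width), "vectors")
```

With the real design, `cla8` would be replaced by whatever evaluates the gate-level version, and the same loop catches the first bit position where the two disagree.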
All the G and P inputs benefit from the inverter (which is in fact the output of the NOR2 or NAND2 of the ROP2 bitslice), and they all have a fan-in of 1, so the NOR2 and NAND2 only need a fanout of 2.
The other signals have a fan-in of 2, 2, 3, 1, 2, 0, which is reasonable. Cin has a serious fanout but is ready much earlier so it's not critical.
For P2(0) I replaced the AND3 with a NOR3. This provides the signal earlier than G2(0) because the inverted inputs arrive one inverter earlier (and there is only one driver layer).
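The AND3 to NOR3 swap is just De Morgan applied to active-low inputs. Here is a small sketch that checks it exhaustively, assuming P2(0) is the group propagate of three bits P0, P1, P2; the function names are illustrative and not taken from the schematic.

```python
from itertools import product


def and3(a, b, c):
    return a & b & c


def nor3(a, b, c):
    return 1 - (a | b | c)


def group_propagate(np0, np1, np2):
    """Group propagate computed from the inverted inputs /P0, /P1, /P2."""
    return nor3(np0, np1, np2)


# NOR3 of the inverted inputs must match AND3 of the true inputs.
for p0, p1, p2 in product((0, 1), repeat=3):
    assert group_propagate(1 - p0, 1 - p1, 1 - p2) == and3(p0, p1, p2)
```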
The second block of the CLA is quite similar (which is not surprising since it is more or less copied from the LSB part).
The same recipe is applied: the AND-OR is replaced by NAND-NAND and the AND3 by NOR3.
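As a sanity check of the AND-OR to NAND-NAND substitution, here is a sketch for a 3-bit group generate. The exact gate assignment of the schematic is not reproduced here: the equation and the choice of feeding the top generate term as an inverted input (`ng2`) are assumptions made for illustration.

```python
from itertools import product


def nand(*inputs):
    """N-input NAND on 0/1 values."""
    return 0 if all(inputs) else 1


def group_generate_and_or(g0, g1, g2, p1, p2):
    """Textbook AND-OR form: G2 | P2&G1 | P2&P1&G0."""
    return g2 | (p2 & g1) | (p2 & p1 & g0)


def group_generate_nand_nand(g0, g1, ng2, p1, p2):
    """NAND-NAND form; the single-term G2 enters already inverted (/G2)."""
    return nand(ng2, nand(p2, g1), nand(p2, p1, g0))


# Both forms must agree for every input combination.
for g0, g1, g2, p1, p2 in product((0, 1), repeat=5):
    assert (group_generate_nand_nand(g0, g1, 1 - g2, p1, p2)
            == group_generate_and_or(g0, g1, g2, p1, p2))
```

The term that had no AND gate in front of it needs its inverted version at the output NAND, which is where having both polarities of G available pays off.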
I had to insert two inverters. One is on G2(1), because it's used on the inverting input of 2 blocks, so I renamed the signal /G2(1).
All the inverted input signals are used, and only once, just like before.
The last bits and the carry output are pretty similar:
Note how each G and P input is used, and only once for each polarity.
The gates on the left have some freedom for re-interpretation.
The carry output is XORed, but there is no penalty because the other outputs are XORed as well, only at the ROP2 and SUXEN level.
The new source is here: https://cdn.hackaday.io/files/272801167147520/CLA8_NAND.cjs