The 8-bit adder has been already covered in the early stages of the design and I found a satisfying diagram/circuit which, though not perfectly perfect, is 95% or so. And it works.
As already mentioned in an early log, I targeted the structure around the Actel ProASIC3 family that uses 3-input tiles. This makes it practical to build Carry Lookahead blocks/units with 3 "bits" of input, and since 3×3=9, it perfectly fits the width of the required 8 bits of data plus the Carry Out flag:
The adder gets the P, G and X inputs from the Logic unit (which is tightly bound). The first bits look like this :
We already notice a recurring circuit made of OR3,AND2,AND3:
This sub-circuit will be a good target for optimisation at ASIC level for example.
Chaining the first CLA3 with the second one gives the following diagram:
The logic depth is already 6 gates, not even counting the layer for P and G in the logic unit. There are already 4 "A2A3O3" circuits.
Adding the last bit and the carry out requires some care but doesn't increase the logic depth if some parts of the last A2A3O3 are moved closer to the inputs.
The fan-in and fanout have been carefully balanced, the maximum is Cin with a fan-in of 5 in the critical datapath. This ensure a reasonable speed for ASIC and FPGA.
There are 25 "tiles" (an average of 3 gates per bit to compute the lookahead) plus 9 XOR, so we must allocate at least 4 tiles per bit for the place&route.
5 instances of A2A3O3 can be pre-routed, leaving 7 other gates to place. These 5 instances can be replaced by the following combo for ASIC: