
Pushing more bubbles, now the carry-lookahead adder

A project log for YGREC8

A byte-wide stripped-down version of the YGREC16 architecture

Yann Guidon / YGDES, 01/01/2020 at 17:48

The ROP2 re-engineering is going well: despite the surprising number of gates, it now looks ready for an efficient ASIC implementation. So here comes the time for the ASIC-ification of the CLA...

There is a 5-gate macroblock that appears 6 times in the circuit and it was already NANDified; see gate_CLA3.vhdl. As noted in the log "Netlist and structure of the adder", we need to perform the function Y <= (A AND B AND C) OR (A AND D) OR E; several times, and these instances need to be optimised as a whole. The following circuit:

becomes:

but there is a remaining inverter...
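As a rough illustration of the idea (the actual schematic lives in gate_CLA3.vhdl and may be structured differently), here is a small Python check of one possible NAND-only mapping of Y. The function names and the choice of putting the leftover inverter on E are assumptions made for this sketch.

```python
from itertools import product

def nand(*inputs):
    """N-input NAND on 0/1 values (with one input it acts as an inverter)."""
    return 0 if all(inputs) else 1

def y_reference(a, b, c, d, e):
    # Y <= (A AND B AND C) OR (A AND D) OR E
    return int((a and b and c) or (a and d) or e)

def y_nand(a, b, c, d, e):
    # Hypothetical mapping: each product term becomes a NAND,
    # and a final NAND3 merges them (De Morgan).
    t1 = nand(a, b, c)
    t2 = nand(a, d)
    e_n = nand(e)  # the leftover inverter in this sketch
    return nand(t1, t2, e_n)

# Exhaustive check over the 32 input combinations
assert all(y_reference(*v) == y_nand(*v) for v in product((0, 1), repeat=5))
```

In this mapping the inverter would disappear if E were already available in negated form.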

Similarly, there are a few AND3 that would benefit from a switch to NOR3 if only the input was inverted.

And the AO1 gate ((A and B) or C) can also be turned into a pair of NAND2, if the C input is inverted...
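Both substitutions are plain De Morgan identities. The short Python sketch below checks them exhaustively; the helper names are made up for the illustration and are not taken from the project's gate library.

```python
from itertools import product

def nand2(a, b):
    return 1 - (a & b)

def nor3(a, b, c):
    return 1 - (a | b | c)

def and3(a, b, c):
    return a & b & c

def ao1(a, b, c):
    # AO1: (A and B) or C
    return (a & b) | c

def and3_from_negated(an, bn, cn):
    # AND3 of the true signals, built as a NOR3 of their negated versions
    return nor3(an, bn, cn)

def ao1_from_negated_c(a, b, cn):
    # (A and B) or C, built as a pair of NAND2 when C arrives negated
    return nand2(nand2(a, b), cn)

for a, b, c in product((0, 1), repeat=3):
    assert and3(a, b, c) == and3_from_negated(1 - a, 1 - b, 1 - c)
    assert ao1(a, b, c) == ao1_from_negated_c(a, b, 1 - c)
```

In each case the gate becomes cheaper or faster only if the negated signal is already available upstream.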

Do you see where I'm going?


So now, I'm adding a couple of new output ports to the ROP2 units that provide the negated version of P & G. The CLA circuit can then select the negated or positive version, which also decreases the fanout :-)


The ROP2 part has a pretty decent structure, despite the size.

Fanout:  Count:  .........|.........|.........|.........|.........|
    1 :     48 - ************************************************  
    2 :     32 - ********************************
    3 :     24 - ************************
    4 :      0 -
    5 :      0 -
    6 :      0 -
    7 :      0 -
    8 :      5 - *****
  
 Depth:  Gates:  .........|.........|.........|.........|.........|
    0 :     21 - *********************
    1 :     16 - ****************
    2 :     16 - ****************
    3 :     16 - ****************
    4 :     16 - ****************
    5 :     16 - ****************
    6 :      8 - ********

It is quite easy to lay out, with a "width" of 2 gates per bit.
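For readers who want to produce similar statistics for their own netlists, here is a minimal Python sketch. It is not the tool that generated the listing above: the netlist format, the toy circuit, and the convention that primary inputs sit at depth 0 are all assumptions.

```python
from collections import Counter

# Hypothetical toy netlist: driven signal -> list of signals it reads
NETLIST = {
    "n1": ["a", "b"],
    "n2": ["a", "c"],
    "n3": ["n1", "n2"],
    "y":  ["n3", "c"],
}
INPUTS = ["a", "b", "c"]

def depths(netlist, inputs):
    """Depth of each signal: inputs are 0, a gate is 1 + its deepest input."""
    memo = {name: 0 for name in inputs}
    def depth(sig):
        if sig not in memo:
            memo[sig] = 1 + max(depth(s) for s in netlist[sig])
        return memo[sig]
    for sig in netlist:
        depth(sig)
    return memo

def fanouts(netlist):
    """Number of gate inputs driven by each signal (undriven sinks omitted)."""
    return Counter(src for srcs in netlist.values() for src in srcs)

def histogram(title, values):
    """Text histogram, one star per item, zero rows included."""
    counts = Counter(values)
    lines = [title]
    for key in range(min(counts), max(counts) + 1):
        n = counts.get(key, 0)
        lines.append(f"{key:5d} : {n:6d} - {'*' * n}")
    return "\n".join(lines)

print(histogram("Fanout:  Count:", fanouts(NETLIST).values()))
print(histogram(" Depth:  Gates:", depths(NETLIST, INPUTS).values()))
```

A recursive depth computation like this is fine for a few hundred gates; a much larger netlist would call for an explicit topological sort.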

OTOH the carry lookahead is not as regular or as well behaved:

Latency of the 9 outputs :
    Output#0 : 2
    Output#1 : 3
    Output#2 : 4
    Output#3 : 5
    Output#4 : 6
    Output#5 : 7
    Output#6 : 5
    Output#7 : 7
    Output#8 : 8

************ END OF DEPTHLIST ************

Fanout:  Count:  .........|.........|.........|.........|.........|
    1 :     50 - **************************************************
    2 :     13 - *************
    3 :      5 - *****
    4 :      0 -
    5 :      1 - *
  
 Depth:  Gates:  .........|.........|.........|.........|.........|
    0 :     35 - ***********************************
    1 :     12 - ************
    2 :      5 - *****
    3 :      4 - ****
    4 :      5 - *****
    5 :      4 - ****
    6 :      3 - ***
    7 :      1 - *  <- the carry out

It is clearly not optimal but I wanted to keep the number of gates low. Only 35 so far:

Several places have their polarity (and gate type) swapped for an inverted version. For example: the signal G2(1) has been negated, so it drives cla2e and cla2f with their inverted input. cla2b is therefore totally swapped as well, made with NOR gates.

AO1 is usually replaced by AO1B so it can be implemented as a pair of NAND2.

The combined unit uses 123 gates!

Latency of the 17 outputs :
(the ROP2:)
    Output#0 : 7
    Output#1 : 7
    Output#2 : 7
    Output#3 : 7
    Output#4 : 7
    Output#5 : 7
    Output#6 : 7
    Output#7 : 7
(CLA8:)
    Output#8 : 6
    Output#9 : 6
    Output#10 : 7
    Output#11 : 8
    Output#12 : 9
    Output#13 : 10
    Output#14 : 7
    Output#15 : 10
    Output#16 : 11
  
Fanout:  Count:  .........|.........|.........|.........|.........|
    1 :     79 - **************************************************
    2 :     31 - ********************
    3 :     25 - ****************
    4 :      4 - ***
    5 :      2 - **
    6 :      0 -
    7 :      0 -
    8 :      5 - ****
 Depth:  Gates:  .........|.........|.........|.........|.........|
    0 :     23 - ***********************
    1 :     16 - ****************
    2 :     16 - ****************
    3 :     20 - ********************
    4 :     25 - *************************
    5 :     22 - **********************
    6 :     12 - ************
    7 :      4 - ****
    8 :      4 - ****
    9 :      3 - ***
   10 :      1 - *
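The 123-gate total can be cross-checked against the depth histograms, assuming that the depth-0 row counts primary inputs rather than gates (an interpretation of the listings, not something the tool output states). The figures below are copied from the ROP2 and combined listings, plus the 35 gates quoted for the CLA.

```python
# Depth histogram rows, index = depth (copied from the listings)
rop2_depth = [21, 16, 16, 16, 16, 16, 8]
combined_depth = [23, 16, 16, 20, 25, 22, 12, 4, 4, 3, 1]
cla_gates = 35  # figure quoted in the text for the CLA

# Assumption: row 0 holds the inputs, so gates are the rows at depth >= 1
rop2_gates = sum(rop2_depth[1:])
combined_gates = sum(combined_depth[1:])

assert rop2_gates == 88
assert combined_gates == 123
assert rop2_gates + cla_gates == combined_gates
```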

The circuit uses these gates:
and2a ao1 ao1b inv nand2 nand3 nor2 nor3 or2 xor2

There seems to be room for 2 stages of OR between ROP2 and CLA8 output, to multiplex other values (from IN port and SHA/ROT).


Now I want to replace the output MUX and merge it with the XOR at the end of the CLA...
