Close

Two outputs and more inputs

A project log for Libre Gates

A Libre VHDL framework for the static and dynamic analysis of mapped, gate-level circuits for FPGA and ASIC. Design For Test or nothing !

yann-guidon-ygdesYann Guidon / YGDES 06/08/2023 at 23:310 Comments

Digging through various .lib gathered from around the 'net, it appears that sequential cells (flip-flops, transparent latches and set-reset latches) are often described with the Q and /Q outputs simultaneously. There is some advantage to this :

So I'll have to find a way to include that in the code...

Furthermore, I often get annoyed by the redundancy that is often found inside cells, which makes it harder for the synthesiser to factor some signals, in particular inverters : you can find them in many gates such as XOR, MUX and even DFF. Often, these signals are driven by a common source, such as a clock net, a control signal, and spread to tens of gates (let's say N). So the common signal is sent to N gates with a fan-in of 2 because it must drive the gate and the inverter. Such a fanout creates delays.

Sending a pair of complementary signals lowers the fanout/fanin constraint, so it should be 2× faster (up to...). It also makes place&routing more complex, particularly if it's a dumb algo that doesn't care about the geometry...

But this makes the case for "grouped cells" with one clock or control input signal which is locally amplified to 2 or 4 neighbouring cells. I have seen this recently in a paper or something...

Again, this is an optimisation step, where energy, space and time are progressively enhanced by making some custom cells to solve some local inefficiencies.

-o-0-O-0-o-

Meanwhile I also see some cells with more than 4 inputs.

Currently the system uses up to 4-input LUTS, and quite a bit of obscure code depends on this. But I have seen some larger combinatorial gates such as MUX4, and the adder uses a macrogate containing AND2, AND3 and OR3 with 5 inputs. A LUT5 has 32 entries, while the individual gates total 4+8+8=20 entries, so testing them requires fewer test vectors than a custom macrogate. OTOH the macrogate saves interconnect/routing resources, since it knows how to keep some traces directly at the cell level, without going through the tortuous metal layers.

I guess a cell would need to be crazy complicated or incredibly more efficient to justify going to LUT6. Even LUT5 would require some rewrites here and there. I also see some cells FA and HA : Full Adder and Half Adder, with 2 or 3 inputs and 2 outputs. This last case is a bit easier to handle since each "macrogate" can be "splitl" into two 1-output gates :

Anyway, with @alcim.dev  the exploration and design of libraries of gates is progressing.

Discussions