MUX trees

A project log for YGREC8

A byte-wide stripped-down version of the YGREC16 architecture

Yann Guidon / YGDES 02/05/2018 at 15:47

I am currently working on more formal code for the MUX parts. In other words, I'm digging into a pet topology project again. This improves the VHDL code, because I realised that I use MUX8 in various places without getting the best out of them. For example, even though I built the Register Set out of balanced control trees, I didn't use this technique for the conditions. So I started writing MUX8 components in VHDL... I haven't uploaded the new code archive yet, but when I do, look at MUX8.vhdl. I should also rewrite the REG8 module using these enhanced MUX8.
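
As a rough illustration of what a MUX8 does, here is a minimal behavioural model in Python, assuming the textbook decomposition into three levels of 2-input multiplexers. The names `mux2` and `mux8` are hypothetical and this is not the content of MUX8.vhdl; it only shows which select bit controls which level of the tree.

```python
def mux2(sel, a, b):
    """2-input multiplexer: returns a when sel is 0, b when sel is 1."""
    return b if sel else a


def mux8(sel, inputs):
    """8-input multiplexer modelled as a 3-level tree of mux2 cells.

    Assumption: plain binary tree, select bit 0 on the level closest
    to the inputs. This is a model for illustration, not the VHDL.
    """
    assert len(inputs) == 8 and 0 <= sel < 8
    s0, s1, s2 = sel & 1, (sel >> 1) & 1, (sel >> 2) & 1
    # Level 0: 4 cells, all driven by select bit 0
    level0 = [mux2(s0, inputs[i], inputs[i + 1]) for i in range(0, 8, 2)]
    # Level 1: 2 cells, both driven by select bit 1
    level1 = [mux2(s1, level0[i], level0[i + 1]) for i in range(0, 4, 2)]
    # Level 2: 1 cell, driven by select bit 2
    return mux2(s2, level1[0], level1[1])
```

In this plain tree the lowest select bit already drives 4 cells for only 8 inputs, which is the kind of load that grows quickly with wider MUXes.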

The next step is the large MUX64 used by the serial debug system (see 24. Synchronous Serial Debugging). I'd like to design it algorithmically but I haven't cracked the algorithm yet. Is there a simple one ?

20180227 : algorithm cracking in progress. Meanwhile, I already have one topology/solution for MUX64 :

It's going to be fun to write this in VHDL...

20200808 :

I did some synthesis tests ! See also the further work at 110. The art of large MUXes

The almost-optimal version behaves very well, because the tree is easily understood as 4 sub-trees that are naturally mapped to different regions. Here is one random result, with the 4 sub-trees highlighted in turn :

The control signals are good too :

The control signals can be grouped close to where they are most needed. For low-fanout signals that go to other sub-trees, a buffer could be inserted if needed to cover the extra distance, because the added delay would be equivalent to the lower levels' latency.

For comparison : below is the wiring for a "classic" MUX64 where the lower-level control signal must drive 32 MUXes ! The fanout is so large that the net must be split into 2, one with 17 and the other with 16 muxes.
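
The figure of 32 follows directly from the shape of a plain binary tree of 2-input MUXes. The sketch below counts how many MUX2 cells each select bit drives. The function name `classic_fanout` is hypothetical, and the model assumes one MUX2 cell per pair of signals at every level, with no buffers.

```python
def classic_fanout(n_inputs):
    """Number of MUX2 cells driven by each select bit, lowest level first.

    Assumption: a plain binary tree of 2-input MUXes, where each level
    halves the number of signals and shares a single select bit.
    """
    assert n_inputs >= 2 and n_inputs & (n_inputs - 1) == 0, "power of two expected"
    fanout = []
    width = n_inputs
    while width > 1:
        width //= 2           # each level needs half as many cells
        fanout.append(width)  # and they all share one select bit
    return fanout


print(classic_fanout(64))       # [32, 16, 8, 4, 2, 1]
print(sum(classic_fanout(64)))  # 63 MUX2 cells in total
```

The two lowest select bits alone account for 48 of the 63 cells, which is why they dominate the routing.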

Do you agree with me that it's an utter mess ? Routing effort is not considered at all during the placement step, and nets carrying identical values cross in many places !

In some cases, the signal is input on the pin on the right, goes to the south-west quadrant then is buffered to the north-east quadrant. Good, good good good, good...

Of course this is an artefact of how the synthesis is integrated with the place & route tool, and a symptom of a greater deficiency: the wires could be greatly reorganised if the description of the circuit allowed some degrees of freedom. But at least my solution is a very significant step forward.

The other message is that high-fanout nets are BAD. ASICs usually limit fanout to 4 and PROASIC3 to 16 gates, but even that is already a stretch...
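
To get a feel for what such limits cost, here is a small idealised estimate of the buffer tree needed to drive a given number of loads under a fanout limit. The function `buffer_tree` is a hypothetical helper. It assumes every buffer drives at most `max_fanout` loads and ignores placement, wire length and the buffers' own delay.

```python
def buffer_tree(loads, max_fanout):
    """Buffers needed per level, starting with the level next to the loads.

    Assumption: idealised model, every net drives at most max_fanout
    inputs. An empty list means the source can drive all loads directly.
    """
    assert loads >= 1 and max_fanout >= 2
    levels = []
    n = loads
    while n > max_fanout:
        n = -(-n // max_fanout)  # ceiling division: buffers for this level
        levels.append(n)
    return levels


print(buffer_tree(32, 16))  # [2]    : two buffers, one extra level
print(buffer_tree(32, 4))   # [8, 2] : ten buffers, two extra levels
```

With a limit of 4, the lowest select bit of a classic MUX64 would need two extra levels of buffering before it even reaches the MUX cells.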