Improved ROP2

A project log for YGREC8

A byte-wide stripped-down version of the YGREC16 architecture

Yann Guidon / YGDES, 01/03/2019 at 23:23, 5 Comments

In the log "5. YGREC in VHDL, ALU redesign" I showed how the ROP2 unit shares gates with the adder.

The "Pass" datapath is quite annoying with the 3rd multiplexer so I moved it upstream, taking advantage of the 3-input gates.

The merged gate is now of type AX1 and saves a tiny bit of latency on the ROP2 critical datapath, as well as one gate. This is valid for the ProASIC3 as well as other FPGAs, less so for discrete or MUX-based technologies (such as relays). This change is significant enough, however, to justify a redesign of the opcode map, following these constraints:

The new mapping is :

This means I have to redesign the ALU "a bit" but with more emphasis on place&route. The above new circuit is easy to process by hand. There are, however, a few details that depend on the bit position during comparisons. From the previous version of the ALU8 code:

-- Initial XOR of the operands
XOR_DST <= (7=> negate_DST and not compare_signed, others => negate_DST);
XOR_SRC <= (7=>                    compare_signed, others =>        '0');
DSTX <= DST XOR XOR_DST;
SRCX <= SRC XOR XOR_SRC;

Bus names have changed since 2017: DST is now SND and SRC is now SRI. The code says that SRI(7) is XORed with the control signal "compare_signed", while SND(7) is XORed with "negate_DST" only when "compare_signed" is low. This adds an inconvenient corner case that I'd like to get rid of... It doesn't affect the critical datapath a lot but placement gets trickier.
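To make the corner case on bit 7 easier to see, here is a small behavioural model of the VHDL lines above. It is only a sketch in Python (not the project's VHDL); the function name `pre_xor` and the argument order are assumptions made for illustration.

```python
def pre_xor(dst, src, negate_dst, compare_signed):
    """Model the initial XOR stage on two 8-bit operands.

    Mirrors the quoted VHDL:
      - bits 6..0 of DST are inverted when negate_dst is set,
      - bit 7 of DST is inverted only if negate_dst is set AND
        compare_signed is clear,
      - bit 7 of SRC is inverted when compare_signed is set.
    """
    low_mask = 0x7F if negate_dst else 0x00
    dst_msb = 0x80 if (negate_dst and not compare_signed) else 0x00
    src_msb = 0x80 if compare_signed else 0x00
    dstx = (dst ^ (low_mask | dst_msb)) & 0xFF
    srcx = (src ^ src_msb) & 0xFF
    return dstx, srcx
```

With `negate_dst` set, a signed comparison leaves bit 7 of DST untouched and flips bit 7 of SRC instead, which is the special treatment of the MSB that complicates placement.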

I'll "tweak" that later but at least the SND input could be inverted by a XOR3 instead of XOR2, or the specific NEG input could get a special treatment.

Layout is pretty easy:

That's a good base for the ADD8 that connects to it (I didn't show the P, G and XOR outputs).

One nice thing with this kind of pre-routing is the opportunity to spot optimisations for later in ASIC. For example: there are MUXes driven by the same control signal so they can share a buffer and inverter with a direct neighbour.

Discussions

roelh wrote 01/06/2019 at 10:17

Yann, why don't you use a multiplexer for the logic unit, as proposed by Dieter:

6502.org/users/dieter/a1/a1_mux1.png

Cute, innocent box...  in your case, three 2-to-1 multiplexers.
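For readers who have not met the idea, a multiplexer-based logic unit can be sketched as follows: the two operand bits select one of four function bits, so the 4-bit function code is simply the truth table of the operation. This is a generic Python model, not taken from Dieter's schematic; the names `mux2`, `logic_bit` and `logic_unit`, and the bit ordering of the function code, are assumptions for illustration (the actual function codes are in the log linked below).

```python
def mux2(sel, d0, d1):
    """2-to-1 multiplexer: returns d1 when sel is set, else d0."""
    return d1 if sel else d0


def logic_bit(a, b, f):
    """One bit slice built from three 2-to-1 multiplexers.

    f is a 4-bit function code; bit (2*a + b) of f is the result
    (assumed ordering).
    """
    f0, f1, f2, f3 = [(f >> i) & 1 for i in range(4)]
    low = mux2(b, f0, f1)    # selected when a == 0
    high = mux2(b, f2, f3)   # selected when a == 1
    return mux2(a, low, high)


def logic_unit(a, b, f, width=8):
    """Apply the same function code to every bit of two operands."""
    out = 0
    for i in range(width):
        out |= logic_bit((a >> i) & 1, (b >> i) & 1, f) << i
    return out
```

With this ordering, `0b1000` gives AND, `0b1110` gives OR and `0b0110` gives XOR; all sixteen 2-input functions are reachable by changing the code.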

I do not know your add/subtract circuit, but it can be combined with the multiplexer based logic unit, as follows:

<img src="http://enscope.nl/rrca/ideas/simple_alu.gif">

This circuit will do add, subtract and all possible logic functions. It is logically the same as the relay circuit that is described in this log: https://hackaday.io/project/160506-4-bit-ttl-alu/log/155600-a-relay-alu, and almost the same as #ALU in DCTL technology. 

The function codes are also in the above log.

Of course, the ripple carry that is used here can be replaced by a faster carry method.

roelh wrote 01/06/2019 at 10:26

Damn, how can I get an image in a comment ?

Marcel van Kervinck wrote 01/06/2019 at 12:23

ASCII art perhaps

٩(◕‿◕。)۶

Yann Guidon / YGDES wrote 01/07/2019 at 11:29

muahahahaha thanks Marcel :-)

Yann Guidon / YGDES wrote 01/07/2019 at 12:35

The answer is :

I start with a working CLA (Carry Lookahead adder) and "reuse" the P and G gates. With some technologies (those I target, in particular discrete transistors and ASIC) the AND/OR way is preferred to MUXes.
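The sharing of the P and G terms can be illustrated with a small Python model. This is a sketch under assumptions, not the YGREC8 code: it takes P as the OR and G as the AND of the operands, derives the XOR from them, and evaluates the carry recurrence bit by bit (a real lookahead adder computes the same carries with flattened logic). The names `pg_terms`, `rop2` and `add8` are hypothetical.

```python
def pg_terms(a, b):
    """Per-bit terms shared by the adder and the logic unit."""
    g = a & b            # generate: both inputs set
    p = a | b            # propagate (assumed to be the OR form)
    x = p & ~g & 0xFF    # XOR rebuilt from P and G
    return g, p, x


def rop2(a, b, op):
    """Logic operations come straight from the shared terms."""
    g, p, x = pg_terms(a, b)
    return {"AND": g, "OR": p, "XOR": x}[op]


def add8(a, b, carry_in=0):
    """8-bit add using c[i+1] = G[i] or (P[i] and c[i])."""
    g, p, x = pg_terms(a, b)
    carries = 0
    c = carry_in
    for i in range(8):
        carries |= c << i                          # carry into bit i
        c = ((g >> i) & 1) | (((p >> i) & 1) & c)  # carry out of bit i
    return (x ^ carries) & 0xFF, c
```

In this model the AND, OR and XOR results cost nothing beyond what the adder already computes, which is the reason for preferring the AND/OR structure here.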

There is also the need for decoding: the MUX ALU requires some decoding to translate the opcode into the appropriate control signals, which must then be amplified to drive all the MUXes. In my chosen method, few signals need boolean computations (only NEG); the others come directly from one bit of the instruction word and can be amplified right away. These signals are also required a few gates AFTER the computations, which gives a bit of slack in the timing, unlike when you need the signals before the computation (this case requires a more aggressive layout).

DM's MUX-based ALU is great for relay-only or SSI-based (74xx) implementations, but that's not my target.
