The register set is really the central, critical part of the core: it's the nexus and a physical representation of many logical structures. This explains why I focus so much on this apparently innocuous unit...
The register map is : D1 A1 D2 A2 R1 R2 R3 PC
- R1, R2 and R3 are "normal registers", implemented the good old way. Nothing to add here.
- PC is "a bit different", since the input has a multiplexer and the output has a direct bypass path to the memory. Oh and it is not an actual latch but an incrementer. Some special cases must be considered...
- A1 and A2 are almost "normal" : they have an extra "read" port/path that goes to the memory's address bus, possibly through a MUX.
- D1 and D2 are... quite a mess. Their input is multiplexed because the value can be written from RAM. And it also depends on the configuration of the RAM array : dual port or single port ?
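
To make the register map more concrete, here is a small behavioral sketch in Python (not the VHDL itself). It assumes that the eight names map to addresses 0 to 7 in the order listed above, that D1 and D2 are windows onto the RAM cell addressed by A1 and A2 respectively, that registers are 8 bits wide and that the data RAM holds 256 bytes. The class and method names are made up for the illustration, and the PC's incrementer and bypass path are left out.

```python
# Behavioral sketch of the register map, under the assumptions stated above.
REG_NAMES = ["D1", "A1", "D2", "A2", "R1", "R2", "R3", "PC"]  # assumed order 0..7

class RegisterSet:
    def __init__(self, ram_size=256):
        # Only A1, A2, R1, R2, R3 and PC have their own storage here;
        # D1/D2 are modeled as views of the RAM (assumption).
        self.regs = {name: 0 for name in ("A1", "A2", "R1", "R2", "R3", "PC")}
        self.ram = [0] * ram_size

    def read(self, addr):
        name = REG_NAMES[addr & 7]
        if name == "D1":
            return self.ram[self.regs["A1"]]
        if name == "D2":
            return self.ram[self.regs["A2"]]
        return self.regs[name]

    def write(self, addr, value):
        name = REG_NAMES[addr & 7]
        value &= 0xFF  # assumed 8-bit datapath
        if name == "D1":
            self.ram[self.regs["A1"]] = value
        elif name == "D2":
            self.ram[self.regs["A2"]] = value
        else:
            self.regs[name] = value

rs = RegisterSet()
rs.write(1, 0x10)   # A1 <- 0x10
rs.write(0, 0xAB)   # D1 <- 0xAB, lands in RAM at address A1
rs.write(3, 0x10)   # A2 <- 0x10, so D2 now sees the same cell
```

In this model, reading D2 after the three writes returns the byte that was written through D1, which is the kind of interaction that makes the D registers messier than the others.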
I will now consider the specific case of the VHDL implementation that targets the A3P FPGA family. Reasons include the ample availability of the chips and the ease of VHDL simulation, so I would get working results faster.
The interesting part is the RAM blocks, which feature some useful output latch modes. Only the address ports need to be latched externally, and this saves some gates for the data registers. The following drawing summarizes the whole idea :
The wires are usually understood as bytes but here we'll think of them as individual bits because we'll design a first bitslice and replicate it (9 times, including parity).
The bitslice for this part has the following interface signals :
- of course : SRC and SND addresses (3 bits each)
- INC output / PC input
- SwapSelect (2 signals)
- Write enable for A1, A2, R1, R2, R3, PC
- of course : SND and SRC
- PC input
- PC output
- WriteData (post-swap)
It becomes apparent that the register set's structure contains more than latches and MUXes : some signal conditioning is performed in place, in particular the "swap" of PC. The address MUXes could even be performed in place. The corresponding VHDL code is easy to write from there.
An FPGA has different constraints than other technologies and it's a good first step toward full ASIC implementation, particularly with the ProASIC3 family (Actel/Microsemi's A3P, now more than 10 years old but still pretty good for many purposes). So I have tried to make a preliminary layout of one bitslice of the register set (called R7):
The circuit is dominated by MUX2s. There are only 6 DFF but more than 20 MUX2, and soon even more because it's easy to extend the datapath from here. There are also a lot of vertical control lines.
The INC unit has already been designed, and must be routed/laid out to form a vertical column of minimal width.
The lower layer (not shown) will contain many decoders to drive the columns : MUX2 controls, DFF enable signals...
The program memory is at the bottom and the dual-ported Data RAM at the left. ALU and SHL at the right, and IN/OUT ports at the top:
For an FPGA, parity is not necessary at first. However a big missing piece is the debug system, mainly sitting between the program memory and the decoder. A line of MUXes might also sit between the decoder and R7 to catch the Result, SND and SI8 buses.
Luckily, the PC's incrementer uses quite few gates and can fit in a column only 1 gate wide :
The schematic shows the whole circuit with only 13 3-input gates. They can be paired straightforwardly. I'll have to update the VHDL code.
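
For reference, the function an incrementer computes can be sketched bit by bit in Python. This is a plain ripple chain of half-adders (one XOR and one AND per bit), not the 13-gate circuit from the schematic, and the 8-bit default width is an assumption made for the example.

```python
def increment_bits(bits):
    """Add 1 to a number given as a list of 0/1 values, LSB first.

    Returns (result_bits, carry_out). Each stage is a half-adder:
    sum = bit XOR carry, next carry = bit AND carry.
    """
    carry = 1  # the "+1" enters as the initial carry
    out = []
    for b in bits:
        out.append(b ^ carry)
        carry = b & carry
    return out, carry

def increment(value, width=8):
    """Integer wrapper around increment_bits (width is an assumption)."""
    bits = [(value >> i) & 1 for i in range(width)]
    out, carry = increment_bits(bits)
    return sum(bit << i for i, bit in enumerate(out)), carry

wrapped, carry_out = increment(0xFF)  # wraps to 0 with carry_out set
```

A gate-level version mostly differs in how the carry chain is grouped to keep the depth and the column width down.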
Another full column of MUX2 selects the result bus' source : ALU, SHL or PORT_IN.
Yet another column of MUX2 selects the source between : register, R/I3 or R/I8.
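
Both selector columns pick one source out of three using only MUX2 cells. One plausible way to do that is to cascade two MUX2 per bit, as in the Python sketch below for the result bus. The control names `sel_shl` and `sel_port` are hypothetical, as is the priority given to PORT_IN; the actual wiring and encoding in the layout may differ.

```python
def mux2(sel, a, b):
    """2-to-1 multiplexer on single bits: a when sel is 0, b when sel is 1."""
    return b if sel else a

def result_bit(alu, shl, port_in, sel_shl, sel_port):
    # First MUX2 chooses between ALU and SHL,
    # second MUX2 lets PORT_IN override that choice.
    return mux2(sel_port, mux2(sel_shl, alu, shl), port_in)

def result_byte(alu, shl, port_in, sel_shl, sel_port, width=8):
    """Replicate the bit-level selector across the datapath (assumed 8 bits)."""
    out = 0
    for i in range(width):
        bit = result_bit((alu >> i) & 1, (shl >> i) & 1,
                         (port_in >> i) & 1, sel_shl, sel_port)
        out |= bit << i
    return out
```

The same cascade would apply to the register, R/I3 or R/I8 choice, with its own pair of control lines.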
The whole set is almost square : 15 tiles wide and 16 tiles high, or 240 tiles. Add the ALU and SHL plus some more decoding, and the total might not even reach 500 tiles. This is about 1/3 of the A3P060, which leaves quite a lot of room for I/O :-D
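
The tile budget can be checked with a few lines of Python. The figure of 1536 tiles for the A3P060 is taken from the ProASIC3 datasheet as far as I can tell and should be treated as an assumption; the other numbers come from the text.

```python
# Tile budget check (A3P060_TILES is an assumption to verify in the datasheet).
R7_WIDTH_TILES = 15
R7_HEIGHT_TILES = 16
ESTIMATED_TOTAL_TILES = 500   # upper bound quoted above
A3P060_TILES = 1536           # assumed tile count of the A3P060

r7_tiles = R7_WIDTH_TILES * R7_HEIGHT_TILES             # register set alone
remaining_for_rest = ESTIMATED_TOTAL_TILES - r7_tiles   # ALU, SHL, decoding
fraction_used = ESTIMATED_TOTAL_TILES / A3P060_TILES

print(f"R7: {r7_tiles} tiles, rest: {remaining_for_rest} tiles, "
      f"share of A3P060: {fraction_used:.0%}")
```

Under these numbers the register set takes a little under half of the 500-tile estimate, and the whole core stays close to one third of the chip.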