This log continues "Registers"
The simplest board of this project is P3A, which contains 4 registers:
- 1 input data bus (the DST/result bus)
- 4 16-bit data latches
- 2 read outputs (SI4 and SND)
The initial plan was very simple: use a bunch of 74HC574s. However, fan-out, fan-in and routing created many problems.
This approach was abandoned because I decided to move one distinctive feature into the register set: the discrete YASEP can (and will) perform the register post-update directly at the register level, instead of sending the value through the execution pipeline once again. This simplifies the sequencing of instruction execution: the #microYasep doesn't do post-updates and the miniyasep adds cycles (making it a variable-length pipeline), but here there is (almost) no time overhead and instructions are executed in two clock cycles, just like the microyasep (but better!).
As a consequence, these are no longer "registers" but counters. The '574s are replaced by 74HC193s, which are preloadable, cascadable 4-bit up/down counters. Instead of 4×74HC574 per register (2 pairs holding duplicated data), there are 4×74HC193, and the load on the result bus is cut in half (it could be reduced further by adding a buffer on the bus, at the cost of some latency).
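To make the cascade concrete, here is a minimal behavioural sketch in Python, assuming ideal timing: four 4-bit stages (one per '193) form a 16-bit register, a preload writes all nibbles at once, and an increment or decrement only reaches the next stage when the current one wraps around. The class names `Counter4` and `Register16` are hypothetical, and the real chips' clock, terminal-count and load pins are not modelled.

```python
class Counter4:
    """One 4-bit up/down stage, standing in for a single '193 (behaviour only)."""

    def __init__(self):
        self.q = 0

    def load(self, nibble):
        # Parallel load: the stage takes the value on its data inputs.
        self.q = nibble & 0xF

    def up(self):
        # Returns True when the stage wraps 15 -> 0 (carry to the next stage).
        self.q = (self.q + 1) & 0xF
        return self.q == 0

    def down(self):
        # Returns True when the stage wraps 0 -> 15 (borrow to the next stage).
        self.q = (self.q - 1) & 0xF
        return self.q == 0xF


class Register16:
    """Four cascaded stages forming one 16-bit register/counter."""

    def __init__(self):
        self.stages = [Counter4() for _ in range(4)]  # index 0 = low nibble

    def load(self, value):
        # Writeback from the result bus preloads every nibble at once.
        for i, stage in enumerate(self.stages):
            stage.load(value >> (4 * i))

    def value(self):
        return sum(stage.q << (4 * i) for i, stage in enumerate(self.stages))

    def increment(self):
        # The carry moves up only while each stage wraps around.
        for stage in self.stages:
            if not stage.up():
                break

    def decrement(self):
        # Same for the borrow when counting down.
        for stage in self.stages:
            if not stage.down():
                break


r = Register16()
r.load(0x0FFF)
r.increment()
print(hex(r.value()))  # 0x1000
```

Only one register is modelled here; the board holds four of them side by side.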
Note: not all "counters" are identical:
- For R1-R5, post-increments are +1/-1 (these are the ones discussed here).
- For PC, the mechanism is the same but the increment is fixed to +2: the address bits are shifted by one position. The counter is advanced every half-cycle (for long instructions) and no post-update is possible. Writing '1' to the LSB, or executing past FFFEh (thus setting the hidden high bit of the PC), will trigger a fault.
- For D1-D5, these "registers" directly map to the memory ports, so no post-update is possible. The corresponding Areg is post-updated instead.
- For A1-A5, the increment depends on the size of the addressed data: normally +2/-2 when D1-D5 is referenced, and +1/-1 when A1-A5 are referenced directly or when IB/EZB/ESB are used on Aregs. Managing the LSB will be tricky and I'm not sure yet how I'll do it, but overflows will also trigger a fault.
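The rules above can be summarised as a small lookup, sketched in Python. It is a hypothetical model, not the actual control logic: `post_update()` returns which register moves and by how much, and two helpers model the PC, assuming its counter stores address bits 15..1 with one extra hidden bit above them. The `byte_op` flag is my reading of the IB/EZB/ESB case (byte-sized steps when a D register is used for a byte access), so treat it as an assumption.

```python
class PCFault(Exception):
    """Raised for the PC fault conditions (hypothetical name)."""


VALID_INDEXES = {"1", "2", "3", "4", "5"}


def post_update(reg, direction, byte_op=False):
    """Return (register that is actually updated, signed step).

    reg: "R1".."R5", "A1".."A5" or "D1".."D5".
    direction: +1 for post-increment, -1 for post-decrement.
    byte_op: assumption: True for IB/EZB/ESB, which select byte-sized steps.
    """
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    if reg == "PC":
        raise ValueError("PC only advances by +2; no post-update")
    kind, index = reg[:1], reg[1:]
    if kind not in ("R", "A", "D") or index not in VALID_INDEXES:
        raise ValueError(f"unknown register {reg!r}")
    if kind == "D":
        # D registers map to memory ports: the paired A register moves instead.
        return "A" + index, direction * (1 if byte_op else 2)
    # R1-R5 and directly referenced A1-A5 move by one.
    return reg, direction


def pc_load(address):
    """Writeback to PC: the counter stores the address shifted right by one."""
    if address & 1:
        raise PCFault("odd address written to PC")
    return (address & 0xFFFF) >> 1


def pc_advance(counter):
    """Step the PC counter once (+2 in address terms)."""
    counter += 1
    # Counter bit 15 is the hidden address bit 16: set after running past FFFEh.
    if counter >> 15:
        raise PCFault("executed past FFFEh")
    return counter


print(post_update("D2", -1))  # ('A2', -2)
print(pc_advance(pc_load(0x1234)) << 1 == 0x1236)  # True
```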
Yes, this architecture is not really orthogonal, but the instruction set is simplified a lot!
The advantage of the lighter bus load is offset by more complex control signals. The '193 is a synchronous counter (with asynchronous reset), but external data is loaded asynchronously (like a '573, unlike the preferred '574). Fortunately, preload (writeback to the register/counter) seems to take precedence over counting (it appears to inhibit counting), so no special logic is required to avoid conflicts, provided the control signals are correctly sequenced. However, counting up or down introduces "some delay" as the carry/borrow signals ripple from the LSB to the MSB. Thus, unlike "normal registers", updating the counters is relatively slow: the update must allow enough time for this internal ripple to settle.
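The ripple cost depends on the value being counted: a carry only crosses into the next '193 when every lower nibble is at its limit (F when counting up, 0 when counting down). This hypothetical helper counts how many of the four stages change for one step; the worst case, such as 0FFFh+1, touches all four, and that is what the update timing has to cover. `t_stage_ns` is a placeholder parameter: real propagation delays must come from the 74HC193 datasheet, not from this sketch.

```python
def ripple_stages(value, up=True, stages=4):
    """Number of 4-bit stages that change when a counter steps once.

    A carry (or borrow) leaves a stage only when that stage wraps around,
    i.e. when its nibble is F (counting up) or 0 (counting down).
    """
    limit = 0xF if up else 0x0
    changed = 1  # the least significant stage always changes
    for i in range(stages - 1):
        if ((value >> (4 * i)) & 0xF) != limit:
            break
        changed += 1
    return changed


def update_time_ns(value, up, t_stage_ns):
    """Rough settling estimate; t_stage_ns is a placeholder, not a datasheet figure."""
    return ripple_stages(value, up) * t_stage_ns


print(ripple_stages(0x1234))  # 1
print(ripple_stages(0x0FFF))  # 4
print(max(ripple_stages(v) for v in range(0x10000)))  # 4
```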
This also completely changes the way conditions are handled. In FPGA/single-chip implementations, the conditions are "shadowed" (cached) from the result bus: each register change is captured on its way to the register set, and a copy of the condition bits (LSB, MSB and zero flag) is kept at a convenient location.
In this implementation, though, the flags are extracted from the SND bus during the second read cycle. The zero condition can then be recomputed, which is great when the condition comes from a D register (data coming from memory can't be cached because it doesn't go through the result bus).
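Extracting the conditions from SND is purely combinational. This Python sketch (the function name is hypothetical) shows the three flags; the zero flag amounts to a 16-input NOR of the bus lines.

```python
def condition_flags(snd):
    """LSB, MSB and zero conditions derived from a 16-bit SND bus value."""
    snd &= 0xFFFF
    return {
        "lsb": bool(snd & 0x0001),
        "msb": bool(snd & 0x8000),
        "zero": snd == 0,  # equivalent to a NOR of all 16 bus lines
    }


print(condition_flags(0x8001))  # {'lsb': True, 'msb': True, 'zero': False}
```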
The other problem is the two read buses. It is solved by many 74HC253 chips, each a dual 4-input multiplexer (MUX4) with tri-state outputs, so the outputs can also be multiplexed onto the read bus with those of the 3 other boards. As noted in a previous log, this reduces the load on the read buses, which could then run a bit faster.
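A behavioural sketch of the '253 and of the shared read bus, assuming ideal logic levels: both sections of the chip share the S1/S0 select lines, each section has its own active-low output enable, and `None` stands for a high-impedance output. `resolve_bus()` is a hypothetical checker that flags contention when more than one board drives the same line.

```python
HIGH_Z = None  # a disabled (tri-stated) output


def hc253_section(inputs, s1, s0, oe_n):
    """One half of a '253: 4-to-1 mux with an active-low output enable."""
    if oe_n:
        return HIGH_Z
    return inputs[(s1 << 1) | s0]


def hc253(inputs_1, inputs_2, s1, s0, oe1_n, oe2_n):
    """Both sections share S1/S0 but have separate output enables."""
    return (hc253_section(inputs_1, s1, s0, oe1_n),
            hc253_section(inputs_2, s1, s0, oe2_n))


def resolve_bus(drivers):
    """Value of one bus line driven by several tri-state outputs."""
    active = [d for d in drivers if d is not HIGH_Z]
    if len(active) != 1:
        raise ValueError(f"{len(active)} drivers enabled (expected exactly 1)")
    return active[0]


# Bit 0 of registers 0-3 on this board; select register 2 (S1=1, S0=0).
this_board = hc253_section([0, 1, 1, 0], s1=1, s0=0, oe_n=0)
other_boards = [HIGH_Z, HIGH_Z, HIGH_Z]  # the 3 other boards are disabled
print(resolve_bus([this_board] + other_boards))  # 1
```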
Mechanically, there are as many '193s as '253s. Each '253 multiplexes 2 bits from 8 inputs (2 bits × 4 registers), so 2×'253 are required to multiplex a nibble onto one read bus. Since there are two read buses, the number of '253s is doubled. The P3B board contains 16×'193 and 16×'253.
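The chip count can be checked with a bit of arithmetic. The constants come from the text; the function itself is only a sanity check.

```python
REGISTERS = 4      # registers on the board
WIDTH = 16         # bits per register
BITS_PER_193 = 4   # one '193 holds a nibble
BITS_PER_253 = 2   # one '253 = two 4:1 sections = 2 output bits
READ_BUSES = 2     # SI4 and SND


def chip_counts():
    # A 4:1 section selects among exactly 4 registers.
    assert REGISTERS == 4
    counters = REGISTERS * WIDTH // BITS_PER_193
    muxes_per_bus = WIDTH // BITS_PER_253
    muxes = READ_BUSES * muxes_per_bus
    return counters, muxes


print(chip_counts())  # (16, 16)
```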
Routing is still delicate, but the counters suggest a hierarchical solution: work with nibbles. One block of 4×'193 and 4×'253 can be routed, then replicated; the overall net structure looks like a two-level 16×16 crossbar.
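One possible way to number the parts of a nibble block, sketched in Python. The numbering is hypothetical (the actual schematic may assign chips and sections differently); it only shows that each block holds one '193 per register and two '253s per read bus, so that four identical blocks cover all 16 bits.

```python
def locate(register, bit, bus):
    """Return a hypothetical placement for one bit of one register on one read bus."""
    if not (0 <= register < 4 and 0 <= bit < 16 and bus in (0, 1)):
        raise ValueError("register 0-3, bit 0-15, bus 0 (SI4) or 1 (SND)")
    block = bit // 4                    # nibble block, replicated 4 times
    counter = (block, register)         # one '193 per register in each block
    mux = (block, bus, (bit % 4) // 2)  # two '253 per read bus in each block
    section = bit % 2                   # which half of that '253
    return {"counter": counter, "mux": mux, "section": section,
            "mux_input": register}      # the register index drives S1/S0


counters = {locate(r, b, 0)["counter"] for r in range(4) for b in range(16)}
muxes = {locate(r, b, bus)["mux"]
         for r in range(4) for b in range(16) for bus in (0, 1)}
print(len(counters), len(muxes))  # 16 16
```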
Note: this layout sketch is not to scale because the '253s are SOIC (1.27mm pitch) and the '193s are SSOP (0.635mm pitch). That's why I run wires between the legs of the '253s (on the left) but not between those of the counters (on the right).
Green lines are vertical (on one side of the PCB) and blue lines are horizontal (on the other side). To keep routing simple, the '253s are on the opposite side of the board, though I have not mirrored their pinout yet (DIA does not allow mirroring or rotations...)
Overall, it looks easily routable :-) Duplicate this 4 times and then connect all the missing lines...