Complex digital circuits can be described in different ways for the purpose of (re)creating them in FPGAs. One approach that has been curiously absent in this context is microcoding. Looking at the history of computing over the last 70 years, this approach has been very popular for all sorts of devices, from custom controllers to CPUs. This article describes the history of microcoding and its applications very well:
Coming to the era of particular interest to retrocomputing hobbyists (the 1960s, 1970s and 1980s), microcoding was an extremely widespread technique. Most minis and mainframes of the era used it, for example the PDP-11:
When the microprocessor revolution started, some of the early 8-bit CPUs used "random logic" to implement their control unit (6502, Z80, 1802), but to build something more flexible and faster, microcoding was the only game in town. One could almost say that microcoding was the standard "programmable logic" of the day, just as FPGAs are today.
One company in particular made its fame and fortune with microcoding: AMD. The Am29xx family of devices was the way to create custom CPUs and controllers, or to re-create minis of the previous era, shrinking them from a small cabinet to a single PCB. Alternatively, well-known CPUs could be recreated to run much faster. For example:
(Note: based on the well-documented design above, I coded it in VHDL and got an 8080 monitor to run; see the link on the main project page.)
Once the complexity of single-chip CPUs rose, microcoding gained prominence again; it has been present from the first iterations of the 68k and 8086 processor families until today (for example, a description of the 68k microcode: https://sci-hub.st/https://doi.org/10.1145/1014198.804299 )
The problem is that so many variations of microcoding design obscure the beautiful simplicity of it all, which essentially boils down to a mechanical music box:
- the circumference of the cylinder is the depth of the microcode memory: the bigger it is, the more complex the tune / instruction set can be. However, it is always limited and hard-coded (unless one replaces the cylinder, which is also possible in microcoding)
- the length of the cylinder determines the complexity of the design: more "notes" can be played at the same time (inherent parallelism)
- turning the crank faster is equivalent to increasing the microinstruction execution frequency, up to the point where the vibrating metal can no longer return to its neutral position to play the right tune (meaning that the cycle time has become shorter than the propagation delay of the slowest path in the system)
For a complete analogy, the only part missing from the picture above is the ability to disengage the cylinder, rotate it to a specific start position (the "entry point of instruction execution"), then re-engage it and play up to some other rotation point.
DESIGN FOR SIMPLICITY
To capture this simplicity, I opted for a parametric design pattern where the structure is always the same, but its characteristics can be varied widely using the parameters U, V, W, S and C. These parameters are given as microcode compiler statements. Let's look at those:
.code U, W, ...
.mapper V, U, ...
.controller ..., S
.if C values ...
This will generate:
- a mapper memory with V address lines (2^V words) and width U
- a code memory with U address lines (2^U words == circumference of the cylinder above) and width W (length of the cylinder above)
- a microprogram controller with S microprogram counters ("stack"), which can:
  - select from 2^C conditions
  - branch to 2^U - 4 locations in the code memory
  - execute the following 4 special instructions: next, repeat, return, fork
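The exact behavior of the four special instructions is not spelled out here, so the following is only a behavioral sketch in Python under my own assumptions: `next` increments the microprogram counter, `repeat` keeps it unchanged, `return` pops a saved address, `fork` loads the entry point from the mapper addressed by the instruction register, and a jump to a label acts like a call that saves the return address in a stack of depth S (dropping the oldest entry on overflow). The names `Microinstruction` and `Controller` are hypothetical.

```python
from dataclasses import dataclass, field

# Special then/else targets; any other value is treated as a label (address).
NEXT, REPEAT, RETURN, FORK = "next", "repeat", "return", "fork"


@dataclass
class Microinstruction:
    cond: int             # index into the list of 2^C conditions
    then: object          # special keyword or integer address
    else_: object = NEXT  # taken when the selected condition is false


@dataclass
class Controller:
    code: list            # code memory (2^U microinstructions in hardware)
    mapper: list          # mapper memory: instruction -> entry point
    depth: int            # S, number of stack entries
    upc: int = 0          # microprogram counter
    stack: list = field(default_factory=list)

    def step(self, conditions, ir):
        """Advance one microinstruction cycle (assumed semantics)."""
        mi = self.code[self.upc]
        target = mi.then if conditions[mi.cond] else mi.else_
        if target == NEXT:
            self.upc += 1
        elif target == REPEAT:
            pass  # stay on the same microinstruction
        elif target == RETURN:
            self.upc = self.stack.pop()
        elif target == FORK:
            self.upc = self.mapper[ir]  # jump to the instruction's entry point
        else:
            # Label: assumed to act as a call that saves the return address.
            if len(self.stack) == self.depth:
                self.stack.pop(0)  # oldest entry is lost when the stack is full
            self.stack.append(self.upc + 1)
            self.upc = target
        self.upc %= len(self.code)  # wrap around the code memory
        return self.upc
```

A tiny microprogram that dispatches via `fork`, calls a wait loop that uses `repeat` until condition 1 becomes true, and then returns, exercises all four special instructions.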
Here is a schematic representation rendered using highly sophisticated, state-of-the-art tools:
The constraints on the parameters are:
- Since 2U + C bits are consumed by the microprogram controller, W must be greater than 2U + C to leave at least some useful control bits to drive the design
- The mapper memory address is usually connected directly to the output of the "instruction register", which means that V <= [instruction register width]
- A meaningful U is >= 4 (yes, 16 microinstructions are sufficient for some simple designs)
- A meaningful C is >= 2 (4 conditions: true, false and two additional ones)
- V can be 0 (some designs don't need a mapper at all)
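These constraints are easy to check mechanically. Below is a minimal sketch; `check_params` is a hypothetical helper, and the S >= 1 check, the optional `ir_width` argument and the assumption that exactly 4 then/else codes are reserved for the special instructions are my own additions.

```python
def check_params(U, V, W, S, C, ir_width=None):
    """Validate U, V, W, S, C against the constraints and return derived sizes.

    Hypothetical helper; the S >= 1 check and ir_width argument are assumptions.
    """
    controller_bits = 2 * U + C  # then-field + else-field + condition select
    if W <= controller_bits:
        raise ValueError("W must be greater than 2U + C")
    if ir_width is not None and V > ir_width:
        raise ValueError("V must not exceed the instruction register width")
    if U < 4 or C < 2 or V < 0 or S < 1:
        raise ValueError("expected U >= 4, C >= 2, V >= 0, S >= 1")
    return {
        "code_words": 2 ** U,                # circumference of the cylinder
        "mapper_words": 2 ** V if V else 0,  # V = 0 means no mapper at all
        "conditions": 2 ** C,
        # assuming 4 codes are reserved for next/repeat/return/fork
        "branch_targets": 2 ** U - 4,
        "controller_bits": controller_bits,
        "free_bits": W - controller_bits,    # bits left to drive the design
    }


for name, p in {"1802 CPU": dict(U=8, V=9, W=64, S=8, C=4),
                "TTY": dict(U=6, V=7, W=32, S=4, C=3, ir_width=7)}.items():
    print(name, check_params(**p))
```

With the parameters of the two designs discussed next, it reports 20 controller bits and 44 free bits for the 1802 CPU, and 15 controller bits and 17 free bits for the TTY controller.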
Let's look at two sets of these parameters in practice:
1802 CPU (microcode):
.code 8, 64, cdp180x_code.mif, cdp180x_code.cgf, cdp180x_code.coe, cpu:cdp180x_code.vhd, cdp180x_code.hex, cdp180x_code.bin, 8;
.mapper 9, 8, cdp180x_map.mif, cdp180x_map.cgf, cdp180x_map.coe, cpu:cdp180x_map.vhd, cdp180x_map.hex, cdp180x_map.bin, 1;
.controller cpu_control_unit.vhd, 8;
- microcode memory of 256 (2^U) words, 64 (W) bits each
- mapper memory of 512 (2^V) words, 8 (U) bits each
- controller with an 8-level-deep stack (S)
The controller is driven by the following if (cond) / then / else description:
seq_cond: .if 4 values
    true,           // hard-code to 1
    mode_1805,      // external signal enabling 1805/1806 instructions
    sync,           // to sync with regular machine cycle when exiting tracing routine
    cond_3X,        // driven by 8 input mux connected to ir(2 downto 0), and ir(3) is xor
    cond_4,         // not used
    cond_5,         // not used
    continue,       // not (DMA_IN or DMA_OUT or INT)
    continue_sw,    // same as above, but also signal to use switch mux in else clause
    cond_8,         // not used
    externalInt,    // for BXI (force false in 1802 mode)
    counterInt,     // for BCI (force false in 1802 mode)
    alu16_zero,     // 16-bit ALU output (used in DBNZ only)
    cond_CX,        // driven by 8 input mux connected to ir(2 downto 0), and ir(3) is xor
    traceEnabled,   // high to trace each instruction
    traceReady,     // high if tracer has processed the trace character
    false           // hard-code to 0
    default true;
seq_then: .then 8 values next, repeat, return, fork, @ default next; // any label
seq_else: .else 8 values next, repeat, return, fork, 0x00..0xFF, @ default next; // any value as it can be a trace char
It can be seen that 20 of the 64 bits (C + U + U = 4 + 8 + 8) are used by the controller, leaving 44 bits to drive the rest of the CPU logic.
These 44 bits are divided into fields; each field has a name, a width and a set of allowed / disallowed values. There are 2 types of fields:
- "registered" fields (.regfield), which are assumed to cause state to be captured at the end of the microinstruction cycle
- "value" fields (.valfield), which are assumed to directly drive some control signal during the current microinstruction cycle
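To make the idea of fields concrete, here is a sketch of how a microcode assembler could pack named fields into the spare bits of a microinstruction word, falling back to each field's default when a field is omitted. The field table mirrors the register file fields of the 1802 design shown below; the bit offsets, the `FIELDS`/`encode` names and the rule that a value's code is its position in the list are all assumptions, not the actual compiler's encoding.

```python
# Hypothetical field table: name -> (bit offset, width, allowed values, default).
# The trailing "-" placeholder of reg_r is left out, so it cannot be selected.
FIELDS = {
    "sel_reg": (0, 3, ["zero", "one", "two", "x", "n", "p"], "zero"),
    "reg_r": (3, 3, ["same", "zero", "r_plus_one", "r_minus_one",
                     "yhi_rlo", "rhi_ylo", "b_t"], "same"),
}


def encode(**settings):
    """Pack named field values into an integer control word."""
    word = 0
    for name, (offset, width, values, default) in FIELDS.items():
        value = settings.pop(name, default)  # omitted fields get their default
        code = values.index(value)           # ValueError for disallowed values
        if code >= 2 ** width:
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= code << offset
    if settings:
        raise KeyError(f"unknown fields: {sorted(settings)}")
    return word


print(encode(sel_reg="p", reg_r="r_plus_one"))  # R(P) <= R(P) + 1
```

Note that with this encoding the all-default microinstruction packs to 0, which is one convenient way to make "do nothing" the natural state of unused bits.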
A good illustration of the two field types is the control of the 16 x 16 register file (16 registers of 16 bits each). The register address is a value field: it selects where the address comes from, but does not need to persist. The new value of the register, on the other hand, must persist, according to the regfield selection:
// 16 * 16 register file
sel_reg: .valfield 3 values zero, one, two, x, n, p default zero; // select source of R0-R15 address
reg_r: .regfield 3 values same, zero, r_plus_one, r_minus_one, yhi_rlo, rhi_ylo, b_t, - default same;
Based on the above, it is clear that:
sel_reg = two, reg_r <= zero ... R(2) <= 0
sel_reg = p, reg_r <= r_plus_one ... R(P) <= R(P) + 1
sel_reg = n, reg_r <= yhi_rlo ... R(N).1 <= Y, R(N).0 <= R(N).0
sel_reg = <any value>, reg_r <= same ... NOP
This last combination never has to be written by the programmer; that is the purpose of "default": the compiler assumes it for any field that is not mentioned, so if the design is implemented properly, the register file is unaffected.
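The register file behavior listed above can be modeled in a few lines. In this hypothetical Python sketch, `sel_reg` acts as a value field (it only picks the register address for the current cycle), while `reg_r` acts as a registered field (the selected register is written at the end of the cycle). `Y` is assumed to be an 8-bit value, `.1` / `.0` are taken as the high / low bytes, and `b_t` is left out because its meaning is not described here.

```python
MASK16 = 0xFFFF


class RegisterFile:
    """16 x 16-bit registers driven by the sel_reg / reg_r fields (sketch)."""

    def __init__(self):
        self.r = [0] * 16
        self.x = self.n = self.p = 0  # 4-bit register designators

    def address(self, sel_reg):
        # Value field: only selects the register address during this cycle.
        return {"zero": 0, "one": 1, "two": 2,
                "x": self.x, "n": self.n, "p": self.p}[sel_reg]

    def clock(self, sel_reg="zero", reg_r="same", y=0):
        # Registered field: the selected register is updated at the end of the cycle.
        a = self.address(sel_reg)
        old = self.r[a]
        self.r[a] = {
            "same": old,
            "zero": 0,
            "r_plus_one": (old + 1) & MASK16,
            "r_minus_one": (old - 1) & MASK16,
            "yhi_rlo": ((y & 0xFF) << 8) | (old & 0xFF),  # R.1 <= Y, R.0 unchanged
            "rhi_ylo": (old & 0xFF00) | (y & 0xFF),       # R.1 unchanged, R.0 <= Y
        }[reg_r]
```

Calling `clock()` with no arguments corresponds to the default microinstruction: it selects R(0) and writes it back unchanged, i.e. a NOP for the register file.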
TTY to VGA controller (microcode):
.code 6, 32, tty_screen_code.mif, tty_screen_code.cgf, tty:tty_screen_code.vhd, tty_screen_code.hex, tty_screen_code.bin, 4;
.mapper 7, 6, tty_screen_map.mif, tty_screen_map.cgf, tty:tty_screen_map.vhd, tty_screen_map.hex, tty_screen_map.bin, 1;
.controller tty_control_unit.vhd, 4;
...
seq_cond: .if 3 values
    true,           // hard-code to 1
    char_is_zero,
    cursorx_ge_maxcol,
    cursory_ge_maxrow,
    cursorx_is_zero,
    cursory_is_zero,
    memory_ready,
    false           // hard-code to 0
    default true;
seq_then: .then 6 values next, repeat, return, fork, @ default next; // any label
seq_else: .else 6 values next, repeat, return, fork, 0x00..0x3F, @ default next; // any value as it can be a trace char
...
- microcode memory of 64 (2^U) words, 32 (W) bits wide
- mapper memory of 128 (2^V) words (mapping 7-bit ASCII codes), 6 (U) bits wide
- microcode controller with a stack depth of 4 (S), consuming 15 (C + U + U = 3 + 6 + 6) of the 32 bits and leaving 17 bits to drive the rest of the circuit
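To illustrate the mapper side, here is a sketch of what filling the TTY controller's 128-word mapper could look like: each 7-bit ASCII code selects a 6-bit microcode entry point, which `fork` then jumps to. The routine names, their addresses and the choice of which control characters get dedicated routines are all hypothetical; only the 7-bit / 6-bit dimensions come from the parameters above.

```python
# Hypothetical microcode entry points (6-bit addresses) for a few routines.
ENTRY = {"print": 0x10, "cr": 0x20, "lf": 0x24, "bs": 0x28, "ignore": 0x01}


def build_mapper(v=7, u=6):
    """Return a 2^v-entry list mapping each character code to an entry point."""
    table = []
    for ch in range(2 ** v):
        if ch == 0x0D:
            label = "cr"        # carriage return
        elif ch == 0x0A:
            label = "lf"        # line feed
        elif ch == 0x08:
            label = "bs"        # backspace
        elif 0x20 <= ch < 0x7F:
            label = "print"     # printable ASCII
        else:
            label = "ignore"    # all other control codes
        addr = ENTRY[label]
        if addr >= 2 ** u:
            raise ValueError(f"entry point {addr:#x} does not fit in {u} bits")
        table.append(addr)
    return table


mapper = build_mapper()
print(len(mapper), hex(mapper[ord("A")]))
```

In a real flow, a table like this would be emitted in one of the output formats listed in the .mapper statement (for example .mif or .hex) rather than kept as a Python list.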