Standardized control unit and microcode layout

HISTORY

Complex digital circuits can be described in different ways for the purpose of (re)creating them in FPGAs. One approach that has been curiously absent is microcoding. Over the last 70 years of computing history, this approach has been very popular for all sorts of devices, from custom controllers to CPUs. This article describes the history of microcoding and its applications very well:

https://people.cs.clemson.edu/~mark/uprog.html

In the era of particular interest to retrocomputing hobbyists (the 1960s, 70s and 80s), microcoding was an extremely widespread technique. Most minis and mainframes of the era used it, for example the PDP-11:

https://ia801908.us.archive.org/12/items/bitsavers_decpdp1111codeListingApr81_5149506/EY-C3012-RB-001_Microcode_Listing_Apr81.pdf

When the microprocessor revolution started, some of the early 8-bit CPUs used "random logic" to implement their control unit (6502, Z80, 1802), but to build something more flexible and faster, microcoding was the only game in town. One could almost say that microcoding was the standard "programmable logic" of its day, just as FPGAs are today.

One company in particular found fame and fortune with microcoding: AMD. The Am29xx family of devices was the way to create custom CPUs and controllers, or to re-create minis from the previous era, shrinking them from a small cabinet to a single PCB. Alternatively, well-known CPUs could be recreated to run much faster. For example:

https://en.wikipedia.org/wiki/Microcode
https://en.wikichip.org/w/images/7/76/An_Emulation_of_the_Am9080A.pdf

(Note: based on the well-documented design above, I coded it in VHDL and got an 8080 monitor program running; see the link on the main project page.)

Once the complexity of single-chip CPUs rose, microcoding again gained prominence, and it has been present from the first iterations of the 68k and 8086 processor families until today (for example, a description of the 68k microcode: https://sci-hub.st/https://doi.org/10.1145/1014198.804299 )

HELPFUL ANALOGY

The problem is that so many variations of microcoding design obscure the beautiful simplicity of it all, which essentially boils down to a mechanical music box (a pinned cylinder, turned by a crank, plucking a comb of metal tongues).

That's right:

- the circumference of the cylinder is the depth of the microcode memory: the bigger it is, the more complex the tune / instruction set can be. However, it is always finite and hard-coded (unless one replaces the cylinder, which is also possible in microcoding)

- the length of the cylinder is the width of the microinstruction word, which determines the complexity of the design: more "notes" can be played at the same time (inherent parallelism)

- turning the crank faster is equivalent to increasing the microinstruction execution frequency, up to the point where the vibrating metal can no longer return to its neutral position in time to play the right tune (meaning that the cycle has become shorter than the longest latency path in the system)

The only thing missing from this picture for a complete analogy is the ability to disengage the cylinder, rotate it to a specific start position (the "entry point of instruction execution"), then re-engage it and play until some other rotation point.

DESIGN FOR SIMPLICITY

To capture this simplicity, I opted for a parametric design pattern, where the structure is always the same but its characteristics can be varied widely using the parameters U, V, W, S and C. These parameters are given as microcode compiler statements. Let's look at those:

.code U, W ...

.mapper V, U ...

.controller S

.if C ...

.then U

.else U

This will generate:

mapper memory with V address lines (2^V words) and width U
code memory with U address lines (2^U words == circumference of cylinder above) and width W (length of cylinder above)
microprogram controller with S microprogram counters ("stack"), which can:
- select from 2^C conditions
- branch to 2^U - 4 locations in the code memory (the other 4 values of the then / else fields encode the special instructions)
- execute the following 4 special instructions: next, repeat, return, fork
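To make the controller's behaviour concrete, here is a minimal Python sketch of one controller step: the condition picks the then or else value, which is either one of the four special instructions or a branch target. The text only names the special instructions, so their semantics here are assumptions: next increments the microprogram counter, repeat stays put, return pops the stack, fork loads the entry point from the mapper, and a branch to a label is assumed to be call-like (it saves uPC + 1) so that return has something to return to. The class name `MicroController` and the encoding of the specials as values 0-3 are hypothetical.

```python
from typing import List

# Hypothetical encodings of the four special then/else values; with these,
# only the remaining 2**U - 4 values can be used as branch targets.
NEXT, REPEAT, RETURN, FORK = 0, 1, 2, 3


class MicroController:
    """Sketch of a microprogram controller with S microprogram counters."""

    def __init__(self, U: int, S: int) -> None:
        self.mask = (1 << U) - 1      # the uPC is U bits wide
        self.S = S                    # max number of stacked uPCs
        self.stack: List[int] = [0]   # stack[-1] is the current uPC, the rest are return addresses

    @property
    def upc(self) -> int:
        return self.stack[-1]

    def step(self, cond: bool, then_val: int, else_val: int, mapper_out: int) -> int:
        """Pick then/else by the condition, update the uPC and return it."""
        target = then_val if cond else else_val
        if target == NEXT:
            self.stack[-1] = (self.upc + 1) & self.mask
        elif target == REPEAT:
            pass  # stay on the same microinstruction
        elif target == RETURN:
            if len(self.stack) > 1:
                self.stack.pop()  # resume at the saved return address
            # underflow is design-specific; here the uPC is simply left unchanged
        elif target == FORK:
            self.stack[-1] = mapper_out & self.mask  # jump to the instruction entry point
        else:
            # Assumed call-like branch: save uPC + 1, then go to the label.
            self.stack[-1] = (self.upc + 1) & self.mask
            self.stack.append(target & self.mask)
            if len(self.stack) > self.S:
                self.stack.pop(0)  # the oldest return address is lost


        return self.upc


ctl = MicroController(U=8, S=8)
ctl.step(True, NEXT, NEXT, 0)            # uPC 0 -> 1
ctl.step(True, 0x40, NEXT, 0)            # branch to 0x40, return address 2 saved
print(ctl.step(False, NEXT, RETURN, 0))  # condition false, else = return: prints 2
```

Modelling the stack as a list whose top is the current uPC mirrors the "S microprogram counters" wording: with S counters, up to S - 1 return addresses can be nested.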

Here is a schematic representation, rendered using highly sophisticated state-of-the-art tools:

The parameters are subject to these constraints:

Since 2U + C bits are consumed by the microprogram controller, W > 2U + C, to leave at least some useful control bits to drive the design.
The mapper memory address is usually connected directly to the output of the "instruction register", which means that V <= [instruction register width].
A meaningful U is >= 4 (yes, 16 microinstructions are sufficient for some simple designs).
A meaningful C is >= 2 (4 conditions: true, false and two additional ones).
V can be 0 (some designs don't need a mapper at all).
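The constraints above are easy to check mechanically. The following Python sketch validates a parameter set and derives the generated memory sizes and the controller / control bit budget. The class and method names are hypothetical (not part of the microcode compiler), the V <= instruction register width rule is left out because it depends on the surrounding design, and S >= 1 is an added assumption.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ControlUnitParams:
    U: int  # code memory address width (also the then/else field width)
    V: int  # mapper address width, 0 = no mapper
    W: int  # code memory word width
    S: int  # number of microprogram counters ("stack" depth)
    C: int  # condition select width

    def controller_bits(self) -> int:
        return 2 * self.U + self.C  # then + else + condition fields

    def check(self) -> None:
        """Raise ValueError if one of the listed constraints is violated."""
        if self.U < 4:
            raise ValueError("U must be >= 4")
        if self.C < 2:
            raise ValueError("C must be >= 2")
        if self.V < 0:
            raise ValueError("V must be >= 0")
        if self.S < 1:
            raise ValueError("S must be >= 1 (assumed: at least one uPC)")
        if self.W <= self.controller_bits():
            raise ValueError("W must be greater than 2U + C")

    def summary(self) -> Dict[str, int]:
        return {
            "code_words": 2 ** self.U,
            "mapper_words": 2 ** self.V if self.V > 0 else 0,  # V = 0: no mapper
            "conditions": 2 ** self.C,
            "controller_bits": self.controller_bits(),
            "control_bits": self.W - self.controller_bits(),
        }


# The two parameter sets used in this article
cpu_1802 = ControlUnitParams(U=8, V=9, W=64, S=8, C=4)
tty_vga = ControlUnitParams(U=6, V=7, W=32, S=4, C=3)
for params in (cpu_1802, tty_vga):
    params.check()
    print(params.summary())
```

For the 1802 set this reports 256 code words, 512 mapper words, 16 conditions, 20 controller bits and 44 control bits; for the TTY set, 64, 128, 8, 15 and 17.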

Let's look at two sets of these parameters in practice:

1802 CPU (microcode):

.code 8, 64, cdp180x_code.mif, cdp180x_code.cgf, cdp180x_code.coe, cpu:cdp180x_code.vhd, cdp180x_code.hex, cdp180x_code.bin, 8;
.mapper 9, 8, cdp180x_map.mif, cdp180x_map.cgf, cdp180x_map.coe, cpu:cdp180x_map.vhd, cdp180x_map.hex, cdp180x_map.bin, 1;
.controller cpu_control_unit.vhd, 8;

microcode memory of 256 (2^U) words, 64 (W) bits each
mapper memory of 512 (2^V) words, 8 (U) bits each
controller with an 8-level-deep stack (S)

The controller is driven by the following if (cond) then / else description:

seq_cond:        .if 4 values 
                true,            // hard-code to 1
                mode_1805,       // external signal enabling 1805/1806 instructions
                sync,            // to sync with regular machine cycle when exiting tracing routine
                cond_3X,         // driven by 8 input mux connected to ir(2 downto 0), and ir(3) is xor
                cond_4,          // not used
                cond_5,          // not used
                continue,        // not (DMA_IN or DMA_OUT or INT)
                continue_sw,     // same as above, but also signal to use switch mux in else clause
                cond_8,          // not used
                externalInt,     // for BXI (force false in 1802 mode)
                counterInt,      // for BCI (force false in 1802 mode)
                alu16_zero,      // 16-bit ALU output (used in DBNZ only)
                cond_CX,         // driven by 8 input mux connected to ir(2 downto 0), and ir(3) is xor
                traceEnabled,    // high to trace each instruction
                traceReady,      // high if tracer has processed the trace character
                false            // hard-code to 0
                default true;
seq_then:    .then 8 values next, repeat, return, fork, @ default next;                // any label
seq_else:    .else 8 values next, repeat, return, fork, 0x00..0xFF, @ default next;    // any value as it can be a trace char
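The comment on cond_3X describes a compact trick: an 8-input multiplexer selected by ir(2 downto 0), with ir(3) inverting the result through an XOR. This fits the 1802 short-branch opcodes 3N, where 30-37 (BR, BQ, BZ, BDF, B1-B4) test a condition and 38-3F (SKP, BNQ, BNZ, BNF, BN1-BN4) test its complement. Below is a Python sketch; the order of the eight mux inputs (always true, Q, D = 0, DF, EF1-EF4) follows that opcode table and is an assumption about how the mux is wired in this design. Both function names are hypothetical.

```python
from typing import List, Sequence


def cond_3x(ir: int, inputs: Sequence[bool]) -> bool:
    """8-input mux selected by ir(2 downto 0); ir(3) inverts the result (XOR)."""
    if len(inputs) != 8:
        raise ValueError("the mux needs exactly 8 inputs")
    selected = bool(inputs[ir & 0b111])
    invert = bool(ir & 0b1000)
    return selected != invert  # XOR of two booleans


def short_branch_inputs(q: bool, d: int, df: bool, ef: Sequence[bool]) -> List[bool]:
    """Assumed mux wiring: true, Q, D == 0, DF, EF1..EF4."""
    if len(ef) != 4:
        raise ValueError("expected EF1..EF4")
    return [True, q, d == 0, df, *ef]


inputs = short_branch_inputs(q=False, d=0, df=True, ef=[False] * 4)
print(cond_3x(0x32, inputs))  # BZ with D = 0: True
print(cond_3x(0x3A, inputs))  # BNZ with D = 0: False
print(cond_3x(0x38, inputs))  # SKP, i.e. inverted "true": False
```

The XOR halves the number of mux inputs needed: 16 opcodes share 8 condition sources, which is presumably why cond_CX is described with the same structure.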

So 20 (C + U + U = 4 + 8 + 8) of the 64 bits are used by the controller, leaving 44 bits to drive the rest of the CPU logic.
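The bit budget can be illustrated by packing a microinstruction word. This Python sketch assumes, purely for illustration, that the controller fields occupy the top 20 bits of the 64-bit word in the order seq_cond (4), seq_then (8), seq_else (8), with the 44 control bits lumped together below them; the real field order is decided by the microcode compiler and the field declarations. `LAYOUT`, `pack` and `unpack` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Hypothetical layout of the 1802 microinstruction, most significant field first.
LAYOUT: List[Tuple[str, int]] = [
    ("seq_cond", 4),   # C bits
    ("seq_then", 8),   # U bits
    ("seq_else", 8),   # U bits
    ("control", 44),   # all remaining fields lumped together
]


def pack(fields: Dict[str, int], layout: List[Tuple[str, int]] = LAYOUT) -> int:
    """Concatenate field values into one word; missing fields are encoded as 0."""
    word = 0
    for name, width in layout:
        value = fields.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word: int, layout: List[Tuple[str, int]] = LAYOUT) -> Dict[str, int]:
    """Split a word back into its fields (inverse of pack)."""
    fields: Dict[str, int] = {}
    for name, width in reversed(layout):
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields


assert sum(width for _, width in LAYOUT) == 64  # 20 controller bits + 44 control bits
w = pack({"seq_cond": 0xF, "seq_then": 0x03, "seq_else": 0x2A, "control": 1})
print(hex(w))     # 0xf032a00000000001
print(unpack(w))
```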

These 44 bits are divided into fields; each field has a name, a width and a set of allowed / disallowed values. There are two types of fields:

"registered" fields (.regfield), which are assumed to cause state to be captured at the end of the microinstruction cycle
"value" fields (.valfield), which are assumed to directly drive some control signal during the current microinstruction cycle.

A good illustration of this is the control of the 16 x 16-bit register file. The register address is a "value field", which selects where the address comes from but does not need to persist, while the new value of the register must persist, based on the "regfield" selection:

// 16 * 16 register file
sel_reg:    .valfield 3 values zero, one, two, x, n, p default zero;        // select source of R0-R15 address
reg_r:      .regfield 3 values same, zero, r_plus_one, r_minus_one, yhi_rlo, rhi_ylo, b_t, - default same;

Based on the above, it is clear that:

sel_reg = two, reg_r <= zero ... R(2) <= 0

sel_reg = p, reg_r <= r_plus_one ... R(P) <= R(P) + 1

sel_reg = n, reg_r <= yhi_rlo ... R(N).1 <= Y, R(N).0 <= R(N).0

etc.
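The split between the two field types can be sketched in Python: the value field only computes an address for the current cycle, while the register field decides what, if anything, is written back at the end of the cycle. Assumptions: Y is an 8-bit bus value, x, n and p are the 4-bit 1802 X, N and P registers, rhi_ylo is taken by analogy with yhi_rlo to mean "R.1 unchanged, R.0 <= Y", and b_t is not modelled because its meaning is not given here. The function names are hypothetical.

```python
from typing import List


def reg_address(sel_reg: str, x: int, n: int, p: int) -> int:
    """Value field: selects the R0-R15 address for the current cycle only."""
    return {"zero": 0, "one": 1, "two": 2, "x": x, "n": n, "p": p}[sel_reg]


def regfile_cycle(regs: List[int], sel_reg: str, reg_r: str,
                  x: int, n: int, p: int, y: int) -> None:
    """Register field: new value of the selected register, captured at end of cycle."""
    a = reg_address(sel_reg, x, n, p)
    r = regs[a]
    if reg_r == "same":
        return  # nothing is captured: the register file is unaffected
    if reg_r == "zero":
        new = 0
    elif reg_r == "r_plus_one":
        new = (r + 1) & 0xFFFF
    elif reg_r == "r_minus_one":
        new = (r - 1) & 0xFFFF
    elif reg_r == "yhi_rlo":
        new = ((y & 0xFF) << 8) | (r & 0x00FF)  # R.1 <= Y, R.0 unchanged
    elif reg_r == "rhi_ylo":
        new = (r & 0xFF00) | (y & 0xFF)         # assumed: R.1 unchanged, R.0 <= Y
    else:
        raise NotImplementedError(f"{reg_r} is not modelled in this sketch")
    regs[a] = new


regs = [0] * 16
regs[3] = 0x1234
regfile_cycle(regs, "p", "r_plus_one", x=0, n=5, p=3, y=0x12)  # R(P) <= R(P) + 1
regfile_cycle(regs, "two", "zero", x=0, n=5, p=3, y=0x12)      # R(2) <= 0
regs[5] = 0xABCD
regfile_cycle(regs, "n", "yhi_rlo", x=0, n=5, p=3, y=0x12)     # R(N).1 <= Y
print(hex(regs[3]), hex(regs[5]))  # 0x1235 0x12cd
```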

However:

sel_reg = <any of 8 values>, reg_r <= same ... NOP

The programmer never has to write the above microinstruction explicitly; that is the purpose of "default": the compiler assumes the default value for any field that is not mentioned, so if the design is implemented properly, the register file is left unaffected.
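The effect of "default" can be sketched as the assembler filling in every field that a microinstruction does not mention. The field table below is transcribed from the sel_reg and reg_r declarations above; the dictionary representation, the `assemble` function and the encoding of each value by its position in the declared list are assumptions, not the actual compiler internals. The trailing "-" placeholder of reg_r is left out.

```python
from typing import Dict, List, Tuple

# field name -> (width, declared values in assumed encoding order, default)
FIELDS: Dict[str, Tuple[int, List[str], str]] = {
    "sel_reg": (3, ["zero", "one", "two", "x", "n", "p"], "zero"),
    "reg_r": (3, ["same", "zero", "r_plus_one", "r_minus_one",
                  "yhi_rlo", "rhi_ylo", "b_t"], "same"),
}


def assemble(stmt: Dict[str, str]) -> Dict[str, int]:
    """Encode every field, substituting the declared default for omitted ones."""
    unknown = set(stmt) - set(FIELDS)
    if unknown:
        raise KeyError(f"unknown field(s): {sorted(unknown)}")
    encoded: Dict[str, int] = {}
    for name, (width, values, default) in FIELDS.items():
        symbol = stmt.get(name, default)
        if symbol not in values:
            raise ValueError(f"{symbol!r} is not a declared value of {name}")
        code = values.index(symbol)
        if code >= 1 << width:
            raise ValueError(f"{name} encoding {code} exceeds {width} bits")
        encoded[name] = code
    return encoded


print(assemble({}))                                       # {'sel_reg': 0, 'reg_r': 0}, a NOP
print(assemble({"sel_reg": "p", "reg_r": "r_plus_one"}))  # {'sel_reg': 5, 'reg_r': 2}
```

Because reg_r defaults to same, an empty microinstruction leaves the register file alone no matter what sel_reg happens to select.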

TTY to VGA controller (microcode):

        .code 6, 32, tty_screen_code.mif, tty_screen_code.cgf, tty:tty_screen_code.vhd, tty_screen_code.hex, tty_screen_code.bin, 4;
        .mapper 7, 6, tty_screen_map.mif, tty_screen_map.cgf, tty:tty_screen_map.vhd, tty_screen_map.hex, tty_screen_map.bin, 1;
        .controller tty_control_unit.vhd, 4;

...

seq_cond:    .if 3 values 
            true,             // hard-code to 1
            char_is_zero,
            cursorx_ge_maxcol,
            cursory_ge_maxrow,
            cursorx_is_zero,
            cursory_is_zero,
            memory_ready,
            false            // hard-code to 0
            default true;
seq_then:    .then 6 values next, repeat, return, fork, @ default next;                // any label
seq_else:    .else 6 values next, repeat, return, fork, 0x00..0x3F, @ default next;    // any value as it can be a trace char

...

microcode memory of 64 (2^U) words, 32 (W) bits wide
mapper memory of 128 (2^V) words (maps to 7 bit ASCII), 6 (U) bits wide
microcode controller with a stack depth of 4 (S), consuming 15 (C + U + U = 3 + 6 + 6) of the 32 bits, leaving 17 bits to drive the rest of the circuit
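With V = 7, the TTY mapper holds one 6-bit entry point per 7-bit ASCII code, so a fork dispatches directly on the received character. The following Python sketch builds such a 128-entry table; the routine names, their entry addresses and the choice of control characters that get their own routine are hypothetical, picked only to illustrate the idea.

```python
from typing import Dict, List

U, V = 6, 7  # TTY parameters: 6-bit entry points, 7-bit (ASCII) mapper address

# Hypothetical microroutine entry points; each must fit in U bits.
ENTRY: Dict[str, int] = {
    "print_char": 0x10,
    "carriage_return": 0x20,
    "line_feed": 0x24,
    "backspace": 0x28,
    "form_feed": 0x2C,
}


def build_mapper() -> List[int]:
    """Return 2**V entry points, indexed by the 7-bit character code."""
    mapper = [ENTRY["print_char"]] * (2 ** V)  # default routine for all codes
    mapper[0x0D] = ENTRY["carriage_return"]
    mapper[0x0A] = ENTRY["line_feed"]
    mapper[0x08] = ENTRY["backspace"]
    mapper[0x0C] = ENTRY["form_feed"]
    if any(not 0 <= entry < 2 ** U for entry in mapper):
        raise ValueError("entry point does not fit in U bits")
    return mapper


mapper = build_mapper()
print(len(mapper), hex(mapper[ord("A")]), hex(mapper[0x0D]))  # 128 0x10 0x20
```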

