
Micro-coded controller deep-dive

A project log for Custom circuit testing using Intel HEX files

Download / upload memory contents into computer motherboards or other devices for test or debugging (using 3 micro-coded controllers)

zpekic, 12/22/2021 at 20:35

In this project, instead of using a standard embedded processor and programming it to execute the 3 tasks at hand (parsing a HEX character stream, generating a HEX character stream, and writing a character stream to the video RAM driving VGA), I created 3 independent micro-coded controllers, each tailored to its task, which can all operate in parallel. They can also be taken out of this project and dropped into any other project that needs that functionality.

Creating such controllers is possible in a standardized way using mcc, the microcode compiler. The simplest of these is tty_screen, which was up and running in one afternoon. Here are the suggested steps.


(before digging in, reading this log could be useful to explain some microcoding basics and how they are leveraged in my standardized / parametric approach)

Define a high-level / draft design

It should be obvious that this is a custom memory access circuit, where the memory address is given by the cursor X and Y positions, which range over 0..79 and 0..59 (for 640*480 VGA with an 8*8 pixel font; the actual VRAM address is A = 64Y + 16Y + X, i.e. 80Y + X). Data written into the video RAM comes either from the ASCII char input or from the video RAM itself (in case of scroll). In the simplest case the operation is VRAM[Y,X] <= char; X++; Of course when X reaches the rightmost position, X <= 0; Y++; and when Y reaches the bottom row the image is scrolled up. There is also handling of CR, LF, CLS etc. (for example CLS is a nested loop over X, Y: VRAM[Y, X] <= 0x20 (space)).
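The write / advance / scroll behaviour described above can be sketched as a small behavioural model. This is only an illustration in Python, not the controller's microcode: the class name `TtyModel`, the row stride of 80 characters, and the choice to scroll as soon as the cursor wraps past the last row are assumptions made for the sketch.

```python
COLS, ROWS = 80, 60   # 640x480 pixels with an 8x8 font
SPACE = 0x20


def vram_address(x, y):
    """Linear address with a row stride of 80: 80*y == 64*y + 16*y."""
    return (y << 6) + (y << 4) + x


class TtyModel:
    """Hypothetical software model of the screen, not the real hardware."""

    def __init__(self):
        self.vram = [SPACE] * (COLS * ROWS)
        self.x = 0
        self.y = 0

    def scroll_up(self):
        # Rows 1..ROWS-1 move up by one, the last row is blanked.
        self.vram = self.vram[COLS:] + [SPACE] * COLS

    def put_char(self, char):
        self.vram[vram_address(self.x, self.y)] = char
        self.x += 1
        if self.x == COLS:          # rightmost position reached
            self.x = 0
            self.y += 1
            if self.y == ROWS:      # bottom row reached (assumed: scroll at once)
                self.scroll_up()
                self.y = ROWS - 1
```

The two shifts in `vram_address` are the reason the address is written as a sum of power-of-two multiples of Y: in hardware it costs one adder and no multiplier.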

Few things to note:

Define instruction register use and width

For a classic CPU, the IR holds the currently executing instruction from the program stream. This controller processes an ASCII stream, so it is useful to define the IR as the character currently being processed. If we only care about 7-bit ASCII, that means a 7-bit IR loaded from the 8-bit data input (the MSB can be ignored). If char == 0x00 (NUL), there is no character to write to the video RAM.

Instruction register output is connected to mapper memory address (see here), defined as:

// mapper size is 128 words (as 7-bit ASCII code is used as "instruction") by 6 bits (to point to 1 of 64 microcode start locations)
// also generate all memory file formats. Note prefix: for .vhd, which is used to prepend to all generated aliases and constants
// this way multiple microcoded controllers can coexist in the same project even if their microfield have same name
.mapper 7, 6, tty_screen_map.mif, tty_screen_map.cgf, tty:tty_screen_map.vhd, tty_screen_map.hex, tty_screen_map.bin, 1;

 Looking at the generated tty_screen_map.hex file it becomes obvious that this is an auto-generated lookup table:

: 01 0000 00 0A F5
: 01 0001 00 0B F3
: 01 0002 00 12 EB
: 01 0003 00 0A F2
: 01 0004 00 0A F1
: 01 0005 00 0A F0
: 01 0006 00 0A EF
: 01 0007 00 0A EE
: 01 0008 00 0A ED
: 01 0009 00 0A EC
: 01 000A 00 13 E2
: 01 000B 00 0A EA
: 01 000C 00 0A E9
: 01 000D 00 21 D1
: 01 000E 00 0A E7
: 01 000F 00 0A E6
: 01 0010 00 0A E5
: 01 0011 00 0A E4
: 01 0012 00 0A E3
: 01 0013 00 0A E2
: 01 0014 00 0A E1
: 01 0015 00 0A E0
: 01 0016 00 0A DF
: 01 0017 00 0A DE
: 01 0018 00 0A DD
: 01 0019 00 0A DC
: 01 001A 00 0A DB
: 01 001B 00 0A DA
: 01 001C 00 0A D9
: 01 001D 00 0A D8
: 01 001E 00 0A D7
: 01 001F 00 0A D6
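Each line of the listing is a standard Intel HEX data record: byte count, 16-bit address, record type, data, and a checksum chosen so that all bytes sum to zero modulo 256. A minimal sketch for checking the records shown (the function name `parse_record` is made up for this example, and the spaces between fields are simply stripped):

```python
def parse_record(line):
    """Parse one Intel HEX record and return (address, record_type, data)."""
    text = line.replace(" ", "").strip()
    if not text.startswith(":"):
        raise ValueError("record must start with ':'")
    raw = bytes.fromhex(text[1:])
    if len(raw) < 5 or len(raw) != raw[0] + 5:
        raise ValueError("byte count does not match record length")
    if sum(raw) & 0xFF != 0:
        raise ValueError("bad checksum")
    address = (raw[1] << 8) | raw[2]
    return address, raw[3], raw[4:-1]


# The mapper entry for character 0x0D holds microcode address 0x21.
address, record_type, data = parse_record(": 01 000D 00 21 D1")
print(hex(address), record_type, data.hex())
```

For the record above, 0x01 + 0x00 + 0x0D + 0x00 + 0x21 = 0x2F, and 0x2F + 0xD1 = 0x100, so the checksum passes.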

Most control codes (0x00-0x1F) point to microcode location 0x0A, because the .map pragma matches them to the location of the following microinstruction:

            .map 0b00?_????;        // special characters 00-1F are not printable, so just ignore
nextChar:    ready = yes,
            if char_is_zero then waitChar else repeat;

 But for example char 0x01 (CLS == clear screen) points to 0x0B as that one is mapped right after:

        .map 0b000_0001;        // 0x01 SOH == clear screen
CLS:     data <= space, cursory <= zero;

Given that .map supports simple pattern matching using ? to indicate "don't care" bits, and .map can be "layered" (from less specific to more specific matches) this allows complex instruction decoding in a very simple way. 
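The layering idea can be illustrated with a short Python sketch that fills a 128-entry table from patterns, where later (more specific) patterns overwrite earlier ones. This is an assumption about how such a table could be built, not a description of mcc's internals; `build_mapper` is a hypothetical name, and only the two patterns quoted above are used.

```python
def build_mapper(rules, width=7):
    """Fill a 2**width entry table; later rules overwrite earlier ones."""
    table = [None] * (1 << width)
    for pattern, target in rules:
        if pattern.startswith("0b"):
            pattern = pattern[2:]
        bits = pattern.replace("_", "")
        if len(bits) != width:
            raise ValueError("pattern width mismatch: " + pattern)
        for code in range(1 << width):
            candidate = format(code, "0{}b".format(width))
            # '?' is a don't-care bit, anything else must match exactly.
            if all(p in ("?", c) for p, c in zip(bits, candidate)):
                table[code] = target
    return table


mapper = build_mapper([
    ("0b00?_????", 0x0A),   # all control characters: ignore
    ("0b000_0001", 0x0B),   # 0x01 SOH: clear screen
])
print(hex(mapper[0x01]), hex(mapper[0x07]))
```

Entries that no pattern matches are left as `None` in this sketch; what the real compiler writes into unmapped locations is not stated here.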

The final piece here is the "fork" control unit command. When executed, the uPC (micro program counter) is simply loaded from the mapper memory output, and the next uI (micro instruction) is the start of the implementation routine:

waitChar:    ready = char_is_zero, data <= char,
         if char_is_zero then repeat else next;
            
         if true then fork else fork;    // interpret the ASCII code of char in data register as "instruction"

Define microinstruction fields

Go over the design and identify how many control signals each component needs, and whether those control signals drive "registers" or "direct signals". For example:

The CursorY register can be left unchanged, cleared to zero (top row), incremented, decremented, or set to maxrow, which translates to (note .regfield !!):

        // Screen cursor Y position can stay the same, be set to zero, increment, decrement, or be set to maxrow
cursory:    .regfield 3 values 
        same, 
        zero,                     // top position
        inc, 
        dec, 
        maxrow default same;

 5 cases, for which we need 3 control lines. Default must be always specified, and that is "same" or "no change" - each microinstruction will have cursory <= same unless other value is specified.
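The relation between the number of values and the field width is just a ceiling of log2, with codes assigned in listing order. A small Python sketch (the helper `encode_field` is hypothetical; in the mcc source the width is declared explicitly rather than computed):

```python
def encode_field(values):
    """Return (minimum width in bits, {value name: binary code})."""
    width = max(1, (len(values) - 1).bit_length())
    codes = {name: format(i, "0{}b".format(width))
             for i, name in enumerate(values)}
    return width, codes


width, codes = encode_field(["same", "zero", "inc", "dec", "maxrow"])
print(width, codes)
```

With 5 values the minimum width is 3 bits, and the codes run from "000" for same to "100" for maxrow, leaving three unused encodings.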

The mcc compiler generates this code snippet:

alias tty_cursory:     std_logic_vector(2 downto 0) is tty_uinstruction(11 downto 9);
constant cursory_same:     std_logic_vector(2 downto 0) := "000";
constant cursory_zero:     std_logic_vector(2 downto 0) := "001";
constant cursory_inc:     std_logic_vector(2 downto 0) := "010";
constant cursory_dec:     std_logic_vector(2 downto 0) := "011";
constant cursory_maxrow:     std_logic_vector(2 downto 0) := "100";
---- Start boilerplate code (use with utmost caution!)
-- update_cursory: process(clk, tty_cursory)
-- begin
--    if (rising_edge(clk)) then
--        case tty_cursory is
----            when cursory_same =>
----                cursory <= cursory;
--            when cursory_zero =>
--                cursory <= (others => '0');
--            when cursory_inc =>
--                cursory <= std_logic_vector(unsigned(cursory) + 1);
--            when cursory_dec =>
--                cursory <= std_logic_vector(unsigned(cursory) - 1);
--            when cursory_maxrow =>
--                cursory <= maxrow;
--            when others =>
--                null;
--        end case;
-- end if;
-- end process;
---- End boilerplate code

The labels (aliases and constants) are not commented out, meaning that a design which includes this file will match the microcode source at all times:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Uncomment the following library declaration if using
-- arithmetic functions with Signed or Unsigned values
use IEEE.NUMERIC_STD.ALL;

use work.tty_screen_code.all;
use work.tty_screen_map.all;

The sample implementation is commented out; it can be either copied over and uncommented, or left unused. mcc will even attempt to recognize usual operations such as zero, and/or, inc/dec. The result may not be optimal, but it will usually work and speed up development.

Video memory RD and WR signals are driven directly (note .valfield !!), plus they are also mutually exclusive which can be expressed with:

    // video memory control bus, note that ordering of labels can be conveniently used to generate /RD and /WR signals
mem:    .valfield 2 values
    nop,            // no memory access
    read,            // mem(0) is RD
    write,            // mem(1) is WR
    -            // forbid read and write at same time
    default nop;

 So a 2-bit wide field will be needed. 

Generated code

alias tty_mem:     std_logic_vector(1 downto 0) is tty_uinstruction(6 downto 5);
constant mem_nop:     std_logic_vector(1 downto 0) := "00";
constant mem_read:     std_logic_vector(1 downto 0) := "01";
constant mem_write:     std_logic_vector(1 downto 0) := "10";
-- Value "11" not allowed (name '-' is not assignable)
---- Start boilerplate code (use with utmost caution!)
-- with tty_mem select mem <=
--      nop when mem_nop, -- default value
--      read when mem_read,
--      write when mem_write,
--      nop when others;
---- End boilerplate code

The commented-out code here is not very useful (note that no CLK signal is involved for a .valfield), but tty_mem(1) can be used directly as the WR signal and tty_mem(0) as the RD signal to memory (usually active high in FPGAs, as opposed to many discrete ICs).
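As a sketch of how the two strobes fall out of the encoding, the Python below extracts the mem field from a 32-bit microinstruction word and splits it into RD and WR. The bit position (6 downto 5) is taken from the generated alias; the function name `memory_strobes` is only for illustration.

```python
MEM_LSB = 5  # tty_mem is tty_uinstruction(6 downto 5)


def memory_strobes(uinstruction):
    """Return (rd, wr) as 0/1 values decoded from the mem field."""
    mem = (uinstruction >> MEM_LSB) & 0b11
    if mem == 0b11:
        # The '-' entry in the field definition forbids this combination.
        raise ValueError("read and write at the same time are not allowed")
    rd = mem & 1           # mem(0)
    wr = (mem >> 1) & 1    # mem(1)
    return rd, wr
```

Because nop, read, write are listed in that order, the codes 00, 01, 10 are already one-hot, which is why no extra decoder is needed.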

Adding all bit field widths together will be most of the microinstruction width, but not all, as control unit also needs to consume some. That's the next step.

Define program control conditions

A key feature of this microcoded concept is that each microinstruction, in addition to any number of parallel control codes that drive the design, can also execute 1 program transfer instruction in the form:

if <condition> then <cmd_true|label_true> else <cmd_false|label_false>

-or 1 subroutine call-

label() (implemented as if true then label else label)

cmd can be any of the sequencer commands defined by seq_then and seq_else: next, repeat, return, or fork.

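First, the conditions (seq_cond reserved label) must be defined. This is done by analysing the design and figuring out which conditions are needed to drive the algorithm, for example whether the incoming character is zero, whether the cursor has reached the last column or row, or whether the memory is ready. In this design: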

        // microcontroller also consumes microinstruction fields, first 3 bits to select an IF condition
        // true and false are handy to have around in all designs
        // assignment only through IF condition THEN target_true ELSE target_false
seq_cond:    .if 3 values 
        true,            // hard-code to 1
        char_is_zero,    // all branch conditions needed by the design must be listed and brought into a n to 1 MUX
        cursorx_ge_maxcol,
        cursory_ge_maxrow,
        cursorx_is_zero,
        cursory_is_zero,
        memory_ready,
        false            // hard-code to 0
        default true;

 Translated into VHDL:

alias tty_seq_cond:     std_logic_vector(2 downto 0) is tty_uinstruction(29 downto 27);
constant seq_cond_true:     integer := 0;
constant seq_cond_char_is_zero:     integer := 1;
constant seq_cond_cursorx_ge_maxcol:     integer := 2;
constant seq_cond_cursory_ge_maxrow:     integer := 3;
constant seq_cond_cursorx_is_zero:     integer := 4;
constant seq_cond_cursory_is_zero:     integer := 5;
constant seq_cond_memory_ready:     integer := 6;
constant seq_cond_false:     integer := 7;
---- Start boilerplate code (use with utmost caution!)
---- include '.controller <filename.vhd>, <stackdepth>;' in .mcc file to generate pre-canned microcode control unit and feed 'conditions' with:
--  cond(seq_cond_true) => '1',
--  cond(seq_cond_char_is_zero) => char_is_zero,
--  cond(seq_cond_cursorx_ge_maxcol) => cursorx_ge_maxcol,
--  cond(seq_cond_cursory_ge_maxrow) => cursory_ge_maxrow,
--  cond(seq_cond_cursorx_is_zero) => cursorx_is_zero,
--  cond(seq_cond_cursory_is_zero) => cursory_is_zero,
--  cond(seq_cond_memory_ready) => memory_ready,
--  cond(seq_cond_false) => '0',
---- End boilerplate code

Next, the "then" part must be defined using seq_then reserved label:

		// then 6 bits (because need to jump/call 64 locations) to specify THEN (to select if condition is true)
seq_then:	.then 6 values 
		next, 			// uPC <= uPC + 1
		repeat, 		// uPC <= uPC
		return, 		// uPC <= saved uPC
		fork, 
		@ default next;	// any label

The width of this field will typically match the depth of the microcode (64 instructions, therefore 6 bits). The first four values are hard-coded sequencer commands; the remaining 60 values are labels pointing to any place in the microcode except the first 4 locations. This minor loss (the first 4 locations can still be used as a handy reset sequence) is offset by a compact and simple design of the control unit.

Finally, the "else" part is defined using "seq_else" reserved label:

		// then 6 bits for ELSE (target selected if the condition is false)
seq_else:	.else 6 values 
		next, 
		repeat, 
		return, 
		fork, 
		0x00..0x3F, @ default next;	// any label or valid range value (allows the field to be reused for a constant)

As expected, this is the equivalent of .then but with a small tweak: arbitrary 6-bit values are allowed. This is handy for saving microinstruction width:

if true then label else value;

Because the condition is always true, the "value" part is never used as a branch target, so it acts as a .valfield "for free".
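Put together, the condition MUX and the then/else fields determine the next uPC. The Python below is a simplified model under stated assumptions: the codes 0..3 for next, repeat, return, fork follow the listing order above, any other value is treated as a plain jump target, the uPC is assumed to wrap at 64, and subroutine calls (pushing a return address) are left out. The names `select_condition` and `next_upc` are made up for the sketch.

```python
NEXT, REPEAT, RETURN, FORK = 0, 1, 2, 3  # first four seq_then / seq_else codes


def select_condition(seq_cond, inputs):
    """8-to-1 MUX: index 0 is hard-wired true, index 7 is hard-wired false.

    inputs holds the 6 design conditions in seq_cond order:
    char_is_zero, cursorx_ge_maxcol, cursory_ge_maxrow,
    cursorx_is_zero, cursory_is_zero, memory_ready.
    """
    lines = [True] + list(inputs) + [False]
    return lines[seq_cond]


def next_upc(upc, seq_cond, seq_then, seq_else, inputs, mapper_out, return_addr):
    """Compute the next micro program counter value (calls not modelled)."""
    target = seq_then if select_condition(seq_cond, inputs) else seq_else
    if target == NEXT:
        return (upc + 1) & 0x3F
    if target == REPEAT:
        return upc
    if target == RETURN:
        return return_addr
    if target == FORK:
        return mapper_out
    return target  # any value >= 4 is a microcode address to jump to
```

For example, the waitChar microinstruction "if char_is_zero then repeat else next" keeps the uPC unchanged while no character is present and advances to the fork instruction once one arrives.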

Wrap-up microinstruction controller

For the templatized controller to work, it needs a few more parameters: the return stack depth and the active clock edge.

A stack depth > 0 allows microinstruction subroutine calls in the format name(), and returning from them using the return sequencer control code. A depth of 2 (single-level subroutine calls allowed) is OK for simple controllers like this one, 4 is sufficient for moderately complex designs, and 8 is more than enough for complex CISC-like processors.

// controller generated will have a 2 level hardware return stack and will advance on low to high clock transition
.controller tty_control_unit.vhd, 2, rising;

This will generate the following pre-canned control unit. Note that it actually has no stack pointer, but a simple LIFO set of registers. This way push and pop (call and return) can each complete in one CLK cycle in a simple manner.

The clock edge can be defined as rising (the microinstruction program counter and all registers in the design are updated with new values at rising_edge(clk)) or as falling. The default is rising.
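The "LIFO set of registers without a stack pointer" idea can be pictured as a shift register: a push shifts every entry one place down, a pop shifts every entry one place up. The Python class below is only a behavioural sketch with a hypothetical name; in particular, how the .controller stack depth parameter maps to the number of saved return addresses (`levels` here) is not modelled.

```python
class ReturnStack:
    """Shift-register style LIFO: no pointer, entries move on push and pop."""

    def __init__(self, levels):
        self.regs = [0] * levels

    def push(self, addr):
        # New address enters at the top, the oldest entry falls off the end.
        self.regs = [addr] + self.regs[:-1]

    def pop(self):
        top = self.regs[0]
        # Everything moves up by one, a zero is shifted in at the bottom.
        self.regs = self.regs[1:] + [0]
        return top
```

In hardware each of these moves is a parallel register load, so both operations take a single clock edge regardless of depth. The price is that pushing past the available levels silently drops the oldest return address.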

Assembling a microcoded instruction

mcc is a two-pass, two-mode compiler (one mode generates microcode, the other converts between useful memory formats). The implementation of these passes can be followed here

The final generated microinstruction can be thought of as a long binary vector. Each component of the vector is a field of fixed (but not same as other) size, and with a defined set of valid values. If a value of vector is not specified in the source code, the compiler picks the default - which must always be defined for every field.

This is best visible in the "noop" instruction. In source code:

noop:	    .alias if true then next else next;
...
_reset2:    noop;

In the generated VHDL:

-- L0114@0002._reset2:  if true then next else next;
--  ready = 00, if (000) then 000000 else 000000, cursorx <= 000, cursory <= 000, data <= 00, mem = 00, reserved = 00000;
2 => "00" & O"0" & O"00" & O"00" & O"0" & O"0" & "00" & "00" & "00000",

Next instruction sets cursorX and cursorY "vectors" to their allowed values:

_reset3:	cursorx <= zero, cursory <= zero; 

And becomes in VHDL:

-- L0116@0003._reset3:  cursorx <= zero, cursory <= zero;
--  ready = 00, if (000) then 000000 else 000000, cursorx <= 001, cursory <= 001, data <= 00, mem = 00, reserved = 00000;
3 => "00" & O"0" & O"00" & O"00" & O"1" & O"1" & "00" & "00" & "00000",

And this difference can be seen in any other memory representation file generated:

%---------------------------------%
WIDTH=32;
DEPTH=64;
ADDRESS_RADIX=HEX;
DATA_RADIX=HEX;
CONTENT BEGIN
[0000 .. 0002] : 00000000;
0003 : 00001200;
...
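The link between the field view and the hex word can be checked by hand. The Python sketch below packs the fields in the order and widths shown in the generated comments (2 + 3 + 6 + 6 + 3 + 3 + 2 + 2 + 5 = 32 bits). The function `pack` is hypothetical, and it assumes every unspecified field takes code 0, which is what the noop line shows.

```python
FIELDS = [  # (name, width in bits), most significant field first
    ("ready", 2), ("seq_cond", 3), ("seq_then", 6), ("seq_else", 6),
    ("cursorx", 3), ("cursory", 3), ("data", 2), ("mem", 2), ("reserved", 5),
]


def pack(**values):
    """Concatenate field codes into one 32-bit microinstruction word."""
    unknown = set(values) - {name for name, _ in FIELDS}
    if unknown:
        raise ValueError("unknown field(s): " + ", ".join(sorted(unknown)))
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)  # assumed default code is 0 for every field
        if not 0 <= value < (1 << width):
            raise ValueError("value out of range for field " + name)
        word = (word << width) | value
    return word


print("{:08X}".format(pack()))                      # noop     -> 00000000
print("{:08X}".format(pack(cursorx=1, cursory=1)))  # _reset3  -> 00001200
```

cursorx = 001 lands on bits 14..12 (0x1000) and cursory = 001 on bits 11..9 (0x0200), which together give the 00001200 stored at address 0003.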

Further reading
