In this project, instead of using a standard embedded processor and programming it to execute the 3 tasks at hand (parsing a HEX character stream, generating a HEX character stream, and writing a character stream to the video RAM driving VGA), I created 3 independent micro-coded controllers, each tailored to its task, which can all operate in parallel. They can also be taken out of this project and dropped into any other project needing that functionality.
Creating such controllers is possible in a standardized way using mcc, the microcode compiler. The simplest of these is tty_screen, which was up and running in one afternoon. Here are the suggested steps.
(before digging in, reading this log could be useful to explain some microcoding basics and how they are leveraged in my standardized / parametric approach)
Define a high-level / draft design
It should be obvious that this is a custom memory access circuit, where the memory address is given by the cursor X and Y positions, which range over 0..79 and 0..59 (for 640*480 VGA with an 8*8 pixel font; with 80 characters per row the actual VRAM address is A = 80Y + X = 64Y + 16Y + X). Data written into the video RAM comes either from the ASCII char input or from the video RAM itself (in case of scroll). In the simplest case the operation is VRAM[Y, X] <= char; X++; Of course, when X reaches the rightmost position, X <= 0; Y++; and when Y reaches the bottom row the image is scrolled up. There is also handling of CR, LF, CLS etc. (for example, CLS is a nested loop over X, Y: VRAM[Y, X] <= 0x20 (space)).
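The address arithmetic can be sketched in VHDL as follows. This is a minimal sketch, not the project's actual code: the entity name vram_addr_sketch, the port names and the 13-bit address width are illustrative assumptions, and it assumes a linear layout of 80 bytes per text row.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical helper: entity name, port names and widths are assumptions.
entity vram_addr_sketch is
    port (
        cursorx : in  std_logic_vector(6 downto 0);   -- column, 0..79
        cursory : in  std_logic_vector(5 downto 0);   -- row, 0..59
        addr    : out std_logic_vector(12 downto 0)   -- 0..4799 fits in 13 bits
    );
end vram_addr_sketch;

architecture rtl of vram_addr_sketch is
    signal y64, y16, x : unsigned(12 downto 0);
begin
    -- 80*Y needs no multiplier: 80*Y = 64*Y + 16*Y,
    -- and both terms are plain left shifts of Y.
    y64 <= shift_left(resize(unsigned(cursory), 13), 6);
    y16 <= shift_left(resize(unsigned(cursory), 13), 4);
    x   <= resize(unsigned(cursorx), 13);

    addr <= std_logic_vector(y64 + y16 + x);
end rtl;
```

The largest address, for X = 79 and Y = 59, is 80*59 + 79 = 4799, so 13 address bits are enough under this assumed layout.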
A few things to note:
- no need to worry about the internals of the control unit - it will be auto-generated with all the right parameters
- it is good to define all the registers and where they will get their values from (other registers, external inputs, or some ALU operations)
- the control unit will drive a condition code MUX, because at each instruction it only consumes a single true/false to execute either the .then or the .else part; these conditions should be enumerated to find out if 4, 8, 16 or more will be needed
- the control unit and internal registers are driven by the same CLK (whether rising or falling edge is not important, but typically it should be the same for both)
- typically only the control unit consumes RESET; other parts can be initialized under microinstruction control
- the control unit drives itself and the rest of the design, via direct signals (e.g. RD/WR in this design) or by selecting MUXs in front of registers ("RTL")
- condition control bits can come from inside the design (e.g. comparing registers with zero etc.) or outside (e.g. memory READY), or any logical combination of those
- signal widths are good to note, but can be changed; they are not reflected in microcode
Define instruction register use and width
For a classic CPU, the IR holds the currently executing instruction from the program stream. This controller processes an ASCII stream, so it is useful to define the IR as the character currently being processed. If we care only about 7-bit ASCII, that means a 7-bit IR loaded from the 8-bit data input (the MSB can be ignored). If char == 0x00 (NULL), there is no character to write to VRAM.
Instruction register output is connected to mapper memory address (see here), defined as:
// mapper size is 128 words (as 7-bit ASCII code is used as "instruction") by 6 bits (to point to 1 of 64 microcode start locations)
// also generate all memory file formats. Note prefix: for .vhd, which is used to prepend to all generated aliases and constants
// this way multiple microcoded controllers can coexist in the same project even if their microfield have same name
.mapper 7, 6, tty_screen_map.mif, tty_screen_map.cgf, tty:tty_screen_map.vhd, tty_screen_map.hex, tty_screen_map.bin, 1;
Looking at the generated tty_screen_map.hex file it becomes obvious that this is an auto-generated lookup table:
: 01 0000 00 0A F5
: 01 0001 00 0B F3
: 01 0002 00 12 EB
: 01 0003 00 0A F2
: 01 0004 00 0A F1
: 01 0005 00 0A F0
: 01 0006 00 0A EF
: 01 0007 00 0A EE
: 01 0008 00 0A ED
: 01 0009 00 0A EC
: 01 000A 00 13 E2
: 01 000B 00 0A EA
: 01 000C 00 0A E9
: 01 000D 00 21 D1
: 01 000E 00 0A E7
: 01 000F 00 0A E6
: 01 0010 00 0A E5
: 01 0011 00 0A E4
: 01 0012 00 0A E3
: 01 0013 00 0A E2
: 01 0014 00 0A E1
: 01 0015 00 0A E0
: 01 0016 00 0A DF
: 01 0017 00 0A DE
: 01 0018 00 0A DD
: 01 0019 00 0A DC
: 01 001A 00 0A DB
: 01 001B 00 0A DA
: 01 001C 00 0A D9
: 01 001D 00 0A D8
: 01 001E 00 0A D7
: 01 001F 00 0A D6
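Each record above is a one-byte Intel HEX data record: byte count, 16-bit address, record type, data byte, checksum. The checksum is the two's complement of the 8-bit sum of all preceding bytes in the record. As a sanity check, a few of the records can be verified in a small simulation-only VHDL sketch; the entity and function names are made up for illustration and are not part of mcc.

```vhdl
-- Simulation-only sketch; hex_checksum_check and checksum() are hypothetical names.
entity hex_checksum_check is
end hex_checksum_check;

architecture sim of hex_checksum_check is
    -- Intel HEX checksum of a record carrying exactly one data byte.
    function checksum(count, addr_hi, addr_lo, rectype, data : natural)
        return natural is
        variable sum : natural;
    begin
        sum := (count + addr_hi + addr_lo + rectype + data) mod 256;
        return (256 - sum) mod 256;   -- two's complement, kept to 8 bits
    end function;
begin
    process
    begin
        -- ": 01 0000 00 0A F5" : code 0x00 maps to microcode address 0x0A
        assert checksum(1, 16#00#, 16#00#, 0, 16#0A#) = 16#F5#
            report "checksum mismatch at 0000" severity failure;
        -- ": 01 0001 00 0B F3" : code 0x01 (CLS) maps to 0x0B
        assert checksum(1, 16#00#, 16#01#, 0, 16#0B#) = 16#F3#
            report "checksum mismatch at 0001" severity failure;
        -- ": 01 000D 00 21 D1" : code 0x0D (CR) maps to 0x21
        assert checksum(1, 16#00#, 16#0D#, 0, 16#21#) = 16#D1#
            report "checksum mismatch at 000D" severity failure;
        wait;  -- stop the process after the checks
    end process;
end sim;
```

The data byte of each record is the 6-bit microcode start address for the ASCII code given by the record address.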
Most control codes point to microcode location 0x0A because, via the .map pragma, they match the location of the following microinstruction:
.map 0b00?_????; // special characters 00-1F are not printable, so just ignore
nextChar: ready = yes, if char_is_zero then waitChar else repeat;
But for example char 0x01 (CLS == clear screen) points to 0x0B as that one is mapped right after:
.map 0b000_0001; // 0x01 SOH == clear screen
CLS: data <= space, cursory <= zero;
Given that .map supports simple pattern matching using ? to indicate "don't care" bits, and .map can be "layered" (from less specific to more specific matches) this allows complex instruction decoding in a very simple way.
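The layering can be modelled in VHDL with std_match from numeric_std, which treats '-' as "don't care". This is only a behavioural sketch of the matching rule, assuming that a later, more specific .map overrides an earlier one for the codes both match (which is what the generated table suggests). The real mapper is a lookup memory filled in by mcc at compile time; the entity name, port names and the fallback value are assumptions.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical model of two layered .map patterns; not generated by mcc.
entity map_layering_sketch is
    port (
        char  : in  std_logic_vector(6 downto 0);   -- 7-bit ASCII "instruction"
        start : out std_logic_vector(5 downto 0)    -- microcode start address
    );
end map_layering_sketch;

architecture rtl of map_layering_sketch is
    -- .map 0b00?_???? written with '-' in place of '?'
    constant PAT_CTRL : std_logic_vector(6 downto 0) := "00-----";
    -- .map 0b000_0001
    constant PAT_CLS  : std_logic_vector(6 downto 0) := "0000001";
begin
    process (char)
    begin
        start <= (others => '0');        -- assumed fallback for unmapped codes

        -- Less specific pattern first: every control character 0x00..0x1F
        if std_match(char, PAT_CTRL) then
            start <= "001010";           -- 0x0A, nextChar
        end if;

        -- More specific pattern last, so it wins for 0x01
        if std_match(char, PAT_CLS) then
            start <= "001011";           -- 0x0B, CLS
        end if;
    end process;
end rtl;
```

Because the last signal assignment in a process wins, the ordering of the two if statements plays the same role as the ordering of the .map pragmas in the microcode source.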
The final piece here is the "fork" control unit command. When executed, the uPC (micro program counter) is simply loaded from the mapper memory output, and the next uI (micro instruction) is the start of the implementation routine:
waitChar: ready = char_is_zero, data <= char, if char_is_zero then repeat else next;
if true then fork else fork; // interpret the ASCII code of char in data register as "instruction"
Define microinstruction fields
Go over the design and identify how many control signals each component needs, and whether those control signals drive "registers" or "direct signals". For example:
The CursorY register can:
- stay the same (no change)
- be cleared to zero (top row)
- be incremented
- be decremented
- be loaded with the maximum row number
which translates to (note .regfield !!):
// Screen cursor Y position can stay the same, be set to zero, increment, decrement, or be set to maxrow
cursory: .regfield 3 values
    same,
    zero, // top position
    inc,
    dec,
    maxrow
    default same;
5 cases, for which we need 3 control lines. Default must be always specified, and that is "same" or "no change" - each microinstruction will have cursory <= same unless other value is specified.
The mcc compiler generates this code snippet:
alias tty_cursory: std_logic_vector(2 downto 0) is tty_uinstruction(11 downto 9);
constant cursory_same: std_logic_vector(2 downto 0) := "000";
constant cursory_zero: std_logic_vector(2 downto 0) := "001";
constant cursory_inc: std_logic_vector(2 downto 0) := "010";
constant cursory_dec: std_logic_vector(2 downto 0) := "011";
constant cursory_maxrow: std_logic_vector(2 downto 0) := "100";
---- Start boilerplate code (use with utmost caution!)
-- update_cursory: process(clk, tty_cursory)
-- begin
--   if (rising_edge(clk)) then
--     case tty_cursory is
----       when cursory_same =>
----         cursory <= cursory;
--       when cursory_zero =>
--         cursory <= (others => '0');
--       when cursory_inc =>
--         cursory <= std_logic_vector(unsigned(cursory) + 1);
--       when cursory_dec =>
--         cursory <= std_logic_vector(unsigned(cursory) - 1);
--       when cursory_maxrow =>
--         cursory <= maxrow;
--       when others =>
--         null;
--     end case;
--   end if;
-- end process;
---- End boilerplate code
The labels are not commented out, meaning that a design which includes this file will match the microcode source at all times.
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
-- Uncomment the following library declaration if using
-- arithmetic functions with Signed or Unsigned values
use IEEE.NUMERIC_STD.ALL;
use work.tty_screen_code.all;
use work.tty_screen_map.all;
The sample implementation is commented out; it can either be copied over and uncommented, or left unused. mcc will even attempt to recognize usual operations such as simple zero, and/or, inc/dec. These of course may not be optimal, but will usually work and speed up development.
Video memory RD and WR signals are driven directly (note .valfield !!), plus they are also mutually exclusive which can be expressed with:
// video memory control bus, note that ordering of labels can be conveniently used to generate /RD and /WR signals
mem: .valfield 2 values
    nop,   // no memory access
    read,  // mem(0) is RD
    write, // mem(1) is WR
    -      // forbid read and write at same time
    default nop;
So a 2-bit wide field will be needed.
alias tty_mem: std_logic_vector(1 downto 0) is tty_uinstruction(6 downto 5);
constant mem_nop: std_logic_vector(1 downto 0) := "00";
constant mem_read: std_logic_vector(1 downto 0) := "01";
constant mem_write: std_logic_vector(1 downto 0) := "10";
-- Value "11" not allowed (name '-' is not assignable)
---- Start boilerplate code (use with utmost caution!)
-- with tty_mem select mem <=
--   nop when mem_nop, -- default value
--   read when mem_read,
--   write when mem_write,
--   nop when others;
---- End boilerplate code
The commented-out code here is not very useful (note that no CLK signal is involved for a .valfield), but tty_mem(1) can be used directly as the WR signal and tty_mem(0) as the RD signal to memory (usually active high in FPGAs, as opposed to many discrete ICs).
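A minimal sketch of that wiring, assuming the field encoding shown above (read = "01", write = "10"). The entity name and the port names rd, wr, nrd and nwr are illustrative assumptions; the active-low pair is only there to show how /RD and /WR for discrete memory ICs could be derived.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical wrapper; names are illustrative, not generated by mcc.
entity mem_strobes_sketch is
    port (
        tty_mem : in  std_logic_vector(1 downto 0);  -- microinstruction bits 6 downto 5
        rd      : out std_logic;                     -- active-high read strobe
        wr      : out std_logic;                     -- active-high write strobe
        nrd     : out std_logic;                     -- active-low /RD
        nwr     : out std_logic                      -- active-low /WR
    );
end mem_strobes_sketch;

architecture rtl of mem_strobes_sketch is
begin
    -- A .valfield is purely combinational: no clock is involved.
    rd  <= tty_mem(0);
    wr  <= tty_mem(1);
    nrd <= not tty_mem(0);
    nwr <= not tty_mem(1);
end rtl;
```

Since the value "11" is forbidden in the microcode source, rd and wr can never be asserted in the same microinstruction.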
Adding all bit field widths together gives most of the microinstruction width, but not all, as the control unit also needs to consume some bits. That's the next step.
Define program control conditions
A key feature of this microcoded concept is that each microinstruction, in addition to any number of parallel control codes that drive the design, can also execute 1 program transfer instruction in the form:
if <condition> then <cmd_true|label_true> else <cmd_false|label_false>
-or 1 subroutine call-
label() (implemented as if true then label else label)
cmd can be any of:
- next (uPC <= uPC + 1)
- repeat (uPC <= uPC)
- return (uPC <= saved uPC)
- fork (uPC <= map[instruction])
First, the conditions (seq_cond reserved label) must be defined. This is done by analysing the design and figuring out which conditions are needed to drive the algorithm, for example:
- register value is zero, negative, even/odd, same/below/over some value etc.
- ALU output flags (N, V, Z, C, P, etc.)
- external signal states (e.g. READY, START, STOP or similar)
- To these add the TRUE/FALSE (very handy to have)
In this design:
// microcontroller also consumes microinstruction fields, first 3 bits to select an IF condition
// true and false are handy to have around in all designs
// assignment only through IF condition THEN target_true ELSE target_false
seq_cond: .if 3 values
    true,              // hard-code to 1
    char_is_zero,      // all branch conditions needed by the design must be listed and brought into a n to 1 MUX
    cursorx_ge_maxcol,
    cursory_ge_maxrow,
    cursorx_is_zero,
    cursory_is_zero,
    memory_ready,
    false              // hard-code to 0
    default true;
Translated into VHDL:
alias tty_seq_cond: std_logic_vector(2 downto 0) is tty_uinstruction(29 downto 27);
constant seq_cond_true: integer := 0;
constant seq_cond_char_is_zero: integer := 1;
constant seq_cond_cursorx_ge_maxcol: integer := 2;
constant seq_cond_cursory_ge_maxrow: integer := 3;
constant seq_cond_cursorx_is_zero: integer := 4;
constant seq_cond_cursory_is_zero: integer := 5;
constant seq_cond_memory_ready: integer := 6;
constant seq_cond_false: integer := 7;
---- Start boilerplate code (use with utmost caution!)
---- include '.controller <filename.vhd>, <stackdepth>;' in .mcc file to generate pre-canned microcode control unit and feed 'conditions' with:
-- cond(seq_cond_true) => '1',
-- cond(seq_cond_char_is_zero) => char_is_zero,
-- cond(seq_cond_cursorx_ge_maxcol) => cursorx_ge_maxcol,
-- cond(seq_cond_cursory_ge_maxrow) => cursory_ge_maxrow,
-- cond(seq_cond_cursorx_is_zero) => cursorx_is_zero,
-- cond(seq_cond_cursory_is_zero) => cursory_is_zero,
-- cond(seq_cond_memory_ready) => memory_ready,
-- cond(seq_cond_false) => '0',
---- End boilerplate code
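Conceptually, the 3-bit seq_cond field is just the select input of an 8-to-1 multiplexer. The following stand-alone sketch shows that idea; the entity name and the output name condition are assumptions, and the generated control unit may implement the selection differently.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical 8-to-1 condition MUX; not the mcc-generated control unit.
entity cond_mux_sketch is
    port (
        seq_cond          : in  std_logic_vector(2 downto 0);  -- from the microinstruction
        char_is_zero      : in  std_logic;
        cursorx_ge_maxcol : in  std_logic;
        cursory_ge_maxrow : in  std_logic;
        cursorx_is_zero   : in  std_logic;
        cursory_is_zero   : in  std_logic;
        memory_ready      : in  std_logic;
        condition         : out std_logic   -- selects the .then or the .else part
    );
end cond_mux_sketch;

architecture rtl of cond_mux_sketch is
    signal cond : std_logic_vector(7 downto 0);
begin
    -- Index order follows the seq_cond_* constants listed above.
    cond(0) <= '1';                 -- true
    cond(1) <= char_is_zero;
    cond(2) <= cursorx_ge_maxcol;
    cond(3) <= cursory_ge_maxrow;
    cond(4) <= cursorx_is_zero;
    cond(5) <= cursory_is_zero;
    cond(6) <= memory_ready;
    cond(7) <= '0';                 -- false

    condition <= cond(to_integer(unsigned(seq_cond)));
end rtl;
```

With more than 8 conditions, the field simply grows to 4 bits and the MUX to 16 inputs.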
Next, the "then" part must be defined using seq_then reserved label:
// then 6 bits (because need to jump/call 64 locations) to specify THEN (to select if condition is true)
seq_then: .then 6 values
    next,   // uPC <= uPC + 1
    repeat, // uPC <= uPC
    return, // uPC <= saved uPC
    fork,
    @ default next; // any label
The width of this field will typically match the depth of the microcode (64 instructions, therefore 6 bits). The first four values are hard-coded sequencer commands; the remaining 60 values are labels pointing to any place in the microcode except the first 4 locations. This minor loss (the first 4 locations can still be used as a handy reset sequence) is offset by a compact and simple design of the control unit.
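The way such a field can drive the micro program counter is sketched below. The encodings next = 0, repeat = 1, return = 2 and fork = 3 are assumed from the order of the values in the field definition (only next = 000000 is visible in the generated code shown in this log), and all entity and port names are hypothetical. Subroutine calls, which also push onto the return stack, are not modelled here.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical next-address logic; not the mcc-generated control unit.
entity next_upc_sketch is
    port (
        condition  : in  std_logic;                      -- output of the condition MUX
        seq_then   : in  std_logic_vector(5 downto 0);
        seq_else   : in  std_logic_vector(5 downto 0);
        upc        : in  std_logic_vector(5 downto 0);   -- current uPC
        return_upc : in  std_logic_vector(5 downto 0);   -- saved uPC (top of return stack)
        map_out    : in  std_logic_vector(5 downto 0);   -- mapper memory output
        upc_next   : out std_logic_vector(5 downto 0)
    );
end next_upc_sketch;

architecture rtl of next_upc_sketch is
    signal target : std_logic_vector(5 downto 0);
begin
    -- Only one of the two fields is consumed per microinstruction.
    target <= seq_then when condition = '1' else seq_else;

    with target select upc_next <=
        std_logic_vector(unsigned(upc) + 1) when "000000",   -- next
        upc                                 when "000001",   -- repeat
        return_upc                          when "000010",   -- return
        map_out                             when "000011",   -- fork
        target                              when others;     -- jump to label
end rtl;
```

This also shows why labels cannot point at the first 4 microcode locations: those 4 values are taken by the sequencer commands.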
Finally, the "else" part is defined using "seq_else" reserved label:
// then 6 values for ELSE (to select if condition is false)
seq_else: .else 6 values
    next,
    repeat,
    return,
    fork,
    0x00..0x3F,
    @ default next; // any label or valid range value (allow field to be reused for constant
As expected, this is the equivalent of .then but with a small tweak - arbitrary 6-bit values are allowed. This is handy for saving microinstruction width:
if true then label else value;
Because the condition is true, the "value" part is never executed, so it acts as a .valfield "for free".
Wrap-up microinstruction controller
For the templatized controller to work, it needs a few more parameters:
- File name (.vhd only supported for now) where to generate the controller code
- Stack depth
- clock edge (rising or falling, default is rising)
A stack depth > 0 allows microinstruction subroutine calls in the format name(), and returning from them using the return sequencer control code. A depth of 2 (single level subroutine calls allowed) is OK for simple controllers like this one, 4 is sufficient for moderately complex designs, and 8 is more than enough for complex CISC-like processors.
// controller generated will have a 2 level hardware return stack and will advance on low to high clock transition
.controller tty_control_unit.vhd, 2, rising;
This will generate the following pre-canned control unit. Note that it actually has no stack pointer, but a simple LIFO set of registers. This way both push and pop (call and return) can be executed in one CLK cycle in a simple manner.
The clock edge can be defined as rising (the microinstruction program counter and all registers in the design are updated with new values at rising_edge(clk)), or as falling. The default is rising.
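The "LIFO set of registers without a stack pointer" idea can be illustrated with a 2-deep shift-register stack. This is a sketch of the general technique only, not the code mcc generates; the entity name, the port names and the choice to clear the vacated register on pop are assumptions.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical 2-level return stack built from shifting registers.
entity return_stack_sketch is
    port (
        clk  : in  std_logic;
        push : in  std_logic;                      -- subroutine call
        pop  : in  std_logic;                      -- return
        din  : in  std_logic_vector(5 downto 0);   -- return address to save
        top  : out std_logic_vector(5 downto 0)    -- saved uPC used by "return"
    );
end return_stack_sketch;

architecture rtl of return_stack_sketch is
    signal s0, s1 : std_logic_vector(5 downto 0) := (others => '0');
begin
    top <= s0;   -- top of stack is always register s0, no pointer needed

    process (clk)
    begin
        if rising_edge(clk) then
            if push = '1' then
                s1 <= s0;               -- shift down
                s0 <= din;              -- new top
            elsif pop = '1' then
                s0 <= s1;               -- shift up
                s1 <= (others => '0');  -- assumed: vacated slot is cleared
            end if;
        end if;
    end process;
end rtl;
```

Because every register moves in the same clock cycle, a call or a return costs exactly one cycle regardless of depth; the price is one register per stack level instead of a small RAM plus pointer.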
Assembling a microcoded instruction
mcc is a two-pass, two-mode compiler (one mode generates microcode, the other converts between useful memory formats). The implementation of these passes can be followed here.
The final generated microinstruction can be thought of as a long binary vector. Each component of the vector is a field of fixed size (not necessarily the same as that of other fields), with a defined set of valid values. If the value of a field is not specified in the source code, the compiler picks the default - which must always be defined for every field.
This is best visible in the "noop" instruction. In source code:
noop: .alias if true then next else next;
...
_reset2: noop;
In the generated VHDL:
-- L0114@0002._reset2: if true then next else next;
-- ready = 00, if (000) then 000000 else 000000, cursorx <= 000, cursory <= 000, data <= 00, mem = 00, reserved = 00000;
2 => "00" & O"0" & O"00" & O"00" & O"0" & O"0" & "00" & "00" & "00000",
Next instruction sets cursorX and cursorY "vectors" to their allowed values:
_reset3: cursorx <= zero, cursory <= zero;
And becomes in VHDL:
-- L0116@0003._reset3: cursorx <= zero, cursory <= zero;
-- ready = 00, if (000) then 000000 else 000000, cursorx <= 001, cursory <= 001, data <= 00, mem = 00, reserved = 00000;
3 => "00" & O"0" & O"00" & O"00" & O"1" & O"1" & "00" & "00" & "00000",
And this difference can be seen in any other memory representation file generated:
%---------------------------------%
WIDTH=32;
DEPTH=64;
ADDRESS_RADIX=HEX;
DATA_RADIX=HEX;
CONTENT BEGIN
[0000 .. 0002] : 00000000;
0003 : 00001200;
...
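To see how the field values of _reset3 end up as the word 00001200, the concatenation can be repeated by hand in a small simulation-only sketch. The field order and widths are taken from the generated comment line and aliases shown earlier (ready 2 bits, seq_cond 3, then 6, else 6, cursorx 3, cursory 3, data 2, mem 2, reserved 5); the entity and constant names are made up for illustration.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Simulation-only sketch; uinstr_pack_check is a hypothetical name.
entity uinstr_pack_check is
end uinstr_pack_check;

architecture sim of uinstr_pack_check is
    -- Field values of _reset3, most significant field first
    constant ready    : std_logic_vector(1 downto 0) := "00";
    constant seq_cond : std_logic_vector(2 downto 0) := "000";     -- true
    constant seq_then : std_logic_vector(5 downto 0) := "000000";  -- next
    constant seq_else : std_logic_vector(5 downto 0) := "000000";  -- next
    constant cursorx  : std_logic_vector(2 downto 0) := "001";     -- zero
    constant cursory  : std_logic_vector(2 downto 0) := "001";     -- zero
    constant data     : std_logic_vector(1 downto 0) := "00";
    constant mem      : std_logic_vector(1 downto 0) := "00";      -- nop
    constant reserved : std_logic_vector(4 downto 0) := "00000";

    -- 2 + 3 + 6 + 6 + 3 + 3 + 2 + 2 + 5 = 32 bits
    constant uinstr : std_logic_vector(31 downto 0) :=
        ready & seq_cond & seq_then & seq_else &
        cursorx & cursory & data & mem & reserved;
begin
    process
    begin
        -- cursorx sets bit 12 (0x1000) and cursory sets bit 9 (0x0200)
        assert uinstr = x"00001200"
            report "packed word does not match the .mif entry" severity failure;
        -- the alias tty_cursory picks bits 11 downto 9 back out
        assert uinstr(11 downto 9) = "001"
            report "cursory field mismatch" severity failure;
        wait;  -- stop the process after the checks
    end process;
end sim;
```

All other fields keep their default of all zeros in this instruction, which is why locations 0000 to 0002 (plain noop instructions) are stored as 00000000.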