09/22/2019 at 18:02 •
A few annoying bugs remain that I have so far been unable to track down and fix. Note that none of them affects the calculations, thanks to the workarounds or fixes applied.
1. Ghost digits on 7-seg LEDs
These appear at any CPU frequency higher than 50kHz or so, but only in the calculator display path, not when displaying the internal debug content of the macro and microprogram counters. The display strobing goes from higher digits to lower (this is necessary because leading zero suppression can only work this way) and it appears the segments "race ahead" of the digit strobes.
This bug renders the 7-seg display unusable for normal calculator operations. The workaround is to use only VGA to display entry and results, which is no great loss: the 4-digit 7-seg display can only show half of the digits at a time, and one has to press BTN0 to switch to the upper half, making it very impractical to begin with.
2. Not all digits of internal registers displayed on VGA debug screen
TMS0800 registers are either 11 BCD digits long (A, B, C) or 11 bits long (AF, BF). There is a timing problem in the VGA debug component or its driver, which "prints" only:
- the last 9 digits at 12.5MHz
- the last 10 digits at 6.25MHz
At lower frequencies (57.6kHz and 1kHz) all digits are printed on the screen. None of this impacts the operation of the calculator core; it only makes the first 1-2 digits (containing the sign etc.) invisible in debug mode.
3. Stray code execution
There is no unconditional jump in the TMS0800; such jumps are modeled by "knowing" the state of the CF flag and executing the matching "jump on condition set/reset" instruction. However, probably due to another unknown bug, in Sinclair mode at the end of the ROM (part of the AntiLog evaluation routine) CF is sometimes not the expected "1". This has been "fixed" with a hack explained in the code, but the real fix belongs in some underlying issue (perhaps the microcode implementation):
-- HACKHACK: Why is this "random instruction" being passed in to initialize the ROM???
-- Last instruction in Sinclair ROM as 0x13F is "BINE ALOGDIV" - however in some cases
-- CF is 0 which means it will not be executed and execution will continue at 0x140, which
-- will bring it back to the right place, instead of executing bad opcodes, or NOPs which
-- would wrap up to reset location 0. This is indication of another bug but for now this
fill_value => BIE_ALOGDIV,
sinclair_mode => true, -- hint to show correct disassembled listing (Sinclair mode)
asm_filename => "./sourceCode_sinclair.asm",
lst_filename => "./tms0800/output/sourceCode_sinclair.lst"
address => pc,
data => instruction_sinclair
4. Numeric digit keyboard off by one
Each TMS0800 instruction "pulls" the next right keyboard scan line down. That is the reason why the sequence of instructions is arranged exactly like the keys on the keyboard:
0 0111 0001 0071 10 00000 0000 BKO CLEAR ; Clear key pressed?
0 0111 0010 0072 10 00101 0101 BKO EQLKEY ; Equal key pressed?
0 0111 0011 0073 10 00100 1101 BKO PLSKEY ; Plus key pressed?
0 0111 0100 0074 10 00011 1101 BKO MINKEY ; Minus key pressed?
0 0111 0101 0075 10 00100 1100 BKO MLTKEY ; Mult key pressed?
0 0111 0110 0076 10 00100 1011 BKO DIVKEY ; Divide key pressed?
0 0111 0111 0077 10 00110 0011 BKO CEKEY ; CE key pressed?
0 0111 1000 0078 10 00101 1101 BKO DPTKEY ; Decimal point key pressed?
0 0111 1001 0079 10 00111 1110 BKO ZERKEY ; Zero key pressed?
0 0111 1010 007A 11 10011 1111 EXAB ALL ; Process digit key...
0 0111 1011 007B 11 11010 0010 AKCN LSD1 ; Count key position into A
After the EXAB, the scan should wrap around and start with key "1", which, if detected, would increment register A, accumulating the count until a key is found (so it would return 8 if a key-down contact was sensed when the 8th line from the left was pulled down, etc.). However, due to some timing bug, the count sometimes starts off by one, so pressing "3" registers as "2" etc. The "fix" is a hack in microcode that double-checks in AKCN that the right scan line is down before proceeding:
118 => -- AKCN
uc_if(cond_e11, upc_next, uc_label(121)),
121 => -- if kn was down, means we have a correct count in last mantissa, so bail, otherwise continue
uc_if(cond_kn, upc_next, uc_label(CONTINUE)),
122 => -- if scanned all, bail with CF = 1 to indicate no key
uc_if(cond_digit10, uc_label(CONTINUECS), uc_label(FORK)),
-- HACKHACK: make sure we are at last digit before continuing!
uc_if(cond_digit10, uc_label(CONTINUE), upc_next),
09/05/2019 at 03:19 •
There are various well-known and documented patterns and best practices for creating FSM (finite state machine) designs that work well on FPGAs. However, from what I have seen, there is not much guidance on a simple, good microcode pattern. One could of course adapt the methodology (and even tooling) popular in the bit-slice era, when microcoding was the most popular way to create custom processors and logic (which I have done too for the Am9080 project using Am2901 slices), but that approach is not very streamlined either.
My first attempt was to write a separate microcode compiler in C# which would "spit out" a file in a format that could directly be used to prime read-only memory (microcode or mapping ROM) in the VHDL. After starting work on that, I realized it may be too heavyweight for my needs, and it also has the disadvantage of adding another proprietary tool to the toolchain and an extra step in the journey towards the .bit file.
A better alternative seemed to be to do this right in VHDL, "synthesizing" the contents of the microcode ROM as the VHDL code is being compiled. Eventually, I settled on the approach described below, which I used in this project both for the calculator core and for the VGA tracer components.
That simpler solution turned out to be just a combination of a few tricks in VHDL. Here is the pattern:
- define the "NOP" microinstruction to be all zeroes (all hardware driven by the microinstruction should interpret zero control bits as doing nothing!!)
- each microinstruction is composed of multiple "fields" of varying length - for each of these, write a helper function that takes as parameters everything needed to describe the functionality of the target component (in many cases these drive mux selects, or enable some signals or similar) and returns the desired value of the targeted field, inserted into a "NOP" microinstruction
- combine any number of these functions with a simple "or" (as long as the same function is not used more than once!)
Most of this can be seen in a single file:
For example look at how alu function is defined:
-- 3 BITS 13..11
-- alias alu_fun: std_logic_vector(2 downto 0) is u_code(13 downto 11);
impure function uc_alu(alu_fun: in std_logic_vector(2 downto 0)) return std_logic_vector is
begin
	return X"00000" & "000000000000000000" & alu_fun & "00000000000";
end uc_alu;
As you can see, it just places the 3 bits that define the ALU function at the right position in the microinstruction word; everything else it returns is zero. The ALU function codes are:
-- ALU functions
constant fun_zero : std_logic_vector(2 downto 0) := "000";
constant fun_s : std_logic_vector(2 downto 0) := "001";
constant fun_r : std_logic_vector(2 downto 0) := "010";
constant fun_xor : std_logic_vector(2 downto 0) := "011";
constant fun_adchex : std_logic_vector(2 downto 0) := "100";
constant fun_adcbcd : std_logic_vector(2 downto 0) := "101";
constant fun_sbchex : std_logic_vector(2 downto 0) := "110";
constant fun_sbcbcd : std_logic_vector(2 downto 0) := "111";
Obviously, the actual ALU can now consume the same definition and implement the functionality accordingly:
with fun select
y <= s when fun_s,
r when fun_r,
(s xor r) when fun_xor,
sum0(3 downto 0) when fun_adchex,
sum2(3 downto 0) when fun_adcbcd,
dif0(3 downto 0) when fun_sbchex,
dif2(3 downto 0) when fun_sbcbcd,
"0000" when others;
Given that the return value is all zeros for anything outside this field, meaning NOP for all other components driven by the microinstruction, it won't impact them. So one can simply "or" it together with any other similar functions ("helpers") to create a microinstruction that drives the other components as needed.
A microinstruction written this way should be easy to read, and it should be easy to figure out what it is trying to do, which is critical to minimizing errors in the otherwise extremely error-prone work of microcoding. (Note: using "and" instead of "or" may be even more intuitive, but in that case the NOP microinstruction should be defined as all "1111...." and all components driven by it should interpret an all-"111" field as "do nothing".)
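As a minimal sketch of this "or" composition, the self-checking testbench below builds one 52-bit microinstruction from two field helpers. Only the ALU field position (bits 13..11) and the 52-bit width are taken from the post; the `uc_dst` helper, its bit position (10..9) and the entity name are hypothetical and exist purely for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity ucode_compose_sketch is
end entity;

architecture sim of ucode_compose_sketch is
    subtype uword is std_logic_vector(51 downto 0);

    -- NOP is all zeroes, so it is the neutral element of "or"
    constant uc_nop : uword := (others => '0');

    constant fun_adcbcd : std_logic_vector(2 downto 0) := "101";

    -- ALU function field, bits 13..11 (position as given in the post)
    function uc_alu(alu_fun : std_logic_vector(2 downto 0)) return uword is
        variable w : uword := uc_nop;
    begin
        w(13 downto 11) := alu_fun;
        return w;
    end function;

    -- HYPOTHETICAL second field at bits 10..9, not from the original design
    function uc_dst(dst : std_logic_vector(1 downto 0)) return uword is
        variable w : uword := uc_nop;
    begin
        w(10 downto 9) := dst;
        return w;
    end function;

    -- one microinstruction composed from two independent helpers
    constant example : uword := uc_alu(fun_adcbcd) or uc_dst("10");
begin
    check : process
    begin
        assert example(13 downto 11) = "101"
            report "ALU field wrong" severity failure;
        assert example(10 downto 9) = "10"
            report "dst field wrong" severity failure;
        assert (uc_nop or uc_alu(fun_adcbcd)) = uc_alu(fun_adcbcd)
            report "NOP is not neutral" severity failure;
        wait;
    end process;
end architecture;
```

Because each helper only sets bits inside its own field, the order of the "or" terms does not matter. Using the same helper twice in one microinstruction would merge two values into one field, which is why the pattern forbids it.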
The microinstruction controller is just another component driven by the microinstruction. In my implementation it needs a code that selects the condition (4 bits = 16 conditions), plus where to go when the condition is true and where to go when it is false. With that, one can write convenient "high level language" branch statements:
COPYS => -- AKA, AKB, AKC, ABOA, ABOC
uc_if(cond_e11, upc_next, uc_label(CONTINUE)),
If cond_e11 is true, continue with the next microinstruction, otherwise jump to the label "continue". The trick here is that while the destinations for "if" and "else" are real locations in the microcode, some special values are "reserved": value 0x00 does not jump to location 0, but actually means "next" (again, remember that this is the NOP for the microinstruction controller):
-- special microcode "goto" codes (all others will be jump to that location)
constant upc_next: std_logic_vector(7 downto 0) := X"00"; -- means we can't jump to location 0!
constant upc_return: std_logic_vector(7 downto 0) := X"01"; -- means we can't jump to location 1!
constant upc_repeat: std_logic_vector(7 downto 0) := X"FF"; -- means we can't jump to location 255!
constant upc_fork: std_logic_vector(7 downto 0) := X"FE"; -- means we can't jump to location 254!
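A helper like `uc_if` could be sketched as below. The field layout (condition in bits 51..48, "then" target in 47..40, "else" target in 39..32) is an assumption, as are the values chosen for `cond_e11` and `lbl_continue` and the `u_next_of` helper; the post only states that the condition code is 4 bits and that the targets are 8-bit locations.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity uc_if_sketch is
end entity;

architecture sim of uc_if_sketch is
    subtype uword is std_logic_vector(51 downto 0);

    constant upc_next : std_logic_vector(7 downto 0) := X"00";

    -- HYPOTHETICAL values, for illustration only
    constant cond_e11     : std_logic_vector(3 downto 0) := "0011";
    constant lbl_continue : std_logic_vector(7 downto 0) := X"2A";

    -- ASSUMED layout: cond = 51..48, then = 47..40, else = 39..32
    function uc_if(cond     : std_logic_vector(3 downto 0);
                   then_dst : std_logic_vector(7 downto 0);
                   else_dst : std_logic_vector(7 downto 0)) return uword is
        variable w : uword := (others => '0');
    begin
        w(51 downto 48) := cond;
        w(47 downto 40) := then_dst;
        w(39 downto 32) := else_dst;
        return w;
    end function;

    -- select the value that would drive u_next for a given condition outcome
    function u_next_of(w : uword; cond_true : boolean) return std_logic_vector is
    begin
        if cond_true then
            return w(47 downto 40);
        else
            return w(39 downto 32);
        end if;
    end function;

    constant copys : uword := uc_if(cond_e11, upc_next, lbl_continue);
    constant nop   : uword := (others => '0');
begin
    check : process
    begin
        assert u_next_of(copys, true) = upc_next
            report "then-branch should be 'next'" severity failure;
        assert u_next_of(copys, false) = lbl_continue
            report "else-branch should be the label" severity failure;
        -- an all-zero word falls through to the next microinstruction either way
        assert u_next_of(nop, true) = upc_next and u_next_of(nop, false) = upc_next
            report "NOP must mean 'next'" severity failure;
        wait;
    end process;
end architecture;
```

The last assertion shows why reserving 0x00 for "next" matters: a microinstruction that never mentions `uc_if` still advances to the following location.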
With this, any microinstruction can define not just the behavior of all the driven components, but also a simple yet rather powerful "if" that can jump, return, fork or repeat based on a condition (as set by the execution of the previous microinstruction, which is another common source of bugs!). There is no dedicated "call": each jump saves the return address in a 1-deep stack. This could be extended, but it was sufficient for my calculator project. Remember, if "if(cond, then_destination, else_destination)" is missing, 0x00000 ends up in that field of the microinstruction, meaning "if(true, upc_next, upc_next)", so execution simply continues. Now we just need to drive the microinstruction pointer register accordingly:
-- update microcode program counter
update_upc: process(clk, reset, u_next)
begin
	if (reset = '1') then
		-- start execution at location 0, microinstructions 0 - 127 can be shared by any instruction
		u_pc <= X"00";
		u_ra <= X"00";
	elsif (rising_edge(clk)) then
		case u_next is
			-- if condition(0) = '1' then X"00000" (default) will cause simple u_pc advance
			when upc_next =>
				u_pc <= std_logic_vector(unsigned(u_pc) + 1);
			-- used to repeat same microinstruction until condition turns true
			when upc_repeat =>
				u_pc <= u_pc;
			-- start executing macroinstruction routine, which are mapped to 128 - 255
			when upc_fork =>
				if (instruction(6 downto 5) = "00") then
					-- if the instruction is JUMP on condition reset, mask out the jump target
					-- this way 32 microcode locations are freed up!
					u_pc <= "10000000";
				else
					-- map 7 bit instruction directly to upper 128 words of microcode
					u_pc <= '1' & instruction;
				end if;
			-- return from "1 level subroutine"
			when upc_return =>
				u_pc <= u_ra;
			-- any other value is a jump to that microinstruction location, save return address for "1 level stack"
			when others =>
				u_pc <= u_next;
				u_ra <= std_logic_vector(unsigned(u_pc) + 1);
		end case;
	end if;
end process;
With this I was able to write pretty complex microcode in a standardized fashion. As part of the compile the generated microcode ROM is also output to a file so one can compare that output with the input and use that to spot bugs early on (a classic bug is to have overlapping microinstruction fields...):
procedure dump_microcode(out_file_name: in string; temp_mem: in rom256x52; depth: integer; base: integer) is
file out_file : text; -- open write_mode is out_file_name;
variable out_line : line;
write(out_line, decode8(temp_mem(i)(13 downto 11), "alu_y = 0; ", "alu_y = s(alu_sel); ", "alu_y = r(alu_sel); ", "alu_y = xor(alu_sel); ", "alu_y = adchex(alu_sel); ", "alu_y = adcbcd(alu_sel); ", "alu_y = sbchex(alu_sel); ", "alu_y = sbcbcd(alu_sel); "));
The intent is to write code that tries to "reverse engineer" the already existing microcode store. In a way, this is test-driven development: it is advisable to write this function right after defining the microinstruction format, and then, as the microcode is developed, to watch its output after each compile and compare it with the intent of the microcode written. If they differ, that is a 100% indication of a bug.
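A helper in the spirit of `decode8` could look like the sketch below. The signature is inferred from the single call shown in the post and is an assumption; the entity name and the shortened strings are illustrative only.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity decode8_sketch is
end entity;

architecture sim of decode8_sketch is
    -- ASSUMED signature: pick one of eight descriptions based on a 3-bit field
    function decode8(sel : std_logic_vector(2 downto 0);
                     s0, s1, s2, s3, s4, s5, s6, s7 : string) return string is
    begin
        case sel is
            when "000"  => return s0;
            when "001"  => return s1;
            when "010"  => return s2;
            when "011"  => return s3;
            when "100"  => return s4;
            when "101"  => return s5;
            when "110"  => return s6;
            when "111"  => return s7;
            when others => return "?";  -- field contains 'U', 'X' etc.
        end case;
    end function;
begin
    check : process
        variable word : std_logic_vector(51 downto 0) := (others => '0');
    begin
        -- place "adcbcd" (101) in the ALU field, bits 13..11
        word(13 downto 11) := "101";
        assert decode8(word(13 downto 11),
                       "0", "s", "r", "xor",
                       "adchex", "adcbcd", "sbchex", "sbcbcd") = "adcbcd"
            report "ALU field decoded incorrectly" severity failure;
        wait;
    end process;
end architecture;
```

Decoding each field independently like this is what makes overlapping fields visible: a bit set by one helper shows up as an unexpected value in the neighbouring field's decoded text.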
A note about instruction mapping:
When implementing microcode-driven CPUs or controllers, a common problem is how to "map" opcodes to the first location of the microcode executing that instruction. This can be done in several ways:
- a separate "mapping" ROM is introduced: it has one entry per opcode, and each entry is wide enough to hold a microcode ROM address. So location 0x76 (the opcode for HLT on the 8080/8085/Z80) may contain, for example, 0x3E3, which would be the location of the first microinstruction in a microcode ROM that has at least 1024 locations.
- static logic that cleverly translates op codes into microinstruction entry points. Usually this is possible with CPUs with highly orthogonal and/or reduced instruction sets
- direct mapping. This approach was taken here: the TMS0800 never needs more than 7 bits to define an instruction, so a microcode ROM of 256 locations is sufficient. The lower 128 locations hold microcode that any instruction can share, and the upper 128 are the entry points, each mapped directly to the first microinstruction to be executed. In other words, opcode 0x3E will start with the microinstruction at location 0xBE etc.
In all cases above, there is a "fork" microinstruction field, which has the task to load the output coming from any of these methods into the microinstruction pointer.
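The direct mapping used here can be sketched as a small function, mirroring the `upc_fork` branch of the program counter process. The function and entity names are hypothetical; the mapping itself (prefix '1', and collapse opcodes whose bits 6..5 are "00" to location 0x80) follows the code shown earlier.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity fork_map_sketch is
end entity;

architecture sim of fork_map_sketch is
    -- map a 7-bit opcode to its entry point in the upper half of a 256-word microcode ROM
    function fork_target(instruction : std_logic_vector(6 downto 0))
        return std_logic_vector is
    begin
        if instruction(6 downto 5) = "00" then
            -- all opcodes of this class share one entry point
            return "10000000";
        else
            return '1' & instruction;
        end if;
    end function;
begin
    check : process
    begin
        -- opcode 0x3E (0111110) starts at microcode location 0xBE
        assert fork_target("0111110") = X"BE"
            report "0x3E should map to 0xBE" severity failure;
        -- any opcode with bits 6..5 = "00" starts at 0x80
        assert fork_target("0010101") = X"80"
            report "class 00 should map to 0x80" severity failure;
        wait;
    end process;
end architecture;
```

Compared with a mapping ROM, this costs no extra memory, at the price of fixing every entry point to a single location in the upper half of the microcode.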
09/03/2019 at 06:59 •
I was able to verify a few operations on basic arguments and confirm that the results match a real Sinclair Scientific. The following problems remain:
- LED blurs at higher frequencies. Calculation can be done at 12.5MHz, but the LED is blurry above 40kHz or so
- breakpoint and single step logic needs to be laid out more logically on the switches, and the dual clock unit avoided
- VGA tracer still drops some characters from the display.
Note that none of the above impacts calculation. I also briefly tested TI mode, and it seems no regression was introduced there.