Fixing divider and overflow flag issues

A project log for TMS9900 compatible CPU core in VHDL

Retro challenge 2017/04 project to create a TMS9900 compatible CPU core. Again in a month... Failure could be an option...

erik-piehlErik Piehl 09/24/2018 at 18:211 Comment

After remembering where I was in the project I started to look for bugs in my CPU. I know it does not fully work, for example since running in TI BASIC I get:

PRINT 1*-1

So something is not working in the CPU. In order to work on this, I took advantage of my previous design, which combines a real TMS99105 CPU with my FPGA implementation of the TI-99/4A. Running the above on it yields the correct result, -1.

So I wrote a piece of test code, ran it both on the FPGA CPU and the TMS99105 while capturing the results (by dumping a section of memory on both systems to a file), below is the comparison:

Left hand side is TMS99105, right hand side is my soft CPU, i.e. the FPGA TMS9900 core. Each instruction is tested 8 times with different data. The source code of this test is below, after the explanation below.

Each instruction test output takes 8 bytes or 4 words. The last word of the output are the flags (only top 6 bits preserved). The result words are R1,R2,R3 and flags. The instruction is always executed like SUB R1,R2. Thus R2 is the result and R1 shows the source operand.

With that, we can see that there is a difference with the second instruction under test. It is the subtract instruction.
The first subtraction works fine (SUB 1,2, i.e. 2-1) but the second has a difference in the flags (SUB >7FFF,1) where the soft CPU has >2800 while the real CPU has >2000.
The flag that is different is ST4 i.e. the overflow flag. Also in the the other SUB instructions the overflow flag is sometimes bogus, here is a table of the eight cases:
SUB >1,>2           OK (note that with the TMS9900 this actually is the 2-1 operation)
SUB >7FFF,1         Bug: soft-cpu asserts overflow
SUB >8000,>7FFF     Bug: soft-cpu does not assert overflow
SUB >7FFF,>8000     Bug: soft-cpu does not assert overflow
SUB >FFFF,>8000     OK
SUB >8000,>FFFF     Bug: soft-cpu asserts overflow
SUB >8000,>8000     Bug: soft-cpu asserts overflow
SUB >0,>8000        OK
Looking at the data sheet carefully, there is a difference how ST4 (overflow) is asserted for adds and subs, the first condition is inverted. I bet I don’t do this.
Adds: If MSB(SA) == MSB(DA) and MSB of result != MSB(DA)
Subs: If MSB(SA) != MSB(DA) and MSB of result != MSB(DA)

The only other difference with this data is at >0120, and here the result is wrong (but flags ok). Since R3 has changed, it must be a DIV or MPY instruction.
First instruction test output at 0..>3F, 2nd at 40..7F, 3rd at 80..BF, 4th at C0..FF, 5th at 100..13F. So the fifth instruction.
And indeed it is the DIV instruction - like PNR reported. One of my test cases at least now catches the problem. It is fifth test case, from above it is
DIV >FFFF,>8000 i.e. >8000 divided by >FFFF. The result should be >8000 as quotient and >8000 as remainder. 
But my code gives >FFFE as quotient and >FFFE as remainder too.

Here is the TMS9900 assembler test code (story continues after the listing):

; EP 2018-09-23 - run through a sequence of instructions with data and  write
;    results to RAM. This is to enable comparing the FPGA CPU and TMS9900.
      LI    R5,>2000            ; point to result table
      LI    R7,TEST_ROUTINES    ; point to test routines
      MOV    *R7+,R8                ; address of routine to test
      LI    R6,TEST_DATA_SEQ
      MOV    *R6+,R1                ; fetch test parameters
      MOV    *R6+,R2
      CLR    R3
; perform operation under test      
      BL    *R8
; save results
      MOV    R1,*R5+
      MOV   R2,*R5+
      MOV   R3,*R5+
      STST    R3
      ANDI    R3,>FC00            ; only keep meaningful flags
      MOV   R3,*R5+          
      CI    R6,TEST_DEND
      JNE    -!      
      CI    R7,TEST_ROUT_END
      JNE    RUN_TEST
; write end marker to memory
      LI    R3,>1234
      MOV    R3,*R5+
      MOV    R3,*R5+
      MOV    R3,*R5+
      MOV    R3,*R5+

And here is the data:
TEST_DATA_SEQ            ; Parameters to pass two various instructions
    DATA    1,2            ; First data set
    DATA    >7FFF,1        ; 2nd
    DATA    >8000,>7FFF
    DATA     >7FFF,>8000
    DATA    >FFFF,>8000    ; 5th
    DATA    >8000,>FFFF
    DATA    >8000,>8000    ; 7th
    DATA    0,>8000        ; 8th


DO_ADD    A    R1,R2
DO_SUB    S    R1,R2
DO_DIV    DIV    R1,R2
DO_MPY    MPY    R1,R2
DO_COMP    C    R1,R2

Once I had analysed the problems, I fixed both the divider and the overflow flag.  The fixes for both problems are in the tms9900.vhd source code committed to the soft-cpu-tms9902 branch at GitHub.

The fix for the overflow flag was easy, just taking into account the difference in flag generation. I took still a few iterations from me to get it working properly. The resulting VHDL code for overflow flag generation is:

        -- ST4 overflow
        alu_flag_overflow <=
                '1' when (ope = alu_compare or ope = alu_sub)
          and arg1(15) /= arg2(15) and alu_result(15) /= arg1(15) else
                '1' when (ope /= alu_sla and not (ope = alu_compare or ope = alu_sub)) and arg1(15) =  arg2(15) and alu_result(15) /= arg1(15) else
                '1' when ope = alu_sla and alu_result(15) /= arg2(15) else -- sla condition: if MSB changes during shift

Fixing the division problem was a little more involved, once I understood that I actually need a 17-bit subtraction the divider started to work properly. I had not captured the problem in my earlier testing because I only manually tested a few cases and did not try dividers above 32767, i.e. where the 16-bit divisor has bit 15 set.  Implementing this fix in a simple way required me to add a dedicated 17-bit subtract unit to the divider code, rather than relying on the ALU I already had in the design, here is the key snippet of code, the story continues a bit more after the code:

when do_div1 =>
        dividend(15 downto 0) <= rd_dat; -- store the low word
        shift_count <= "10000"; -- 16
        cpu_state <= do_div2;
when do_div2 =>
        dividend(31 downto 0) <= dividend(30 downto 0) & '0'; -- shift left
        -- perform 17-bit substraction, picking up the bit to shifted out too
        divider_sub <= std_logic_vector(unsigned(dividend(31 downto 15)) - unsi
gned('0' & reg_t));
        dec_shift_count := True;        -- decrement count

        cpu_state <= do_div3;
when do_div3 =>
        if divider_sub(16)='0' then
                -- successful subtract
                dividend(31 downto 16) <= divider_sub(15 downto 0);
                dividend(0) <= '1';
        end if;
        if shift_count /= "00000" then
                cpu_state <= do_div2; -- loop back
                cpu_state <= do_div4;
        end if;
when do_div4 =>
        -- done with the division.

The good news is that after these fixes the instructions that I now test and compare with a real TMS99105 all work identically. The bad news is that the BASIC statement at beginning of this blog entry still gives me the same result - so on with the bug hunt for further bugs!


Erik Piehl wrote 09/24/2018 at 18:26 point

I forgot to comment this update with the crucial information that the divider implements a 32:16=16 division, i.e. the source operand is 32 bits, divisor is 16 bits, and the resulting quotient is 16 bits. The division also returns the remainder, as is typical for this type of a division algorithm, so the overall output are two 16-bit values.

  Are you sure? yes | no