  • Two more FPGA CPU bug fixes

    Erik Piehl, 09/24/2018 at 19:46, 0 comments

    Already the second project update for the day! I added many more test cases to exercise more instructions.

    Testing now includes the instructions ANDI, CB, SB, AB, XOR, INC, DEC, SLA, SRA, SRC, MOV, MOVB, SOCB, SZCB and X instruction comparisons, in addition to the earlier tests for the A, S, SOC, SZC, DIV, MPY, C, NEG and SRL instructions.

    These were good additions, as I found two more bugs with CPU flags: the CB instruction did not set parity correctly, and the ABS instruction did not set the overflow flag at all. I fixed both. Interestingly, the CB instruction sets parity according to the source data byte instead of the ALU subtract output, so that needed special casing. I suspect the original TMS9900 has only one parity generation circuit which is sampled at a different time; I simply added a second parity calculation.
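
    The CB parity difference can be illustrated with a minimal Python sketch. It assumes that the parity flag means "odd number of 1 bits in the byte" and that the ALU subtract output is destination minus source; both the function names and the subtraction direction are assumptions for illustration, not taken from the VHDL source.

```python
def odd_parity(byte: int) -> bool:
    """True when the 8-bit value has an odd number of 1 bits."""
    return bin(byte & 0xFF).count("1") % 2 == 1


def cb_parity_from_source(src: int, dst: int) -> bool:
    """Parity as described above for CB: taken from the source byte."""
    return odd_parity(src)


def cb_parity_from_alu(src: int, dst: int) -> bool:
    """Parity taken from the ALU subtract output instead.
    Assumption: the ALU computes dst - src for a compare."""
    return odd_parity((dst - src) & 0xFF)


# Example operands where the two approaches disagree:
# source >03 has two 1 bits (even), but >01 - >03 = >FE has seven (odd).
src, dst = 0x03, 0x01
print(cb_parity_from_source(src, dst), cb_parity_from_alu(src, dst))
```

    With these example operands the two calculations give different answers, which is the kind of mismatch a comparison against real silicon would flag.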

    With these two bugs fixed, the problem I had has disappeared: PRINT 1*-1 now returns -1. I suspect it was the ABS bug fix that helped.

    I guess these fixes mean I need more test cases, since I am sure there are more bugs.

  • Fixing divider and overflow flag issues

    Erik Piehl, 09/24/2018 at 18:21, 1 comment

    After remembering where I was in the project I started to look for bugs in my CPU. I know it does not fully work; for example, in TI BASIC I get:

    PRINT 1*-1
    1

    So something is not working in the CPU. In order to work on this, I took advantage of my previous design, which combines a real TMS99105 CPU with my FPGA implementation of the TI-99/4A. Running the above on it yields the correct result, -1.

    So I wrote a piece of test code and ran it on both the FPGA CPU and the TMS99105 while capturing the results (by dumping a section of memory on both systems to a file). Below is the comparison:

    The left hand side is the TMS99105, the right hand side is my soft CPU, i.e. the FPGA TMS9900 core. Each instruction is tested 8 times with different data. The source code of the test follows the explanation below.


    Each test case outputs 8 bytes, i.e. 4 words: R1, R2, R3 and the flags. The last word holds the flags (only the top 6 bits are preserved). The instruction is always executed in the form SUB R1,R2, so R2 holds the result and R1 shows the source operand.
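
    The record layout described above can be sketched in Python to locate a difference in the dumps. This is only an illustration: it assumes the dump is a raw binary file of big-endian 16-bit words starting at the result table, and the helper names `locate` and `differing_records` are made up for this example.

```python
import struct

CASES_PER_INSTRUCTION = 8   # each instruction is run with 8 data sets
BYTES_PER_CASE = 8          # R1, R2, R3, flags as 16-bit words
FIELDS = ("R1", "R2", "R3", "flags")


def locate(offset: int):
    """Map a byte offset in the result table to zero-based
    (instruction index, test case index, field name)."""
    record, byte = divmod(offset, BYTES_PER_CASE)
    instruction, case = divmod(record, CASES_PER_INSTRUCTION)
    return instruction, case, FIELDS[byte // 2]


def differing_records(dump_a: bytes, dump_b: bytes):
    """Return (offset, words_a, words_b) for every 8-byte record that differs."""
    result = []
    usable = min(len(dump_a), len(dump_b)) // BYTES_PER_CASE * BYTES_PER_CASE
    for offset in range(0, usable, BYTES_PER_CASE):
        words_a = struct.unpack(">4H", dump_a[offset:offset + BYTES_PER_CASE])
        words_b = struct.unpack(">4H", dump_b[offset:offset + BYTES_PER_CASE])
        if words_a != words_b:
            result.append((offset, words_a, words_b))
    return result


# Offset >4E is the flags word of the second case of the second instruction.
print(locate(0x4E))
```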

    With that, we can see that there is a difference with the second instruction under test, which is the subtract instruction.
    The first subtraction works fine (SUB 1,2, i.e. 2-1) but the second (SUB >7FFF,1) differs in the flags: the soft CPU has >2800 while the real CPU has >2000.
    The flag that differs is ST4, i.e. the overflow flag. The overflow flag is also sometimes bogus in the other SUB instructions. Here is a table of the eight cases:
    SUB >1,>2           OK (note that with the TMS9900 this actually is the 2-1 operation)
    SUB >7FFF,1         Bug: soft-cpu asserts overflow
    SUB >8000,>7FFF     Bug: soft-cpu does not assert overflow
    SUB >7FFF,>8000     Bug: soft-cpu does not assert overflow
    SUB >FFFF,>8000     OK
    SUB >8000,>FFFF     Bug: soft-cpu asserts overflow
    SUB >8000,>8000     Bug: soft-cpu asserts overflow
    SUB >0,>8000        OK
    Looking at the data sheet carefully, there is a difference in how ST4 (overflow) is asserted for adds and subs: the first condition is inverted. I bet I don't do this.
    Adds: If MSB(SA) == MSB(DA) and MSB of result != MSB(DA)
    Subs: If MSB(SA) != MSB(DA) and MSB of result != MSB(DA)
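
    The two rules above can be checked against the table with a small Python model. This is a sketch, not the VHDL: the function names are invented here, and `sub_overflow_with_add_rule` encodes the guess that the soft CPU applied the add condition to subtracts.

```python
def msb(word: int) -> int:
    """Most significant bit of a 16-bit word."""
    return (word >> 15) & 1


def sub_overflow(sa: int, da: int) -> bool:
    """Subs rule: MSB(SA) != MSB(DA) and MSB(result) != MSB(DA)."""
    result = (da - sa) & 0xFFFF
    return msb(sa) != msb(da) and msb(result) != msb(da)


def sub_overflow_with_add_rule(sa: int, da: int) -> bool:
    """Suspected bug: the Adds rule (MSBs equal) applied to a subtract."""
    result = (da - sa) & 0xFFFF
    return msb(sa) == msb(da) and msb(result) != msb(da)


# The eight (source, destination) pairs from the table, i.e. SUB SA,DA = DA - SA.
CASES = [
    (0x0001, 0x0002), (0x7FFF, 0x0001), (0x8000, 0x7FFF), (0x7FFF, 0x8000),
    (0xFFFF, 0x8000), (0x8000, 0xFFFF), (0x8000, 0x8000), (0x0000, 0x8000),
]

for sa, da in CASES:
    good = sub_overflow(sa, da)
    bad = sub_overflow_with_add_rule(sa, da)
    verdict = "OK" if good == bad else "differs"
    print(f"SUB >{sa:04X},>{da:04X}  OV={int(good)}  add-rule OV={int(bad)}  {verdict}")
```

    Under these assumptions the model disagrees with the correct rule in exactly the five cases marked as bugs in the table, which supports the suspicion about the inverted condition.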

    The only other difference in this data is at >0120, and here the result is wrong (but the flags are OK). Since R3 has changed, it must be a DIV or MPY instruction.
    The first instruction's test output is at 0..>3F, the 2nd at >40..>7F, the 3rd at >80..>BF, the 4th at >C0..>FF and the 5th at >100..>13F. So it is the fifth instruction.
    And indeed it is the DIV instruction, as PNR reported. At least one of my test cases now catches the problem. It is the fifth test case, which from the above is
    DIV >FFFF,>8000, i.e. >8000 divided by >FFFF (the dividend is the 32-bit value in R2:R3, here >8000:>0000, since R3 is cleared before each test). The result should be >8000 as quotient and >8000 as remainder.
    But my code gives >FFFE as quotient and >FFFE as remainder too.
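
    The expected DIV result can be reproduced with a short Python reference model. It is a sketch based on the usual description of the TMS9900 DIV instruction (unsigned 32-bit dividend in a register pair, 16-bit divisor, overflow with operands left unchanged when the divisor is not greater than the high word); the function name and return convention are made up for this example.

```python
def tms9900_div(divisor: int, high: int, low: int):
    """Unsigned divide of the 32-bit value high:low by a 16-bit divisor.
    Returns (quotient, remainder, overflow)."""
    divisor &= 0xFFFF
    high &= 0xFFFF
    low &= 0xFFFF
    if divisor <= high:
        # Quotient would not fit in 16 bits: flag overflow, keep operands.
        return high, low, True
    dividend = (high << 16) | low
    return dividend // divisor, dividend % divisor, False


# Fifth test case: R1 = >FFFF (divisor), R2 = >8000, R3 = 0.
quotient, remainder, overflow = tms9900_div(0xFFFF, 0x8000, 0x0000)
print(f">{quotient:04X} >{remainder:04X} overflow={overflow}")
```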

    Here is the TMS9900 assembler test code (story continues after the listing):

    ; EP 2018-09-23 - run through a sequence of instructions with data and write
    ;    results to RAM. This is to enable comparing the FPGA CPU and TMS9900.
          LI    R5,>2000            ; point to result table
          LI    R7,TEST_ROUTINES    ; point to test routines
    RUN_TEST
          MOV   *R7+,R8             ; address of routine to test
          LI    R6,TEST_DATA_SEQ    ; point to start of test data
    !
          MOV   *R6+,R1             ; fetch test parameters
          MOV   *R6+,R2
          CLR   R3
    ; perform operation under test
          BL    *R8
    ; save results
          MOV   R1,*R5+
          MOV   R2,*R5+
          MOV   R3,*R5+
          STST  R3
          ANDI  R3,>FC00            ; only keep meaningful flags
          MOV   R3,*R5+
          CI    R6,TEST_DEND        ; all data sets done?
          JNE   -!
          CI    R7,TEST_ROUT_END    ; all routines done?
          JNE   RUN_TEST
    ; write end marker to memory
          LI    R3,>1234
          MOV   R3,*R5+
          MOV   R3,*R5+
          MOV   R3,*R5+
          MOV   R3,*R5+

    
    And here is the data:
    TEST_DATA_SEQ            ; Parameters to pass to various instructions
        DATA    1,2            ; First data set
        DATA    >7FFF,1        ; 2nd
        DATA    >8000,>7FFF
        DATA    >7FFF,>8000
        DATA    >FFFF,>8000    ; 5th
        DATA    >8000,>FFFF
        DATA    >8000,>8000...

  • Back with the project!

    Erik Piehl, 09/22/2018 at 20:19, 2 comments

    I wrote the following as my comments to the GitHub commit I just made (formatted better here). I should additionally say that there are four branches on GitHub; the master branch and the soft-cpu-tms9902 branch are at the moment the ones I checked and/or worked with today.

    Commit 2018-09-22:

    • After a long while, I worked on the project again. This was pretty much about trying to remember where I was in the project.
    • I synthesized the master branch again and also worked on the soft-cpu-tms9902 branch. The master branch is the branch which supports the TMS99105 CPU on the daughterboard / shield that I designed two years ago. Still works.
    • There was some actual progress on the aforementioned soft-cpu-tms9902 branch. I clarified the naming and processing of reset signals. Thanks to this, two bugs are now fixed:
      • Sound works (again?) now that the audio DAC is not constantly being reset.
      • The serloader component (handling communication from the host PC over the USB serial port to the memory of the TMS9900 via DMA) was being reset by mistake whenever the CPU was placed in reset. If the host PC put the CPU into reset, that reset would also reset the serloader, effectively preventing any further communication with the system. This of course sucks big time, as the main use case for putting the CPU into reset in the first place is to load software into the memory of the TMS9900 system without having the CPU mess around with it while it is half loaded.
    • I know that my FPGA CPU core has bugs, and I found a repeatable one: running BASIC and doing a simple multiplication with PRINT 1*-1 always yields 1 (plus one) with my FPGA CPU, while an actual TI-99/4A or my TMS99105 FPGA system (i.e. the master branch using real CPU silicon) yields -1 as it should. So this bug can be observed with high level software... There we go. It is a miracle BASIC runs in the first place. The bug is probably related to CPU flag handling. **pnr** has also reported to me that my divider implementation does not work properly in all cases, so I need to check that too.


    Good to be back with the project!

  • Support for the original keyboard

    Erik Piehl, 01/02/2018 at 20:08, 3 comments

    A quick addition of the day - this one was really easy to do as interfacing to a normal TI keyboard from the FPGA is way easier than communicating with the PC's keyboard through USB and the server process.

    The implementation quite literally only involved bringing out the keyboard row / column wires from my TMS9901 interface chip implementation inside the FPGA. There are no external active or passive components other than the keyboard switches, thanks to the internal pull-ups of the FPGA.

  • Stand-alone booting capability

    Erik Piehl, 12/30/2017 at 19:11, 0 comments

    An update at long last!

    The next step for the design is to make the FPGA system stand-alone, i.e. able to boot and operate without a host PC. A USB connection will still be needed, but only to provide power. Today I implemented a new feature, where after reset the FPGA logic loads 256K of data from the SPI flash ROM to the SRAM of the system. That allows the system to get the TI-99/4A system ROMs and GROMs into the static RAM in the appropriate places. After the download, one of the DIP switches controls the CPU's automatic boot: if switch zero is set, the CPU in the FPGA will automatically boot and start executing the code that was transferred to SRAM.

    The 256K of data is divided into three regions:

    • The first 128K is written to SRAM from address zero upwards. The logic of the FPGA maps this area to the cartridge ROM slot of the TI-99/4A. This is a paged area of 8K pages. By default my scripts put the Extended Basic ROM code (16K) there.
    • The next 64K is written to SRAM from address 0x80000 onwards (i.e. at 512K). This is the area where GROM data is stored in my design. By default it holds the 24K of system GROM followed by 32K of Extended Basic GROM code.
    • The last 64K is written to SRAM at address 0xB0000. This is my ROM area. It is largely unused, but the first 8K (at address 0xB0000) is the disk support DSR space and another block of 8K (at address 0xBA000) is mapped to address zero of the TMS9900 core's address space, thus containing the normal console ROM code.
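
    The three regions can be summarised as a small lookup table. The Python sketch below assumes the 256K image is laid out contiguously in the order listed above, with offsets counted from the start of the image; where the image sits in the SPI flash, and the names used here, are not from the design.

```python
K = 1024

# (offset in the 256K image, size, SRAM base address, description)
REGIONS = [
    (0 * K,   128 * K, 0x00000, "paged cartridge ROM"),
    (128 * K,  64 * K, 0x80000, "GROM data"),
    (192 * K,  64 * K, 0xB0000, "ROM area (DSR, console ROM)"),
]


def sram_address(image_offset: int) -> int:
    """Translate an offset in the 256K boot image to its SRAM address."""
    for start, size, sram_base, _name in REGIONS:
        if start <= image_offset < start + size:
            return sram_base + (image_offset - start)
    raise ValueError("offset outside the 256K boot image")


# The console ROM block at SRAM 0xBA000 comes from image offset 0x3A000.
print(hex(sram_address(0x3A000)))
```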

    The Pepino board has 1M of static RAM overall. I had forgotten that the board has actually 16 megabytes of SPI flash storage so there is plenty of potential here.

    The design of the SPI flash interface is from Magnus Karlsson, the designer of the Pepino FPGA board. I took the code from his Mac Plus example and modified it for my purposes. His code is written in Verilog while my code is in VHDL, so I wrote a standard VHDL component declaration to be able to use the Verilog code from VHDL.

  • VDP character cell address masking feature

    Erik Piehl, 11/01/2017 at 21:20, 0 comments

    I pushed to GitHub an update to my TMS9918 VHDL core, adding support for the undocumented but fairly widely known and used graphics mode 2 masking features. The lack of this feature was the reason the megademo (see my previous update) systematically failed to work properly in quite a few screens.

    With these fixes the megademo works much better, but there are still some problems (including the fact that the demo gets stuck at a certain point after running successfully through quite a few demo phases - the CPU core continues to run, but it appears to be in some kind of a loop that it cannot escape). So as always, fixing some bugs means it's time to fix the next bugs...

    The character masking feature appears in two places in the VHDL code, using the low bits of registers 4 and 3 as character cell masks. The example below illustrates the use of register 4 during character cell address calculation in graphics mode 2:

    -- Graphics mode 2. 768 unique characters are possible.
    -- Implement UNDOCUMENTED FEATURE: bits 1 and 0 of reg4 act as bit masks for the two
    -- MSBs of the 10 bit char code. This allows character set to be limited even in this mode.
    vram_out_addr <= reg4(2) -- MSB of the address
        & (char_addr(9 downto 8) and reg4(1 downto 0))  -- Character code with masks for bits 9 and 8
        & char_code & ypos(2 downto 0); -- 8 bit code and line in character
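
    The same concatenation can be modelled in Python to see the effect of the mask bits. This is a sketch only: it assumes bit 0 is the least significant bit of reg4, that `char_hi` stands for char_addr(9 downto 8), and that the result is a 14-bit VRAM address; the function name is made up.

```python
def pattern_address(reg4: int, char_hi: int, char_code: int, ypos: int) -> int:
    """14-bit VRAM address: reg4(2) & masked 2 MSBs & 8-bit code & 3-bit line."""
    msb = (reg4 >> 2) & 1                  # reg4(2), MSB of the address
    masked_hi = char_hi & reg4 & 0b11      # bits 9..8 ANDed with reg4(1 downto 0)
    return (msb << 13) | (masked_hi << 11) | ((char_code & 0xFF) << 3) | (ypos & 0b111)


# With both mask bits set the upper character bits pass through;
# with both cleared every character maps into the first 256 patterns.
print(hex(pattern_address(0b111, 2, 0x41, 3)))
print(hex(pattern_address(0b100, 2, 0x41, 3)))
```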

  • Bug fixes and support for 512K cartridges

    Erik Piehl, 10/09/2017 at 14:58, 2 comments

    I did a couple of important bug fixes. I finally found, actually surprisingly quickly, the bug that made the top pixel line appear shifted. The picture below illustrates this problem. The problem was not in the top line: it was all the other scanlines of the picture that were shifted right by one pixel. This can be seen in the picture below, for example by looking at the top pixels of the M character on the topmost line.

    I also modified the right border start setting to properly display border colour in 40 column text mode. In that mode the picture is 240 pixels wide, not 256 pixels as in all the other modes. Not dealing with this properly caused the VGA scanline doubler to show pixels that were not written to during screen refresh.

    Then I changed the memory mapping to support 512K cartridges. I did this by reallocating the mapping of the 1MB external memory to the TI-99/4A. Now 512K is allocated for paged cartridges (up from 64K). That came at the expense of reducing SAMS compatible memory to 256K. But importantly this allowed me to run the cool TI-99/4A megademo called "don't mess with Texas", and running that demo did reveal some bugs. Below is the video.

  • Speed control needed - and added

    Erik Piehl, 09/20/2017 at 21:30, 0 comments

    I wanted to continue my benchmarks and run my simple Basic program also under TI Extended Basic. That turned out to be impossible, as the keyboard repeat rate problem was much worse under Extended Basic than under the built-in Basic.

    It was time to do something about this. Instead of trying to hack the code (I tried quickly, but there was too much code to disassemble and understand), it was time for a hardware solution. Execution speed on the TMS9900 is largely dependent on memory access speed. I added a 6-bit delay counter, which enables me to add up to 63 wait states per memory access. The Pepino FPGA board has 8 DIP switches, so I used three of them to select the number of wait states (I did this with a clocked latch, so it is possible to adjust the speed on the fly):

    • DIP switch 1 on: 63 wait states
    • DIP switch 2 on: 31 wait states
    • DIP switch 3 on: 8 wait states
    • All off: no wait states

    Switch 1 has priority, so if it is on there will be 63 wait states. I also took a quick look at the CPU's memory timing under simulation: with no wait states reads take 40ns and writes 60ns, with 63 wait states reads take 670ns.
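
    The switch selection and the simulated timings can be put together in a small Python sketch. Only the priority of switch 1 is stated above; that switch 2 beats switch 3, and that read time grows linearly with the wait state count, are assumptions made for this example.

```python
def wait_states(sw1: bool, sw2: bool, sw3: bool) -> int:
    """Number of wait states selected by the three DIP switches."""
    if sw1:
        return 63   # switch 1 has priority
    if sw2:
        return 31   # assumption: switch 2 wins over switch 3
    if sw3:
        return 8
    return 0


# Simulated read times: 40 ns with 0 wait states, 670 ns with 63.
NS_PER_WAIT_STATE = (670 - 40) // 63   # works out to 10 ns per wait state


def estimated_read_ns(states: int) -> int:
    """Linear estimate of the read time; only 0 and 63 were simulated."""
    return 40 + states * NS_PER_WAIT_STATE


for switches in [(True, True, True), (False, True, True), (False, False, True), (False, False, False)]:
    n = wait_states(*switches)
    print(n, estimated_read_ns(n))
```

    The 10 ns step per wait state is derived from the two simulated figures, so the values for 31 and 8 wait states are estimates rather than measurements.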

    Alas, it turned out that a 6-bit delay counter was too short, as I got these results when comparing execution speed under TI extended Basic for my test program:

    • Classic99 emulation: 1 min 11 s
    • 63 wait states: 24.7s, 2.9x faster
    • 31 wait states: 13.6s, 5.2x faster
    • 8 wait states: 5.6s, 12.7x faster
    • 0 wait states: 2.9s, 24.5x faster

    So even with the maximum of 63 wait states this thing goes too fast... Need to slow it down further. But not tonight. 
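
    How much further the counter would need to go can be estimated from the measurements above. The Python sketch below assumes run time grows linearly with the number of wait states (the measured points fit that reasonably well) and takes the Classic99 time as the target; it is an extrapolation, not a measurement.

```python
import math

# Measured run times in seconds, keyed by wait states per memory access.
MEASURED = {0: 2.9, 8: 5.6, 31: 13.6, 63: 24.7}
TARGET_SECONDS = 71.0   # Classic99 emulation: 1 min 11 s

# Speed-up factors relative to the emulator, as quoted in the list above.
speedups = {n: round(TARGET_SECONDS / t, 1) for n, t in MEASURED.items()}

# Linear fit through the 0 and 63 wait state measurements.
seconds_per_wait_state = (MEASURED[63] - MEASURED[0]) / 63
needed = math.ceil((TARGET_SECONDS - MEASURED[0]) / seconds_per_wait_state)

print(speedups)
print(needed, "wait states,", needed.bit_length(), "bit counter")
```

    Under this linear assumption roughly 197 wait states would be needed to match the emulator's speed, which would call for an 8-bit delay counter instead of the 6-bit one.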

    Here is a video: