
Success! FPGA based TI-99/4A working!

A project log for TMS9900 compatible CPU core in VHDL

Retro challenge 2017/04 project to create a TMS9900 compatible CPU core. Again in a month... Failure could be an option...

Erik Piehl, 09/17/2017 at 07:41, 1 Comment

Finally I got my TMS9900 CPU to work well enough that I can run original TI-99/4A software on my FPGA based TI-99/4A clone. Below you can find a link to my quick-and-dirty but rather long video about the whole project.

Prior to this last working session I knew that I still needed to implement the divide instruction, so I went about doing it. I did that by first writing a very simple C program and then converting that functionality to VHDL.

#include <stdio.h>

unsigned short tms9900_div(unsigned int dividend, int divisor) {
    unsigned short sa;      // source argument (divisor)
    unsigned short da0;     // destination argument (high 16 bits)
    unsigned short da1;     // destination argument (low 16 bits)
    printf("dividend: %u divisor: %d\n", dividend, divisor);
    // algorithm
    da0 = (dividend >> 16);
    da1 = dividend & 0xFFFF;
    sa = divisor;
    
    int st4;    // overflow status bit
    // Overflow if da0 >= sa as unsigned values, written here with sign bit tests.
    if (
        (((sa & 0x8000) == 0 && (da0 & 0x8000) == 0x8000))
        || ((sa & 0x8000) == (da0 & 0x8000) && (((da0 - sa) & 0x8000) == 0))
        ) {
        st4 = 1;
    } else {
        st4 = 0;
        // actual division loop, here sa is known to be larger than da0.
        for(int i=0; i<16; i++) {
            int carry = (da0 >> 15) & 1;    // bit shifted out of da0, matters when sa > 0x7FFF
            da0 = (da0 << 1) | ((da1 >> 15) & 1);
            da1 <<= 1;
            if(carry || da0 >= sa) {
                da0 -= sa;
                da1 |= 1;   // successful subtraction
            }
        }
    }
    printf("quotient: %d remainder %d st4=%d\n", da1, da0, st4);
    if (divisor != 0) {     // the reference check would otherwise divide by zero
        printf("checking: quotient %u remainder %u\n\n", dividend/divisor, dividend % divisor);
    }
    return da1;
}

Getting this algorithm to work took something like 15 minutes. The VHDL implementation did not take long either, although I did manage to introduce a few bugs along the way. I had been putting off the divide instruction because I thought it would take a long time, but in the end it was quickly done.
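One way to gain confidence in a shift-and-subtract divide like the one above is to compare it against the C compiler's own division over a handful of edge cases. The following is only a sketch under assumptions: `div_result`, `div32by16` and `div_self_check` are hypothetical names that do not come from the project, the overflow test is written as a plain unsigned comparison, and the bit shifted out of the high word is kept so that divisors above 0x7FFF also work.

```c
#include <assert.h>
#include <stdint.h>

/* Result of one divide: quotient, remainder and the overflow flag (ST4). */
typedef struct {
    uint16_t quotient;
    uint16_t remainder;
    int overflow;
} div_result;

/* Hypothetical quiet variant of the routine: 32-bit by 16-bit unsigned
   divide using 16 shift-and-subtract steps, no printing. */
div_result div32by16(uint32_t dividend, uint16_t divisor) {
    div_result r = { 0, 0, 0 };
    uint16_t hi = (uint16_t)(dividend >> 16);
    uint16_t lo = (uint16_t)(dividend & 0xFFFF);
    if (hi >= divisor) {            /* quotient would not fit in 16 bits */
        r.overflow = 1;
        return r;
    }
    for (int i = 0; i < 16; i++) {
        int carry = (hi >> 15) & 1; /* bit about to be shifted out of hi */
        hi = (uint16_t)((hi << 1) | (lo >> 15));
        lo = (uint16_t)(lo << 1);
        if (carry || hi >= divisor) {
            hi = (uint16_t)(hi - divisor);
            lo |= 1;
        }
    }
    r.quotient = lo;
    r.remainder = hi;
    return r;
}

/* Compare against the native operators; returns the number of cases checked. */
int div_self_check(void) {
    static const uint32_t dividends[] = {
        0u, 1u, 100u, 0xFFFFu, 0x10000u, 0x12345678u, 0x80000000u, 0xFFFFFFFFu
    };
    static const uint16_t divisors[] = { 0, 1, 2, 7, 0x1234, 0x8000, 0xFFFF };
    int checked = 0;
    for (unsigned i = 0; i < sizeof dividends / sizeof dividends[0]; i++) {
        for (unsigned j = 0; j < sizeof divisors / sizeof divisors[0]; j++) {
            uint32_t d = dividends[i];
            uint16_t s = divisors[j];
            div_result r = div32by16(d, s);
            int expect_overflow = (s == 0) || (d / s > 0xFFFFu);
            assert(r.overflow == expect_overflow);
            if (!r.overflow) {
                assert(r.quotient == d / s);
                assert(r.remainder == d % s);
            }
            checked++;
        }
    }
    return checked;
}
```

Calling `div_self_check()` from a small `main` runs all combinations and aborts on the first mismatch. The value lists are arbitrary picks around the 16-bit boundaries, not an exhaustive test.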

After implementing the divide instruction it was still not smooth sailing, since the keyboard was not working properly. I traced the problem to the CRU interface instructions (LDCR and STCR). STCR, which reads from the external CRU and writes to a destination, returned bit-shifted data. As an example, the expected value for button '1' in my test program would have been >FEFF, but the read data was >FDFF, so there was a shift of one bit. I ran multiple simulations with my VHDL test bench, but there it always worked. Finally, after some head scratching, this turned out to be a major timing error: the STCR instruction presented the address to read from on the first cycle, and already on the second cycle (i.e. 10ns later) it was latching the data. Inside my FPGA TI-99/4A implementation that was way too fast, so I added a two clock cycle delay before sampling the CRUIN pin - and voila, my TI-99/4A clone was running!
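To see why sampling too early shows up as a one-bit shift, here is a toy model in C. It is not the VHDL from the project, and it rests on an assumption: after a new bit address is presented, the CRUIN line keeps showing the previously addressed bit until a fixed latency has passed, and it idles high before the first address. The function name `stcr_read8` and the parameters `latency` and `wait_cycles` are made up for this illustration.

```c
#include <stdint.h>

/* Toy model of an 8-bit serial CRU read (assumption, not the real design).
   device_bits : the bits the peripheral holds, bit 0 is read first
   latency     : cycles the peripheral needs before CRUIN shows the new bit
   wait_cycles : cycles the CPU waits after presenting the address */
uint8_t stcr_read8(uint8_t device_bits, int latency, int wait_cycles) {
    uint8_t result = 0;
    int previous = 1;                       /* assumed idle level of CRUIN */
    for (int bit = 0; bit < 8; bit++) {
        int current = (device_bits >> bit) & 1;
        /* Sampling before the latency has elapsed sees the stale bit. */
        int sampled = (wait_cycles >= latency) ? current : previous;
        result |= (uint8_t)(sampled << bit);
        previous = current;
    }
    return result;
}
```

With an illustrative latency of 3 cycles, `stcr_read8(0xFE, 3, 1)` gives 0xFD, the same kind of one-bit shift as >FEFF turning into >FDFF, while `stcr_read8(0xFE, 3, 3)` gives 0xFE. The actual latency inside the FPGA design is not stated in this log, so the numbers are placeholders.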

The performance, however, turned out to be lower than expected: it only runs 15 times faster than the original TI, despite a 30-fold difference in clock speed (3.3MHz vs 100MHz). When I was creating the TMS9900 core my first priority was to get the bloody thing running, so I did not pay much attention to how many states each instruction has to pass through to do its task. I do like to optimise though, and now that my TI clone is working, I can turn my attention to making it run even faster :)
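As a rough back-of-the-envelope check, the two figures can be combined into an implied cycle count ratio. This assumes that speedup equals the clock ratio divided by the ratio of clock cycles spent per unit of work, with wait states on either system lumped into that ratio. The helper name `implied_cycle_ratio` is hypothetical.

```c
/* How many times more clock cycles the new core spends per unit of work,
   assuming speedup = (clock ratio) / (cycle ratio). */
double implied_cycle_ratio(double new_mhz, double original_mhz,
                           double measured_speedup) {
    double clock_ratio = new_mhz / original_mhz;
    return clock_ratio / measured_speedup;
}
```

With the numbers quoted here, `implied_cycle_ratio(100.0, 3.3, 15.0)` comes out at roughly 2, which suggests the core currently needs about twice as many clock cycles as the original machine for the same work.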

Source code can be found here:

Link to GitHub (the FPGA CPU is in the soft CPU branch).

And here is the video talking about the project a bit:

Youtube link

Discussions

Barry Nelson wrote 09/18/2017 at 22:27

I agree, I think it will be very important to be able to run existing software at the original speed.
