Retro 68000 CPU in an FPGA

Making a Retrocomputer with a 68000 CPU in an FPGA

This project builds off Grant Searle's Multicomp project to add a 68000 CPU and Tiny BASIC.

I've wanted to build a 68000 system for a long time. I built some commercial systems back in the 1980s based on the 68000 and really like the CPU.

There are a few 68000 builds on YouTube. The best documented 68000 build is Jeff Tranter's build of the TS2. Jeff has a very nice build blog and demonstration video on YouTube.

The TS2 is based on Alan Clements' book Microprocessor Systems Design: 68000 Family Hardware, Software, and Interfacing. The advantage of starting from a published book is that the documentation of the design is really good.

There was also a compatible commercial design, the Motorola MC68000 Educational Computer Board (MEX68KECB), referred to here as the MECB. A TUTOR monitor, Tiny BASIC, Enhanced BASIC, and FORTH are available for that board.

The design doesn't fit into the minimal Multicomp FPGA board due to the memory requirements. It does fit into a larger FPGA like the Altera/Intel EP4CE15. I am using my RETRO-EP4CE15 card as the hardware platform.

The main features are:

  • M68000 CPU
  • Teesside TS2BUG (4KB) or MECB TUTOR (16KB) Monitor ROMs
  • 32KB Internal SRAM
  • ANSI Video Display Unit (VDU)
    • VGA and PS/2
  • 6850 ACIA UART
    • USB to Serial
  • USB powered

The board works and loads Gordo's MC68000 Tiny BASIC v1.2 as well as EnhBASIC.

  • SDRAM Working in a Different Context

    land-boards.com09/05/2020 at 12:28 0 comments

    There's an interesting project from 2014 called TG68 Experiments at Retro Ramblings (AMR) to create a 68000 CPU that uses the SDRAM on these inexpensive FPGA cards. The project starts with the same 68K core (TG68) by Tobias Gubener that I've been using and builds on it. It improves on the TG68 SDRAM design considerably. Here's the GitHub repository for the project.

    This project has quite a few appealing features, including a large frame buffer and the ability to load code from an SD card. AMR has spent quite a bit of time building a fast SDRAM interface, including a 2-way associative cache. He also does pixel dithering, which I used to map the 16-bit graphics to the 6 bits of the RETRO-EP4CE15 card.

    It took a solid day of fumbling around to port it to my RETRO-EP4CE15 card with a Cyclone V FPGA card, but it's now running. In the end I routed some of the obvious lines (resets, clocks) out to the FPGA's extra I/O pins and used my logic analyzer and scope to debug the problems. The project is pretty well documented, but the following are my notes from the port.

    Here's a picture of the screen on a VGA monitor.

    Fits in Cyclone V FPGA

    There's plenty of room on the Cyclone V FPGA. It should fit in a smaller FPGA just fine.

    Started with Cyclone III Example

    AMR chose a cheap Cyclone III card from flea bay. I chose to port the Cyclone III example thinking it might be closer to my Cyclone V FPGA card than the other board examples. That might not be the case, but it is what I did. The other boards had a lot more LEDs, switches, etc. The Cyclone III card had minimal resources.

    The card had two SDRAMs, so it took some effort to remove support for the second SDRAM. In fact, that was the last domino to fall: I still had VHDL fragments from the 2nd PLL, which I had only partially removed, and they were causing the 68K to stay in reset.

    Serial Port Message

    On the way to booting, this status message was printed to the serial port (at 115,200 baud) (several reboots shown below):

    SD Card Contents

    AMR's TG68 Experiments needs an SD card with some specific contents, namely an S-Record file with the boot code. AMR bootstraps directly from the SD card so the bootstrap code needs to be in the root folder with a specific file name.

    If you want to try it out, just copy the file 
    CFirmware/out.srec to the root of the SD card, 
    and rename it to “boot.sre”. 

    That seems to be working since I see a window on the VGA screen with S-Records. 

    Here's one of the S-records from the VGA screen compared to the out.srec file contents. I'm not sure why there are fewer lines in the window, but the one line does look right.

    I wonder if this shouldn't be different code, namely the sdbootstrap.srec file.

    I tried a few different sre files and they seemed to all work.

    Replaced the PLL with a Cyclone V PLL

    Quartus seems finicky with upgrading the PLL so I re-generated another one. The outputs are 100 MHz, 100 MHz, and 25 MHz in case someone tries to figure it out.

    How the Top Looks in Quartus

    This view of the hierarchy in Quartus might be helpful:

    Building the Code

    I copied the code to my Linux Virtual Machine (built earlier). However, the code requires vasm, which I don't have at the moment.

    My GitHub Repo

    The code for this project is in my Retro-Computers GitHub repository. The VHDL code for the build starts in here.

    End of TS2 Series?

    This could be the end of this TS2 series. I think I will shift over to getting the TG68 as implemented by AMR working.

    Postscript

    Rebuilt the EP4CE15 version of the card using the lessons learned from the Cyclone V build including the busstate signals and the External 1MB SRAM. Tested the build and it worked. Here in GitHub.

  • Built demo.c

    land-boards.com08/30/2020 at 14:12 0 comments

    Jeff Tranter has a piece of demo code written in C. The code prints the numbers 1 to 11, each along with the number squared, the number to the 4th power, and the factorial of the number.

    Makefile Issues

    Jeff's Makefile has trouble with my version of the GCC compiler, but all I needed to do to fix it was remove the include for the coff file. That doesn't seem to be a problem since objcopy figures out the file type without it. I had the same issue when building assembly code.

    The Makefile has:

    all:    demo.run demo.s
    
    demo.run: demo.c
        /opt/m68k-elf/bin/m68k-elf-gcc -Wall -m68000 -msoft-float -c demo.c
        /opt/m68k-elf/bin/m68k-elf-ld --defsym=_start=main -Ttext=0x2000 -Tdata=0x3000 -Tbss=0x4000 --section-start=.rodata=0x5000 demo.o `/opt/m68k-elf/bin/m68k-elf-gcc -m68000 -print-libgcc-file-name`
        /opt/m68k-elf/bin/m68k-elf-objcopy -O srec a.out demo.run
    
    demo.s: demo.c
        /opt/m68k-elf/bin/m68k-elf-gcc -Wall -nostdlib -nodefaultlibs -m68000 -S demo.c
    
    clean:
        $(RM) a.out demo.o demo.run demo.s    

    Running the Program

    The program starts running at address 0x2000 and does the following:

    TUTOR  1.3 > GO 2000
    PHYSICAL ADDRESS=00002000
    Start
    n  n^2  n^4  n!
    1 1 1 1
    2 4 8 2
    3 9 27 6
    4 16 64 24
    5 25 125 120
    6 36 216 720
    7 49 343 5040
    8 64 512 40320
    9 81 729 362880
    10 100 1000 3628800
    11 121 1331 39916800
    Done
    
    TUTOR  1.3 >
    

    Useful Functions in example code

    Jeff provides two functions which use the TUTOR monitor for calls:

    void outch(const char c);
    void printString(const char *s);
    

    The code for these functions is:

    // Print a character using the TUTOR monitor trap function.
    void outch(char c) {
        asm("movem.l %d0/%d1/%a0,-(%sp)\n\t"  // Save modified registers
            "move.b %d0,%d0\n\t"              // Put character in D0
            "move.b #248,%d7\n\t"             // OUTCH trap function code
            "trap #14\n\t"                    // Call TUTOR function
            "movem.l (%sp)+,%d0/%d1/%a0");    // Restore registers
    }
    
    // Print a string.
    void printString(const char *s) {
        while (*s != 0) {
            outch(*s);
            s++;
        }
    }
    

    The outch( ) routine is conveniently written in assembly so it provides a nice example of in-line assembly language code. 

    Return to TUTOR

    main( ) cleanly returns to the TUTOR monitor. It does that by using a TRAP:

    // Go to the TUTOR monitor using trap 14 function. Does not return.
    void tutor() {
        asm("move.b #228,%d7\n\t"
            "trap #14");
    }
    

    The header of the trap 14 handler in the TUTOR listing has:

    *-------------------------------------------------------------------------
    * File TRAP14    Trap 14 handler of "TUTOR"                       06/25/82

    *        CALLING SEQUENCE
    *                  %D7 = XXXXXXFF   WHERE "FF" IF FUNCTION NUMBER
    *                  TRAP      #14

    TRAP14:
    be70 48E7 4160              MOVEM.L %D1/%D7/%A1-%A2,-(%A7)
    

    This finally validates that the GCC toolchain works.

  • Enhanced BASIC

    land-boards.com08/30/2020 at 12:02 0 comments

    Jeff Tranter also ran Enhanced BASIC on his TS2 build. It loads into ROM on his card from 0xC000-0xCFFF. This card is compatible with Jeff's design (it's based on Jeff's design, which is a copy of the Teesside TS2 board), so any software that Jeff got running should run on this card.

    I added a 16KB internal SRAM to that range and uploaded Enhanced BASIC to the card and it worked.

    A short program...

    I could change the SRAM into ROM and have BASIC always ready to run.

  • Cleaning up a noisy counter

    land-boards.com08/29/2020 at 13:57 0 comments

    Using a binary counter for the wait states is not a good idea. The reason is that a decoded binary counter can glitch at its transitions, because more than one bit can change at a time and the bits do not all settle at the same instant.

    The classical solution to this problem is to use a Gray code counter, which changes only one bit at a time. It does count strangely when viewed as numbers, but it's much easier to decode and it is glitch-free.

    Here's a 4-bit Gray code count:

    0000 -> 0001
    0001 -> 0011
    0011 -> 0010
    0010 -> 0110
    0110 -> 0111
    0111 -> 1111
    ...
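
As a rough illustration of the one-bit-at-a-time property, the textbook reflected binary Gray code can be computed from a binary count as n ^ (n >> 1). This is a minimal C sketch, not the VHDL used in the project. The function names are made up for the example, and the ordering it produces is the standard one, which may not match the counter actually implemented in the FPGA step for step.

```c
#include <stdio.h>

/* Convert a binary count to the reflected binary Gray code. */
unsigned binary_to_gray(unsigned n)
{
    return n ^ (n >> 1);
}

/* Count how many bit positions differ between two values. */
unsigned bits_changed(unsigned a, unsigned b)
{
    unsigned diff = a ^ b;
    unsigned count = 0;

    while (diff != 0) {
        count += diff & 1u;
        diff >>= 1;
    }
    return count;
}

/* Print the 4-bit Gray sequence and how many bits change on each step. */
void print_gray_sequence(void)
{
    for (unsigned n = 0; n < 16; n++) {
        unsigned g = binary_to_gray(n);
        unsigned next = binary_to_gray((n + 1) & 0xFu);

        printf("%u%u%u%u -> %u bit(s) change\n",
               (g >> 3) & 1u, (g >> 2) & 1u, (g >> 1) & 1u, g & 1u,
               bits_changed(g, next));
    }
}
```

Calling print_gray_sequence() from a host program shows one bit changing on every step, including the wrap from the last code back to 0000. A plain binary count changes four bits going from 0111 to 1000, which is where decode glitches come from.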

    Here is the logic analyzer capture:

    Unfortunately, the flaky S-record load at 25 MHz happened again. Dropping back to 16.7 MHz fixed it again. Still haven't fixed the edge condition....

    Timing to access external SRAM.

  • Improving CPU issues

    land-boards.com08/29/2020 at 13:31 0 comments

    I was having intermittent CPU issues. These manifested as intermittent results uploading S-Records. Sometimes I'd even make an identical build which would work one time but not the next. Something was marginal in the design.

    I hooked up a logic analyzer and added test points to the I/O connector to monitor some of the internal control lines of the CPU.

        IO_PIN(48) <= w_cpuClock;
        IO_PIN(47) <= n_WR;
        IO_PIN(46) <= w_nLDS;
        IO_PIN(45) <= w_nUDS;
        IO_PIN(44) <= n_externalRam1CS;
        IO_PIN(43) <= w_wait_cnt(3);
        IO_PIN(42) <= w_n_RomCS;
        IO_PIN(41) <= w_n_RamCS;
        IO_PIN(40) <= w_busstate(0);
        IO_PIN(39) <= w_busstate(1);
        IO_PIN(38) <= cpuAddress(15);
        IO_PIN(37) <= '0' when ((cpuAddress(23 downto 3) =  x"00000"&'0'))    else        -- X000000-X000007 (VECTORS)
                            '1';
    

    The 68K CPU has two bus state lines which indicate the operation being performed. They are documented as follows:

    busstate : out std_logic_vector(1 downto 0);
    -- 00 -> fetch code 
    -- 10 -> read data 
    -- 11 -> write data 
    -- 01 -> no memaccess

    Disabling accesses when busstate = 01 cleaned up the issues I was having. Also, making peripheral accesses active only when busstate(1) = 1 protects the ports a bit. RAM and ROM can hold either code or data, so they need to be accessible when busstate(1) = 1 or busstate(0) = 0.
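
The qualification rules can be written out as a small truth-table check. The following is a sketch in C rather than the project's VHDL; the enum and function names are invented for the example, and only the four busstate encodings come from the core's comments.

```c
#include <stdbool.h>

/* TG68 busstate encodings, as listed in the core's port comments. */
enum busstate {
    BUS_FETCH_CODE = 0x0,  /* 00 -> fetch code   */
    BUS_NO_ACCESS  = 0x1,  /* 01 -> no memaccess */
    BUS_READ_DATA  = 0x2,  /* 10 -> read data    */
    BUS_WRITE_DATA = 0x3   /* 11 -> write data   */
};

/* RAM and ROM: enabled when busstate(1) = 1 or busstate(0) = 0,
 * which is every state except 01. */
bool memory_access_allowed(unsigned busstate)
{
    bool bit1 = (busstate & 0x2u) != 0;
    bool bit0 = (busstate & 0x1u) != 0;

    return bit1 || !bit0;
}

/* Peripherals: enabled only for data reads and writes, busstate(1) = 1. */
bool peripheral_access_allowed(unsigned busstate)
{
    return (busstate & 0x2u) != 0;
}
```

Written this way, the RAM/ROM condition is simply "not the 01 state", while peripherals additionally ignore code fetches.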

     Here's the timing of the CPU coming out of reset:

    This fixed the CPU so that it runs reliably at 25 MHz including downloading S-Records and running code. It broke the External SRAM but that's OK since I wanted to work on the controller anyway. 

    The VHDL code is up on GitHub.

  • Another GCC 68K Cross Compiler

    land-boards.com08/27/2020 at 10:16 0 comments

    I've loved the online Compiler Explorer (Godbolt) project for a while now. It lets you type code in one window and see it compiled to assembly language in another window.

    There's a Compiler Explorer site which does 68k cross compiling. This also lets you play with compiler options like optimization. One thing that you learn doing embedded system software is the C language keyword volatile. Any hardware register which is updated externally to the 68K needs to have volatile added. For instance, the ports for the VDU and ACIA can be accessed as pointers with the following defines:

    /* Parenthesized so the macros expand safely inside larger expressions. */
    #define ACIASTAT    ((volatile unsigned char *) 0x010041)
    #define ACIADATA    ((volatile unsigned char *) 0x010043)
    #define VDUSTAT     ((volatile unsigned char *) 0x010040)
    #define VDUDATA     ((volatile unsigned char *) 0x010042)
    #define TXRDYBIT 0x2
    

    To print a character to the ACIA:

    void printCharToACIA(unsigned char charToPrint)
    {
        while ((*ACIASTAT & TXRDYBIT) == 0x0);
        * ACIADATA = charToPrint;
    }
    

     The 68K Compiler Explorer looks like:

    A very nice and fast way to see what the compiler does. Setting the -O3 flag shows what the optimizer does to the code:

    Nice job of optimization. If you click the green check box you can see the compiler options:

    -g -o /tmp/compiler-explorer-compiler120727-1067-xxj1yf.t0dk/output.s 
    -S -fdiagnostics-color=always -O3 /tmp/compiler-explorer-compiler120727-1067-xxj1yf.t0dk/example.cpp

  • GCC Cross Assembly Toolchain Workflow Video

    land-boards.com08/22/2020 at 18:47 0 comments

    A short video which shows the GCC Toolchain workflow.

    The toolchain isn't producing an S9 (termination) record, so the load never terminates. Patching one onto the end of the code works.

    S9030000FC
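
For anyone patching records by hand, the trailing FC is the S-record checksum: the ones' complement of the low byte of the sum of the count, address, and data bytes. The sketch below, in C, rebuilds the S9 line from a 16-bit start address. The function names are made up for the example and this is not part of the toolchain.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Checksum over the count, address, and data bytes of one record. */
uint8_t srec_checksum(const uint8_t *bytes, size_t len)
{
    unsigned sum = 0;

    for (size_t i = 0; i < len; i++) {
        sum += bytes[i];
    }
    return (uint8_t)(~sum & 0xFFu);
}

/* Format an S9 termination record for a 16-bit start address.
 * Returns the number of characters written (excluding the NUL),
 * or -1 if the output buffer is too small. */
int srec_format_s9(uint16_t start, char *out, size_t out_size)
{
    uint8_t body[3];
    int n;

    body[0] = 0x03;                     /* count: 2 address bytes + checksum */
    body[1] = (uint8_t)(start >> 8);    /* address high byte */
    body[2] = (uint8_t)(start & 0xFFu); /* address low byte */

    n = snprintf(out, out_size, "S9%02X%02X%02X%02X",
                 (unsigned)body[0], (unsigned)body[1], (unsigned)body[2],
                 (unsigned)srec_checksum(body, sizeof body));
    if (n < 0 || (size_t)n >= out_size) {
        return -1;
    }
    return n;
}
```

With a start address of 0x0000 the bytes sum to 0x03, and the complement of that is 0xFC, which matches the record above.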
    

  • Testing the External SRAM

    land-boards.com08/22/2020 at 12:55 0 comments

    Now that we have a working GCC toolchain, let's write a program to test the External SRAM. This needs the patch that fixes the timeout for S-record loading, so I'm using this patched version of the S-record file.

    I'm using a GitHub repo to transfer data back and forth to my PC.

    Also, upgraded to Quartus version 20.1. It is very slow.

    Test of External SRAM passes:

    * Test External SRAM
    * External SRAM on the RETRO-EP4CE15 card goes from 0x300000 to 0x3FFFFF (1 MB)
    * External SRAM only supports 8-bit accesses
    * TUTOR14 uses SRAM from 0x000000 to 0x000800
    
    RAMSTART    = 0x300000
    RAMEND      = 0x3FFFFF
    ACIASTAT    = 0x010041
    ACIADATA    = 0x010043
    
    * Code follows
    
        .ORG    0x001000
    * CHECK FIRST LOCATION BY WRITING/READING 0x55/0xAA
    STARTTEST:
        MOVE.L  #RAMSTART,%A0
        MOVE.B  #0x55,%D0
        MOVE.B  %D0,(%A0)
        NOP
        MOVE.B  (%A0),%D1
        CMP.B   %D0,%D1
        BNE     FAIL
        MOVE.B  #0xAA,%D0
        MOVE.B  %D0,(%A0)
        NOP
        MOVE.B  (%A0),%D1
        CMP.B   %D0,%D1
        BNE     FAIL
    * WRITE INCREMENTING PATTERN
        MOVE.B  #0X00,%D0
        MOVE.L  #RAMSTART,%A0
        MOVE.L  #RAMEND+1,%A1
    CHKBLKS:
        MOVE.B  %D0,(%A0)+
        CMP.L   %A0,%A1
        BEQ     DONEFILL
        ADDI.B  #0x01,%D0
        BRA     CHKBLKS
    DONEFILL:
    * READ BACK INCREMENTING PATTERN 
        MOVE.B  #0X00,%D0
        MOVE.L  #RAMSTART,%A0
        MOVE.L  #RAMEND+1,%A1
    LOOPCHK:
        MOVE.B  (%A0)+,%D1
        CMP.B   %D0,%D1
        BNE     FAIL
        CMP.L   %A0,%A1
        BEQ     DONECHK
        ADDI.B  #0x01,%D0
        BRA     LOOPCHK
    DONECHK:
    * PRINT 'Pass'
        MOVE.B  #0x0A,%D0
        JSR     OUTCHAR
        MOVE.B  #0x0D,%D0
        JSR     OUTCHAR
        MOVE.B  #'P',%D0
        JSR     OUTCHAR
        MOVE.B  #'a',%D0
        JSR     OUTCHAR
        MOVE.B  #'s',%D0
        JSR     OUTCHAR
        MOVE.B  #'s',%D0
        JSR     OUTCHAR
        RTS
    FAIL:
    * PRINT 'Fail'
        MOVE.B  #0x0A,%D0
        JSR     OUTCHAR
        MOVE.B  #0x0D,%D0
        JSR     OUTCHAR
        MOVE.B  #'F',%D0
        JSR     OUTCHAR
        MOVE.B  #'a',%D0
        JSR     OUTCHAR
        MOVE.B  #'i',%D0
        JSR     OUTCHAR
        MOVE.B  #'l',%D0
        JSR     OUTCHAR
        RTS
    
    * OUTPUT A CHARACTER IN D0 TO THE ACIA
    OUTCHAR:
        BSR     WAITRDY
        LEA     ACIADATA,%A1
        MOVE.B  %D0,(%A1)
        RTS
    
    * WAIT FOR THE SERIAL PORT TO BE READY
    WAITRDY:
        LEA     ACIASTAT,%A1
    LOOPRDY:
        MOVE.B  (%A1),%D1
        ANDI.B  #0x2,%D1
        BEQ     LOOPRDY
        RTS
    

    Made significant improvements to the above code and checked it into GitHub here.
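
For comparison, here is roughly the same test expressed in C. It is a sketch and not the code in the repository: the function name is made up, and it takes a base pointer and length so it can be tried on an ordinary buffer on a PC before pointing it at the External SRAM. It only does byte accesses, matching the 8-bit limitation noted in the listing.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-wide RAM test: check the first location with 0x55 and 0xAA,
 * then write an incrementing byte pattern and read it back.
 * Returns 0 on pass, -1 on the first mismatch. */
int ram_test_bytes(volatile uint8_t *base, size_t len)
{
    if (len == 0) {
        return 0;
    }

    /* Check the first location by writing/reading 0x55 and 0xAA. */
    base[0] = 0x55;
    if (base[0] != 0x55) {
        return -1;
    }
    base[0] = 0xAA;
    if (base[0] != 0xAA) {
        return -1;
    }

    /* Write the incrementing pattern (wraps every 256 bytes). */
    for (size_t i = 0; i < len; i++) {
        base[i] = (uint8_t)i;
    }

    /* Read back and compare. */
    for (size_t i = 0; i < len; i++) {
        if (base[i] != (uint8_t)i) {
            return -1;
        }
    }
    return 0;
}
```

On the target, the call would presumably look like ram_test_bytes((volatile uint8_t *) 0x300000, 0x100000), using the address range from the listing above. The volatile qualifier keeps the compiler from optimizing away the read-backs.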

  • Speeding up the CPU

    land-boards.com08/22/2020 at 10:00 0 comments

    The 68000 CPU IC had a pin called DTACK* (Data Transfer Acknowledge). When you grounded DTACK*, the CPU ran at full speed. If you wanted to slow down the CPU for a slower external device, you held the pin high until the device finished.

    The FPGA Core for the 68000 CPU has a similar pin "clkena_in" which flips the sense of DTACK* and "stretches" the clock when it is low and enables the CPU clock when high. The pin was set to high in my design and there were no wait states. This didn't work for External SRAM so I lowered the clock speed to 16.7 MHz which allowed the CPU to access slower External SRAM correctly.

    Adding Wait States to Speed up the CPU

    I added a wait state counter for the clkena_in signal which is activated when the CPU tries to access External SRAM. This will come in handy if I want to get the external SDRAM working. Here's the code for the wait state counter plus part of the CPU instance.

        -- Wait states for external SRAM
        w_cpuclken <=     '1' when n_externalRam1CS = '1' else
                '1' when ((n_externalRam1CS = '0') and (w_wait_cnt >= "0100")) else
                '0';
                            
        -- Wait states for external SRAM
        process (i_CLOCK_50,n_externalRam1CS)
            begin
                if rising_edge(i_CLOCK_50) then
                  if n_externalRam1CS = '0' then
                      w_wait_cnt <= w_wait_cnt + 1;
                  else
                        w_wait_cnt <= "0000";
                  end if;
                end if;
            end process;
        
        CPU68K : entity work.TG68KdotC_Kernel
            port map (
                clk                => w_cpuClock,
                nReset            => w_resetLow,
                clkena_in        => w_cpuclken,
    

     As a result I was able to move the CPU speed back to 25 MHz with no wait state for any other accesses.

    The External SRAM chip select is now 120 ns (3 CPU clocks). It could easily be made shorter since my External SRAM is 45 ns parts. It would still need additional time for propagation delays and setup times, so I will leave it as is for the moment.
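
The arithmetic behind those numbers can be checked with a couple of helper functions. This is a back-of-the-envelope sketch in C with made-up function names. It considers only the raw SRAM access time and ignores the propagation delays and setup times mentioned above, so the result is a lower bound and not a recommended setting.

```c
#include <stdint.h>

/* Time covered by a number of CPU clocks, in nanoseconds (truncated).
 * The clock frequency is given in kHz so 16.7 MHz can be written as 16700. */
uint32_t clocks_to_ns(uint32_t clocks, uint32_t cpu_khz)
{
    return (uint32_t)(((uint64_t)clocks * 1000000u) / cpu_khz);
}

/* Smallest number of CPU clocks whose total time is at least access_ns. */
uint32_t min_clocks_for_access(uint32_t access_ns, uint32_t cpu_khz)
{
    uint64_t scaled = (uint64_t)access_ns * cpu_khz;

    return (uint32_t)((scaled + 999999u) / 1000000u);
}
```

At 25 MHz, clocks_to_ns(3, 25000) gives the 120 ns quoted above, and min_clocks_for_access(45, 25000) gives 2 clocks (80 ns) for a 45 ns part before any margin is added.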

  • Improvements for the Cyclone V FPGA

    land-boards.com08/21/2020 at 20:37 0 comments

    The Cyclone V FPGA (PN: 5CEFA2F23) has a lot more internal SRAM than the EP4CE15. I was able to add 96KB of internal SRAM. Due to the TS2 memory map, the new memory could not be contiguous with the lower RAM. That's because the TUTOR ROM is located at 0x008000-0x00FFFF. The new internal SRAM is at 0x200000-0x217FFF.

    I also got the external SRAM working - well sorta working. It is only 8-bits and the 68000 CPU does not do dynamic bus sizing so it has to be accessed as bytes. But I tested some locations and they worked fine. The External SRAM is from 0x300000-0x3FFFFF.
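
To keep the address ranges straight, here is a small C sketch of the memory map using only the ranges mentioned in these logs. The enum and function names are invented for the example, the lower internal SRAM is left out because its exact range isn't stated here, and the real decode lives in the VHDL, which may differ in detail.

```c
#include <stdint.h>

enum region {
    REGION_UNLISTED,        /* not covered by the ranges quoted in the logs */
    REGION_TUTOR_ROM,       /* 0x008000-0x00FFFF */
    REGION_VDU_ACIA,        /* 0x010040-0x010043 */
    REGION_INTERNAL_SRAM2,  /* 0x200000-0x217FFF (added on the Cyclone V) */
    REGION_EXTERNAL_SRAM    /* 0x300000-0x3FFFFF (byte accesses only) */
};

/* Size in bytes of an inclusive address range. */
uint32_t range_size(uint32_t first, uint32_t last)
{
    return last - first + 1u;
}

/* Classify a CPU address against the ranges quoted in the logs. */
enum region decode_address(uint32_t addr)
{
    addr &= 0xFFFFFFu;  /* the 68000 has a 24-bit address space */

    if (addr >= 0x008000u && addr <= 0x00FFFFu) {
        return REGION_TUTOR_ROM;
    }
    if (addr >= 0x010040u && addr <= 0x010043u) {
        return REGION_VDU_ACIA;
    }
    if (addr >= 0x200000u && addr <= 0x217FFFu) {
        return REGION_INTERNAL_SRAM2;
    }
    if (addr >= 0x300000u && addr <= 0x3FFFFFu) {
        return REGION_EXTERNAL_SRAM;
    }
    return REGION_UNLISTED;
}
```

As a sanity check, range_size(0x200000, 0x217FFF) comes out to 98,304 bytes, which is the 96KB quoted above, and the External SRAM range works out to exactly 1 MB.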

