Suite-16 is a 16-bit cpu built entirely from TTL. It is a personal exploration of how hardware and software interact.

Similar projects worth following
Suite-16 is an experimental 16-bit TTL cpu designed to explore cpu architectures and the interactions between hardware, firmware and software needed to make a functioning computer.

The name Suite-16 is a word play on Steve Wozniak's 16-bit virtual cpu "Sweet-16" written in 6502 assembly language to augment the Apple II when performing 16-bit operations.

It featured in BYTE Magazine of November 1977 from Page 150 onwards.

It is also well documented here:


Suite-16 arose from my curiosity regarding how computers work.

I have had a lifetime in hardware engineering but very little real exposure to firmware and software since my Z80 coding days in the 1980s.

The Suite-16 project will encourage me to explore the interactions of hardware, firmware and software - hopefully widening my horizons, learning new skills and achieving the goal of building a working TTL computer system.

Suite-16 refers literally to the suite of deliverables needed to put together a working 16-bit computer system.  

The main deliverable will be a 16-bit computer system designed and built from 7400 series TTL integrated circuits outlined in the next section  below.

From a hardware point of view this will include circuit schematics, pcb layouts, timing diagrams and test-bench programs. It will also likely contain FPGA prototypes with their supporting designs coded in verilog. Part of the project aim is to become familiar with FPGA design and verilog programming - and what better way than to set the goal of a working computer system.

As any computer is the co-ordinated interaction between hardware and software - there will be several key software deliverables needed to complete the design.  These will include the Instruction Set - Architecture or ISA, an assembler and a software simulator of the proposed machine. Later on, when the machine is debugged and can run simple code - there will be applications to be written such as high level languages.

You can find my Github repository here: Suite-16  This is updated regularly to include all the latest Arduino code and sketches.

In summary, Suite-16 is an exploration of the knowledge space bounded by hardware, software and FPGA design.

Main Features

Principle Technology:  5V TTL logic with conventional through-hole DIL packages wherever possible

Data bus:  16-bits

Address bus: 16-bits - with possible extension to 24-bits

Architecture:   Von Neuman  Accumulator-Register

Hardwired Logic - no microcode

Basic 2 stage pipeline allowing most instructions to complete in 1 clock cycle

Registers:   16 general purpose registers R0 to R15.  R0 is the accumulator, most ALU operations are on Accumulator and a Register

Additional Registers:   Program Counter PC, Instruction Register IR

Number of instructions:  31


Memory and Register Instructions: SET LD ST LD@ ST@  POP PUSH

Program Flow:  CALL RET JMP BRA  and conditional branching BGT BLT BNE BEQ BGE BLE

Other Instructions:  IN OUT SRA SLA  CLA STC CLC

Addressing Modes:  Immediate #N, Register Direct Rn, Register Indirect @Rn, Indirect Autoincrement @Rn+,  Indexed Mode (Rn+X), Symbolic Mode (PC +X)

Instruction Length:  16-bits

Instruction Format:  8-bit bytecode in IR 15:8 augmented by 8-bit payload in IR 7:0

Hardware:  ALU based on 4-bit bitslice design.  ALU + PC plus register file on one pcb. Memory and mass-storage on 2nd pcb.

Performance:   Aiming for 12.5MHz clock frequency to have some interoperability with VGA video systems.

Target Use:  Co-processor / Accelerator for accompanying TTL colour computer  (Gigatron)

Tools and Resources.

Throughout this project I will need to learn new techniques and use new tools.

Hardware simulation will be done using H. Neeman's "Digital"  simulator - which is an updated version of Logisim

Most of my C code will be run on Arduino compatible platforms - so that it can easily be explored.

For hardware schematics and pcb layout, I generally use EagleCAD 6.x - but I am slowly moving towards open source tools including KiCAD and the online pcb design package from easyEDA

Sometimes we have to resort to FPGAs and a suitable hardware description...

Read more »

  • SIMPL Revisited

    monsonite01/21/2020 at 20:20 2 comments

    Back in late October, just as I was starting to hand-code some assembly routines for Suite-16, I considered porting my bytecode interpreter SIMPL across from MSP430 assembly language to that of the Suite-16.

    Unfortunately at that time, the instruction set was very much in a state of flux and still evolving, and hand assembly was somewhat time-consuming. Revisiting this task now that we have an assembler and a hexloader in our toolchain armoury makes the job so much easier, and I have transcribed the SIMPL framework in a long afternoon sprint of about 6 hours coding. As well as the code, it is highly commented, thus documenting the workings of the SIMPL interpreter as I went along.

    What is SIMPL?  

    At it's most basic level it is an interpreter consisting of a switch statement contained in a loop. 

    After all - that is how most simple processors and virtual machines are simulated. To make the job easier in assembly language - the switch statement is replaced with a jump table, with one 16-bit entry for each of the 96 printable ascii characters.  In essence, when a command character is read from the input buffer, it is used to index into the jump table and pick up a 16-bit address for the code-block associated with that command.

    SIMPL is stack based, so numbers are put onto the stack and operated on from there.  

    It is a tiny-Forth-like utility language without the complexity of the dictionary and text string matching that is needed in a full-blown Forth.

    So the commands include familiar mathematical and logical symbols such as  + - * and  /, which obviously relate to arithmetical operations.

    Then there are the stack operations  - familiar to those using Forth, such as DUP, DROP, SWAP and OVER. In addition there are  commands associated with decimal and hexadecimal number entry and also output routines to a serial terminal for decimal, hexadecimal and hex-dump formats.

    The user has 26 commands available - that can be user defined and customised. These are represented by the uppercase characters A to Z.  The language is extensible just by assigning a user command to a snippet of instructions.

    SIMPL provides a minimal framework, consisting of serial input and output, number conversion, text interpreter, command allocator plus a range of arithmetical, logical, stack and memory operations. From this collection of built in routines, elementary applications can be assembled. 

    For example, if you were developing an application for a hex monitor, to view the contents of memory on screen in hex, edit it and save it back to memory you would assign characters for all the commands that you would need. 

    A     set the Address    

    C     Clear memory

    D     Dump memory

    E      Execute the routine

    Alternatively if you were developing an application to control a CNC machine, plotter, 3D printer, robot etc you would probably use the X, Y, and Z characters that define the co-ordinates you wish the machine head to move.  The SIMPL interpreter can read these in one line at a time from an ascii text file to control the motion of the machine, or to draw graphics on a screen.  This is similar to the manner in which G-Code is used to control CNC machine tools, or Gerber files are used to control a photo plotter for pcb manufacturing.  These simple control tasks evolved in the 1960s, when processing power was very limited, so the application itself had to be kept non-complicated.  

    SIMPL is very compact.

    The minimal interpreter framework with all the input and output utility routines is under 256 program words. The Jump table is a further 96 program words. To this you must add the action routines associated with all of the potential 96 commands - but these are often very small - each only a few instructions long.  

    SIMPL is easy to tailor to your own application, as rudimentary or complex...

    Read more »

  • A New Year - and 2020 Vision.

    monsonite01/13/2020 at 16:20 2 comments

    In mid-November, I traveled out to Northern California, to attend Forth Day at Stanford University - and to meet one of my Computing Heroes, Charles H. Moore - the inventor of Forth. Returning in late November, I got stuck in a bit of a rut, plus a lot of other things were putting pressure on my free time - which meant that the Suite-16 project was put on the back-burner with no further progress. My project-colleague, Frank, is currently touring Australia and Tasmania for the duration of January, and so that gives me a two-week  window of opportunity to take stock of the project so far and plan out the goals for 2020.

    It was always my intention that Suite-16 would exist as a C-simulator, a computer implemented in standard TTL and as a verilog implementation to run on an FPGA as a soft-cpu.  In discussions with Frank, we have decided that now we have a working simulator, the next step is to convert this into a soft-cpu developed and running on opensource FPGA hardware. When this task is done, I will have had a lot more experience writing in verilog, and Frank and I, will have stable target hardware platforms with which we can do real system development.  The TTL processor will follow on later, as it will be a lot of hard work, but at least the architecture will have been thoroughly explored and documented by that time.

    The FPGA family I have chosen to use is the Lattice ICE 40 series.  I first encountered these in early 2015 when Clifford Wolf announced his "Project IceStorm"  reverse engineered, opensource FPGA toolchain - allowing the Lattice parts to be programmed using a manageable, open source toolchain.  One of the first soft-cpus to benefit from this announcement was James Bowman's J1 Forth CPU. Video here:  James had previously used Xilinx parts for earlier implementations of his J1, but by May 2015 had proven that it would easily fit onto a ICE 40 HX1K - which is a 1K lUT FPGA used on the $25 Lattice IceStick development board. James and I have collaborated on a couple of projects, having first met at the Open Hardware Summit in New York in September 2011. 

    In May 2016 a friend, Alan Wood,  and I decided to develop an opensource FPGA dev-board based on the Lattice ICE40 HX4K.  The result was the myStorm BlackIce, which is now into it's fifth generation. Hardware development took about 14 weeks from idea to the first 250 boards arriving from the manufacturers in Shenzhen, China. We debuted at  the 2016 OSHCamp (Open Source Hardware Camp) held annually in Hebden Bridge, West Yorkshire, UK.

    In November 2016, I met up with James again at  Forth Day in Palo Alto, where he had implemented his J1 cpu on one of our BlackIce boards, and was serving his presentation as a series of jpgs from it.  The BlackIce hardware had been fully exercised and proven robust and reliable - and makes an ideal platform for soft cpus or SoCs.

    The 2nd generation BlackIce II board has a Lattice ICE 40 HX4K FPGA and 256K words of 16-bit, 10nS SRAM.  Programming (configuration) is done over a USB connection, using an STM32L433 ARM Cortex M4 to act as the programming interface.  We could have just used a FTDI FT2232H as a USB to SPI converter (like everyone else does) - but frankly at $3.50 the FTDI device is too expensive and the STM32 offers so much more flexibility for less money.  Once the FPGA is programmed, the STM32 can be used to provide slave peripherals for the FPGA (such as ADC, DAC, UART, I2C, SPI etc).   Later generation of myStorm boards have all adopted this ARM/FPGA symbiosis.

    Implementing a soft-cpu in verilog

    verilog, like vhdl is a popular hardware description language or HDL.  verilog has its roots in C, whilst vhdl arose out of the US department of defense and is structured like Ada.  Both are equally used, but verilog has my vote, because I know nothing about vhdl syntax.

    When James Bowman transcribed...

    Read more »


    monsonite11/05/2019 at 22:50 0 comments

    PRINTHEX is probably the last of the utility routines that I needed to write in order to get a simple hex loader to run.

    It accepts a 16-bit integer value from the accumulator R0 and prints it out as a 4-digit hexadecimal number. Leading zeros are not suppressed.

    It is based on the decimal number print routine PRINTNUM, but with the added complication that the hex character sequence is not contiguous in the ascii table.

    This is likely to be the last of the hand-assembled routines, because my motivation is from now on  to use the TASM32 assembler - kindly customised for the Suite-16 instruction set by Frank Eggink.

    Having a working hex loader with hex dump and simple monitor commands will be the next goal!

    Here's just the PRINTHEX assembly code - it fits nicely into 48 words of memory:

    // 0x0070 ---------------------------PRINTHEX----------------------------------------- 
        //  Prints out contents of R0 as a 4 digit hexadecimal number to the terminal 
        //  Leading zeroes are not suppressed
        //  R1 = Heximation Value
        //  R2 = digit 
        //  R3 = 0x30
        //  R4 = temporary storage for accumulator (Heximated value)
        //  R6 = temporary store for output character
            0x1200,     // SET R2, 0x0000
            0x1300,     // SET R3, 0x0030
            0x1100,     // R1 = 4096
            0x088C,     // CALL Heximate
            0x1100,     // R1 = 256
            0x088C,     // CALL Heximate
            0x1100,     // R1 = 16
            0x088C,     // CALL Heximate
            0x0A30,     // ADI 0x30 to make a number
            0x3600,     // Store in R6
            0x0B3A,     // SBI  0x3A  - is it bigger than ascii 9
         // 0x0080 ---------------------------------------------------------
            0x0284,     // BLT 0x84 -  Print decimal digit
            0x0A41,     // ADDI 0x41 - make it a hex digit 
            0x0C00,     // putchar R0
            0x0086,     // BRA CRLF
            0x2600,     // LD from R6
            0x0C00,     // putchar R0
            0x1000,     // SET R0, CR
            0x0C00,     // putchar R0, CR
            0x0B03,     // SBI 0x03 Set R0, LF   
            0x0C00,     // putchar R0, LF
            0x0003,     // BRA START     
         // 0x008C ------------------------Heximate--------------------------------
            0xB100,     // SUB R1,     :Heximate 
            0x0290,     // BLT 0x90    
            0xE200,     // INC R2
            0x008C,     // BRA 0x08C
         // 0x0090 ---------------------------------------------------------    
            0x3400,     // Store R0 in R4   temporary store the remainder
            0x2200,     // MOV R2, R0  get the count from R2
            0x0A30,     // ADI 0x30 to make a number
            0x3600,     // ST R0, R6  - temporary save to R6
            0x0B3A,     // SBI  0x3A  - is it bigger than ascii 9
            0x0299,     // BLT 0x99 Print decimal digit
            0x0A41,     // ADI 0x41 - make it a hex digit 
            0x0C00,     // putchar R0
            0x009B,     // BRA 0x9B   Restore R0
            0x2600,     // Get R0 back from R6
            0x0C00,     // putchar R0 Print it as a decimal digit
            0x2400,     // Get R0 back from R4
            0xA100,     // ADD R1 adds DEC value to restore R0       
            0x1200,     // SET R2,0    Reset R2
            0x0900,     // RET

  • Finalising the Instruction Set

    monsonite11/04/2019 at 13:12 1 comment

    Recently, I have been exploring the Suite-16 instruction set, by the practical method of writing assembly language to run on the Suite-16 simulator.

    Starting with a very simple routine to output "Hello World!", I have created routines for decimal and hexadecimal entry, decimal output and a very simple command interpreter.

    In writing these routines, it became clear that there were certain deficiencies in the instruction set and over the last 2 weeks I have slowly added useful instructions to make the cpu more versatile.

    This process is now approaching it's logical conclusion - partly because I have run out of spare instruction slots, and partly because I don't want to create such a complicated instruction set - that I don't stand a chance of implementing it in hardware. 

    Also I feel that after 2 weeks of spare time writing assembly routines it's time to move the project along to its next phase and begin the hardware implementation.

    There are three main areas in which I believe the instruction set can be augmented.  

    The first is making more use of 16-bit immediate operations on the accumulator R0. To the ADI and SBI operations are 8-bit immediate operations where the operand is held in the lower 8 bits of the instruction register.  Extending this to 16-bit will mean that the operand will be held in the next location in memory.  This could be done by making the program counter another general purpose register - and this I believe is how the MSP430 implements immediate operations.

    With this mechanism in place, ADD, SUB, AND, OR and XOR would benefit from having this 16-bit immediate mode.

    My dealings with the decimal and hex routines have also highlighted the need for an efficient left shift on the accumulator.  

    Ideally I can implement as a bare minimum an ADD R0, R0, which will at least allow a doubling of the accumulator without involving any other register.  The "Times 10" and "Times 16" routines used in decimal and hexadecimal entry would benefit from this instruction saving a few instruction cycles.

    Secondly,  I'm proposing that the 8-bit payload in the lower 8 bits of the instruction register can be used as an address to allow the IN and OUT operations to address up to 256 I/O devices.  This is a placeholder for memory mapped I/O peripherals such as UARTs, timers and GPIO which can be added later.

    Thirdly, the final instruction slot 0x0Fxx which is currently used as NOP. I intend to extend this to allow for microinstructions - inspired by the OPR, "OPeRate" instructions used on the PDP-8. 

    Plagiarising the PDP-8

    The OPR instructions allow operations such as clearing and complementing the accumulator, setting and clearing the carry bit and shift and SWAP operations to be implemented.

    The PDP-8 OPR instructions were implemented with the following individual bit-lines that operated directly on the hardware:

    This scheme gives access to 8 individual control lines which could be sequenced to become active in a specific timeslot which allowed quite complex operations to be performed on the accumulator.

    An alternative scheme is possible, where the lower 8-bit payload are fully decoded to allow up to 256 microinstructions.  For maximum flexibility this could be done by using the byte to address a micro instruction ROM such as an additional  AT27C1024.  A 4-bit counter can be used on higher address lines to provide a primitive 16 step microsequencer.  That leaves 4 address lines which could be used as inputs to implement a simple external interrupt system.

    This would be very flexible but requiring more hardware, and probably quite limited by the access time of the AT27C1024 ROM.

    The next plog (project log) will start to look at the hardware architecture and how we might implement a fast microinstruction sequencer using a counter, a diode matrix and some 3-8 line decoders.

  • Hexadecimal Entry

    monsonite11/02/2019 at 20:29 0 comments

    It's been a bit of a slow week, and I must admit that I lost focus in the middle of the week with my hexadecimal number entry routine.

    In my opinion, hexadecimal entry is more complex than decimal entry, because the characters 0-9, and A-F are discontinuous in the ASCII table.

    Characters 0-9 need to have 0x30 subtracted, whilst characters A-F need to have 0x37 subtracted. Anything else is not a valid hex digit and can be ignored until a newline character is seen.

    With each incoming character you have to check if it is a legitimate hexadecimal digit, and modify it, either by subtracting 0x30 or 0x37 to get it's true numerical value.

    This test and modify is best done using a short subroutine - at the end of the listing

    Once you have the numerical value allocated to the character the rest of the routine is similar to the decimal entry routine, except that you are multiplying by 16 rather than 10.

    There's a further twist in the tail when you detect the end of the valid digits and have to add in the last digit - modifying it accordingly.

    This first draft allows hexadecimal mumbers up to 0xFFFF to be entered and prints them back in decimal format. 

    You can find the latest listing for the Arduino in my Github Suite-16 repository

    EDIT:  I found some redundant code in the main GETHEX routine and have managed to shorten it from 52 to 34 words.  

    Further optimisation became possible with the test and modify subroutine approach.  

    As the instruction set currently lacks a shift left instruction,  and it's not yet proven that ADD R0, R0 will be implemented in hardware, the routine to multiply the accumulator by 16 is a little cumbersome using eight instructions rather than a possible four.

    EDIT: After proving the ADD R0, R0 instruction and fixing a minor bug the routine is now down to 31 words in length.

    // 0x003C -----------------------------GETHEX----------------------------
            // Accepts a hexadecimal number up to FFFF from terminal input buffer
            // converts it to an integer and puts it into register R0
            // It can then be printed out as a decimal using PRINTNUM - for checking integrity
            // R1 is the pointer in the text buffer - which starts at 0x0200
            // R4 is used as a temporary store for the character in the accumulator R0
            // R5 is used in the "Times 16" routine
            // R7 is used to accumulate the powers of 16 when forming the integer in R0
            0x1100,     // SET R1, 0x0200    text buffer start
            0x1700,     // Don't forget to clear R7 
         // 0x0040--------------------------------------------------------------------------------------  
            0x4100,     // LD AC, @R1  get first character from buffer  :Getchar
            0x3400,     // Store R0 in R4
            0xE100,     // INC R1
            0x4100,     // LD AC, @R1  get next character - and test to see if it's a number or hex digit or space newline etc
            0x0B30,     // Subtract 0x30 Is it bigger than 0x30?
            0x0250,     // BLT 0x50 Quit  No - so must be a space or newline etc
            0x0B17,     // SBI 0x17  is it bigger than 0x47 ascii for "F" ?
            0x0350,     // BGT 0x50 Quit Not a hexadecimal digit
            0x0853,     // CALL 0x0053 Restore, Test and Modify R0
            0xA700,     // Add in the accumulating total from R7 - ready to multiply
            0xA000,     // ADD R0, R0  Double R0  2X
            0xA000,     // ADD R0, R0  Double R0  4X
            0xA000,     // ADD R0, R0  Double R0  8X
            0xA000,     // ADD R0, R0  Double R0  16X
            0x3700,     // Store R0 in R7   R7 is the accumulating total of all the digits multiplied by powers of 16
            0x0040,     // BRA 0x0040       Get the next digit
         // 0x0050-------------------------------------------------------------------------------------- 
            0x0853,     // CALL 0x0053      Restore, Test and modify R0
            0xA700,     // Add the accumulated sum from R7 - integer decimal number is now in R0   
            0x0010,     // BRA 0x0010  Print it in decimal     
         // 0x0053---------------------------------TEST R0 & MODIFY--------------------------------------
            // If R0 = 0-9 subtract 0x30 to form a number 0-9
            // If R0 = A-F subtract 0x37 to form a number 10-15
     0x2400, //...
    Read more »

  • An Assembler for Suite-16

    monsonite10/31/2019 at 13:01 0 comments

    I was delighted to receive a Twitter notification from Frank Eggink, one of this project's followers, with news that he had created a table of Suite-16 instructions so that it can be used by TASM - a table driven assembler popular for small micros about 20 years ago.

    His customised table and a link to the dowload site for TASM32 can be found at his Github repository: Here  

    Many thanks Frank - much appreciated your contribution!

    I have re-jigged the instruction set slightly in the last few days - and the most recent can be found in this simulator file in my Github

    With the changes to the instruction set, I now have no more empty slots, so the NOP at 0x0F00 seems a bit extravagant. 

    With an 8-bit immediate add to the accumulator, the NOP can be created from ADI 0.

    This frees up the 0x0Fxx slot for my proposed (PDP-8 like) OPR instructions including shifts, clears, complements and small constants.

    The main changes are documented in the text header:

    // A simple simulator for Suite-16 processor
    // Add and Subtract Immediate instructions ADI and SBI added at 0x0Axx and 0x0Bxx
    // IN moved to 0x0D00
    // JP@ - Branch to the address held in the accumulator  added at 0x0E00
    /* Suite-16 Instructions
    Register OPS-
         0n        ---       --     Non-Register Ops
         1n        SET       Rn     Constant  (Set)         Rn = @(PC+1)
         2n        LD        Rn     (Load)                  AC = Rn
         3n        ST        Rn     (Store)                 Rn = AC
         4n        LD        @Rn    (Load Indirect)         AC = @Rn
         5n        ST        @Rn    (Store Indirect)        @Rn = AC
         6n        POP       @Rn    Pop  AC                 AC = @Rn  Rn = Rn - 1
         7n        PUSH      @Rn    Push AC                 @Rn = AC  Rn = Rn + 1 
         8n        AND       Rn     (AND)                   AC = AC & Rn 
         9n        OR        Rn     (OR)                    AC = AC | Rn 
         An        ADD       Rn     (Add)                   AC = AC + Rn
         Bn        SUB       Rn     (Sub)                   AC = AC - Rn
         Cn        INV       Rn     (Invert)                Rn = ~Rn
         Dn        DCR       Rn     (Decrement)             Rn = Rn - 1
         En        INR       Rn     (Increment)             Rn = Rn + 1
         Fn        XOR       Rn     (XOR)                   AC = AC ^ Rn
    Non-register OPS-
         00        BRA    Always                        Target = IR7:0
         01        BGT    AC>0                          Target = IR7:0
         02        BLT    AC<0                          Target = IR7:0
         03        BGE    AC>=0                         Target = IR7:0
         04        BLE    AC<=0                         Target = IR7:0 
         05        BNE    AC!=0                         Target = IR7:0
         06        BEQ    AC=0                          Target = IR7:0     
         07        JMP    16-bit                        Target = @(PC+1)
         08        CALL   16-bit                        Target = @(PC+1)
         09        RET    Return
         0A        ADI    Add 8-bit Immediate           Immediate = IR7:0
         0B        SBI    Subtract 8-bit Immediate      Immediate = IR7:0
         0C        OUT                                  putchar(AC)
         0D        IN                                   AC = getchar()
         0E        JP@                                  BRA (R0)
         0F        NOP                                  AC &= AC

  • Immediate Instructions

    monsonite10/29/2019 at 13:14 0 comments

    One of the deficiencies with the Suit-16 instruction set was a lack of an immediate addressing mode, where one of the operands is contained in the next word in memory.

    This specifically was becoming a problem when you wanted to check if the contents of the accumulator lay between two bounds - and branch accordingly.  This type of test is frequently found in ascii to hex or decimal conversion routines and string handling, and after coding a few routines it became obvious that the current situation involving other registers was clumsy and inadequate.

    As a compromise I have added two instructions ADI and SBI which allow an 8-bit value to be coded into the payload area and have it added to or subtracted from the accumulator.

    I have coded these two instructions in the spare 0x0Axx  and 0x0Bxx instruction slots to try them out  and see if they make coding easier and less convoluted. If they are useful they will get added to the final instruction set that will be implemented in hardware.

    Here's an example from the Number entry routine where the input character needs to be tested to find out if it falls between ASCII 0x30 and 0x39 and is therefore a decimal digit.  Registers  R2 and R3 are first preloaded with the constants 0x0A and 0x30 so that they are available for the tests.  These preload instructions will not be needed, saving 4 words of memory,  and the SUB R3 and SUB R2 instructions become  SBI 0x30 and SBI 0x0A respectively.

    Whilst this might seem a trivial change in this example, it will be very useful when testing the input buffer for certain known strings - essential for dealing with high-level languages with keywords.

            0x1300,     // SET R3 0x30   Preload R3 with decimal 48
            0x1200,     // SET R2, 0x0A  Preload R2 with decimal 10
            0x1100,     // SET R1, 0x0200    text buffer start
            0x4100,     // LD AC, @R1  get first character from buffer
            0x3400,     // Store R0 in R4
            0xE100,     // INC R1
            0x4100,     // LD AC, @R1  get next character - and test to see if it is a number
            0xB300,     // Subtract R3 Is it bigger than 0x30?
            0x025A,     // BLT Not a Number
            0xB200,     // Subtract R2  0x0A
            0x035A,     // BGE Not a Number
            0x2400,     // Get original character  R0 back from R4
            0xB300,     // Subtract R3 to form a digit

  • Benchmarking Suite-16

    monsonite10/28/2019 at 13:46 0 comments

    Over the last week I have been running Suite-16 assembly language simulated  in about 60 lines of  C++ code.  I have evolved the simulator over that time, and added some new instructions where it became necessary to use them.

    The simulator has been written using the Arduino IDE - so that anyone with an Arduino compatible board can explore the code and learn how a very simple cpu simulator works.

    Originally I had been simulating the Suite-16 cpu on an MSP430 Launchpad board with FRAM.

    I noticed that despite it being a 16-bit processor, the performance was not so good, so I have swapped over to a Nucleo STM32H743 board which has a 400MHz ARM processor.

    I'm still using the Arduino IDE to develop code - because it has a useful timing function micros() which returns the number of microseconds since the program was started. With this I can get fairly accurate timing information from my simulator.

    I have used one of the spare instruction opcodes to allow the instruction count and the elapsed time to be output to the terminal

    By way of a timing benchmark, I have set up a simple loop that loads R0 with 32767 and repeatedly decrements it until it reaches zero. I then print out instruction count and elapsed number of microseconds.

    Based on the "count down from 32767" loop, my Suite16 simulator is running about 8 million simulated instructions per second.

    That's about 66% of what I'm hoping the TTL cpu to run at.

    Based on the 400MHz clock on the Nucleo board, I can estimate that the simulator in C is taking about 50 ARM instructions to execute a Suite-16 simulated one.

    I tried exactly the same code on the MSP430 which is a nominal 16MHz. Unfortunately the FRAM only works at 8MHz with wait states, so that slows it down considerably to about 75,000 simulated instructions per second.

    So I tried a 16MHz Arduino with an 8-bit AVR ATmega328 and the results were much improved to nearly 139,000 instructions per second.

    The humble AVR is approximately 59 times slower than the ARM, but with a 7uS simulated instruction cycle it is still in the same league as some of the classic minicomputers from the 1960s.

    Update 31-3-2021.

    I am now running the simulator on a 600MHz Teensy 4.0 dev board.

    An empy loop executes at around 50 million iterations per second and an addition, subtraction or logic operation can be performed around 9.2 million times per second.

    I have decided that the ISA of Suite-16 is a very good match for a stack-based bytecode language called STABLE by Sandor Schneider.  

  • Stack Operations

    monsonite10/26/2019 at 12:21 0 comments

    Charles Moore's Forth is based on a 16-bit virtual machine that passes parameters between functions using the Parameter Stack (also known as Data Stack).

    A stack is just a Last In, First Out (LIFO) structure contained in consecutive memory locations. The stack is often placed in the top of memory and grows downwards. So the top of the stack (TOS) is the lowest in memory of all stack items.

    A register or zeropage variable acts as the stack pointer and is pre-decremented when an item is pushed onto the stack, and post-incremented when an item is popped off the stack. The stack pointer always points to the Top of Stack.

    Suite-16 has PUSH and POP operations that may be used with any of the general purpose registers - so multiple independent stacks can be created. The only overhead is the assignment of a register solely as use as a stack pointer to one stack and a suitable section in memory. The stack pointer register should be initialised to it's upper boundary value - for example 0x2000 in RAM.

    The contents of the accumulator R0 are pushed to the memory location addressed by Rn, after Rn has been decremented. Similarly the top member of the stack is popped into the Accumulator and then Rn is incremented.

    The other use of stacks is to hold the return address of subroutines. When a subroutine is called the PC is pushed onto the top of the return stack, and popped back to the PC when a return instruction is executed. With this stacking arrangement it allows for the automatic nesting of subroutines.

    Here's how the PUSH and POP instructions are coded on the simulator:

    case 0x6:   R[0] = M[R[n]]  ; R[n]= R[n]+1   ;  break ; /* POP with post-increment of pointer Rn */
    case 0x7:   R[n]= R[n]-1    ; M[R[n]] = R[0] ;  break ; /* PUSH with pre-decrement of pointer Rn */  

    M[R[n]] is word in RAM pointed to by the stack pointer R[n]    R[0] is our accumulator.

    The other instructions that use a stack are the CALL and RETurn:  Here R[15] is dedicated as the Return Stack Pointer RSP.

    case 0x8:  R[15]= R[15]-1 ; M[R[15]] = PC ; PC = addr ; break ; // CALL (zero page) use R15 as RSP 
    case 0x9:  PC = M[R[15]]  ; R[15]= R[15]+1            ; break ; // RET

     A later log will look at the stack manipulation words such as DUP, DROP, SWAP, OVER that are frequently used in Forth.

  • Hop, Skip and Jump

    monsonite10/24/2019 at 17:29 3 comments

    This week I am working towards getting my pet project, SIMPL working on the Suite-16 simulator. 

    I am making good progress with the main routines that handle decimal number entry and decimal number printing. These have been relatively easy to code, and the codesize compares with the equivalent code written in MSP430 assembly language.

    The next thing I need to code up are the three routines that will provide the mechanics of the interpreter.

    Assembling code by hand is not too difficult, but it helps if you keep a modular approach - and each module has only one entry point and one exit point. It takes more time to plan each module, and then test it - than it does to hand assemble. So for the moment I am not overly concerned that I don't have a full assembler.

    Modular code is the approach taken by Forth. You are encouraged to write short routines that only require a few input variables that are taken off the stack, and in turn will calculate some output result that is placed back on the stack. The stack is the all important communicating pipeline between the functional modules.

    Whilst SIMPL is by no means anything like a full Forth, it does follow closely with some of the techniques used in the interpreter, but the dictionary that is fundamental to Forth is replace with a simple jump table. This makes it possible to have a working SIMPL kernel operating in fewer than 1000 bytes of code.

    The Command Interpreter

    Ward Cunningham who wrote Txtzyme, the precursor to SIMPL, described his interpreter as a switch-case statement contained within a loop.

    I now need to devise an efficient switch statement mechanism for Suite-16, as this is central to the whole functioning of the command interpreter.

    The switch statement is given an input value which it translates by a look-up table mechanism to an output value, and this output value is used as a jump address for the program execution.


    Whilst there are 96 printable ascii codes to be used as commands, we are unlikely to have to use all of them in the jump table. First we can discount the numerical characters as these are handled separately by the number entry routing.  Capital letters are reserved for User Functions or variables, so they will also be handled differently. That just leaves 26 lower case characters and 34 other symbols.  The jump table has already reduced in size from 96 to 60 entries.  It may be possible to reserve 60 words of the zeropage to accommodate the jump table, leaving 196 words for essential code, user variables and structures such as the data stack and return stack.

    The jump mechanism needs some clarification. With Suite-16 we can embed an 8-bit jump address into the lower byte of the instruction. This however is very useful for accessing addresses on the zeropage, and we will need to find an alternative method to access the code words that are more likely to be located outside of page 0.

    The jump table will contain a list of addresses, which are the start addresses for all of the command routines. For example if our accumulator currently holds the ascii character "p" 0x70 and we want to used this to invoke the printnum routine which for example starts at address 0x0100, we need to create a table in memory which at address 0x70 contains the value 0x0100. We can get this address back into the accumulator and then jump to it.

    Trampoline Jumps

    Suite -16 is currently only using an 8-bit jump address which is stored in the payload section of the instruction. If we extend this to a 16-bit jump, the target address will be held in the word following the jump instruction. We can use the accumulator to overwrite this target address, so we can effectively jump to an address that is held in the accumulator. This currently will have to be done in a two stage process, sometimes called a Trampoline Jump.

    Let's assume that the accumulator holds 0x70 the letter p, and we want to jump to address 0x0100 that is held in the lookup table. We...

    Read more »

View all 24 project logs

Enjoy this project?



Marcel van Kervinck wrote 11/10/2019 at 19:06 point

I'm just watching this great presentation on an alternative 16-bit processor for the 6502. AcheronVM. There're some good ideas in there I would certainly play with if I were doing vCPU again.

  Are you sure? yes | no

yeti wrote 10/16/2019 at 04:18 point

Address and data of the same bit sizes ... maybe B isn't far from there. ;-)

  Are you sure? yes | no

Ken Yap wrote 10/15/2019 at 18:41 point

Cool. I remember reading the description of Sweet-16 interpreter in Byte and actually wrote one for the 6800, not the best of host targets. I'll be interested to see how your project goes.

  Are you sure? yes | no

monsonite wrote 10/15/2019 at 20:32 point

Thanks Ken.  There are an infinite number of ways to design a cpu - but I thought Steve Wozniak's Sweet-16 provided a good head-start. I'm going to be using a bitslice design based on a 4-bit slice - because all of the best 16 pin TTL ICs are designed around 4-bits. The slice will contain about 16 to 20 ICs - and 4 slices plus the control circuitry will account for about 100 ICs. The aim is to run it at 12.5MHz, which makes it easier to co-exist with VGA video. I'm not setting out to design another VGA computer, but more of a 16-bit coprocessor that can work alongside Marcel van Kervinck's excellent 8-bit Gigatron TTL computer.  We are all standing on the shoulders of Giants.......

  Are you sure? yes | no

Similar Projects

Does this project spark your interest?

Become a member to follow this project and never miss any updates