
Suite-16

Suite-16 is a 16-bit cpu built entirely from TTL. It is a personal exploration of how hardware and software interact.

Suite-16 is an experimental 16-bit TTL cpu designed to explore cpu architectures and the interactions between hardware, firmware and software needed to make a functioning computer. The name Suite-16 is a word play on Steve Wozniak's 16-bit virtual cpu "Sweet-16" written in 6502 assembly language to augment the Apple II when performing 16-bit operations. It is well documented here: http://www.6502.org/source/interpreters/sweet16.htm#Instruction_Descriptions_

Suite-16

Suite-16 arose from my curiosity regarding how computers work.

I have had a lifetime in hardware engineering but very little real exposure to firmware and software since my Z80 coding days in the 1980s.

The Suite-16 project will encourage me to explore the interactions of hardware, firmware and software - hopefully widening my horizons, learning new skills and achieving the goal of building a working TTL computer system.

Suite-16 refers literally to the suite of deliverables needed to put together a working 16-bit computer system.  

The main deliverable will be a 16-bit computer system designed and built from 7400 series TTL integrated circuits outlined in the next section  below.

From a hardware point of view this will include circuit schematics, pcb layouts, timing diagrams and test-bench programs. It will also likely contain FPGA prototypes with their supporting designs coded in verilog. Part of the project aim is to become familiar with FPGA design and verilog programming - and what better way than to set the goal of a working computer system.

As any computer is the co-ordinated interaction between hardware and software, there will be several key software deliverables needed to complete the design.  These will include the Instruction Set Architecture (ISA), an assembler and a software simulator of the proposed machine. Later on, when the machine is debugged and can run simple code, there will be applications to be written, such as high level languages.

You can find my Github repository here: Suite-16  This is updated regularly to include all the latest Arduino code and sketches.

In summary, Suite-16 is an exploration of the knowledge space bounded by hardware, software and FPGA design.

Main Features

Principal Technology:  5V TTL logic with conventional through-hole DIL packages wherever possible

Data bus:  16-bits

Address bus: 16-bits - with possible extension to 24-bits

Architecture:   Von Neumann  Accumulator-Register

Hardwired Logic - no microcode

Basic 2 stage pipeline allowing most instructions to complete in 1 clock cycle

Registers:   16 general purpose registers R0 to R15.  R0 is the accumulator, most ALU operations are on Accumulator and a Register

Additional Registers:   Program Counter PC, Instruction Register IR

Number of instructions:  31

ALU Instructions: ADD SUB INC DEC COM AND OR XOR

Memory and Register Instructions: SET LD ST LD@ ST@  POP PUSH

Program Flow:  CALL RET JMP BRA  and conditional branching BGT BLT BNE BEQ BGE BLE

Other Instructions:  IN OUT SRA SLA  CLA STC CLC

Addressing Modes:  Immediate #N, Register Direct Rn, Register Indirect @Rn, Indirect Autoincrement @Rn+,  Indexed Mode (Rn+X), Symbolic Mode (PC +X)

Instruction Length:  16-bits

Instruction Format:  8-bit bytecode in IR 15:8 augmented by 8-bit payload in IR 7:0

Hardware:  ALU based on 4-bit bitslice design.  ALU + PC plus register file on one pcb. Memory and mass-storage on 2nd pcb.

Performance:   Aiming for 12.5MHz clock frequency to have some interoperability with VGA video systems.

Target Use:  Co-processor / Accelerator for accompanying TTL colour computer  (Gigatron)
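
As a rough illustration of the instruction format listed above, here is a minimal C sketch of how a 16-bit instruction word might be split into fields. It assumes, as the hand-assembled listings in the project logs suggest, that the upper byte divides into a 4-bit operation and a 4-bit register number. The struct and function names are my own and are not taken from the Suite-16 simulator.

```c
#include <stdint.h>

/* Fields of one 16-bit Suite-16 instruction word (names are illustrative). */
typedef struct {
    uint8_t opcode;   /* IR 15:12 - register operation, 0 = non-register op   */
    uint8_t n;        /* IR 11:8  - register number, or sub-opcode if opcode==0 */
    uint8_t payload;  /* IR 7:0   - 8-bit payload (branch target, immediate)   */
} Decoded;

/* Split an instruction word into its three fields. */
Decoded decode(uint16_t ir) {
    Decoded d;
    d.opcode  = (uint8_t)(ir >> 12);
    d.n       = (uint8_t)((ir >> 8) & 0x0F);
    d.payload = (uint8_t)(ir & 0xFF);
    return d;
}
```

For example, decode(0x0A30) yields opcode 0, n 0xA and payload 0x30, which reads as a non-register op with an 8-bit payload of 0x30.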

Tools and Resources.

Throughout this project I will need to learn new techniques and use new tools.

Hardware simulation will be done using H. Neemann's "Digital"  simulator - which is an updated version of Logisim  https://github.com/hneemann/Digital

Most of my C code will be run on Arduino compatible platforms - so that it can easily be explored.

For hardware schematics and pcb layout, I generally use EagleCAD 6.x - but I am slowly moving towards open source tools including KiCAD and the online pcb design package from easyEDA https://easyeda.com/

Sometimes we have to resort to FPGAs and a suitable hardware description...


  • PRINTHEX

    monsonite, 7 days ago, 0 comments

    PRINTHEX is probably the last of the utility routines that I needed to write in order to get a simple hex loader to run.

    It accepts a 16-bit integer value from the accumulator R0 and prints it out as a 4-digit hexadecimal number. Leading zeros are not suppressed.

    It is based on the decimal number print routine PRINTNUM, but with the added complication that the hex character sequence is not contiguous in the ascii table.
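
To make the idea concrete, here is a minimal C sketch of the same approach: each of the top three digits is found by repeated subtraction of 4096, 256 and 16, and each digit gets 0x30 added if it is 0-9 or 0x37 if it is 10-15. This is not the simulator code. The function names are hypothetical, and it compares before subtracting rather than subtracting until the result goes negative and then adding back, as the assembly routine does.

```c
#include <stdint.h>

/* Map a value 0..15 to its ASCII hex character.
   '0'-'9' and 'A'-'F' are not contiguous, hence the two offsets. */
static char hex_digit(unsigned d) {
    return (char)(d < 10 ? 0x30 + d : 0x37 + d);
}

/* Write v as 4 hex digits (leading zeros kept) plus a terminating NUL. */
void format_hex16(uint16_t v, char out[5]) {
    static const uint16_t weights[3] = {4096, 256, 16};
    for (int i = 0; i < 3; i++) {
        unsigned digit = 0;
        while (v >= weights[i]) {          /* "heximate": count subtractions */
            v = (uint16_t)(v - weights[i]);
            digit++;
        }
        out[i] = hex_digit(digit);
    }
    out[3] = hex_digit(v);                 /* the remainder is the last digit */
    out[4] = '\0';
}
```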

    This is likely to be the last of the hand-assembled routines, because from now on I intend to use the TASM32 assembler - kindly customised for the Suite-16 instruction set by Frank Eggink.

    Having a working hex loader with hex dump and simple monitor commands will be the next goal!

    Here's just the PRINTHEX assembly code - it fits nicely into 48 words of memory:

    // 0x0070 ---------------------------PRINTHEX----------------------------------------- 
            
        //  Prints out contents of R0 as a 4 digit hexadecimal number to the terminal 
        //  Leading zeroes are not suppressed
        //  R1 = Heximation Value
        //  R2 = digit 
        //  R3 = 0x30
        //  R4 = temporary storage for accumulator (Heximated value)
        //  R6 = temporary store for output character
            
            0x1200,     // SET R2, 0x0000
            0x0000,
            0x1300,     // SET R3, 0x0030
            0x0030,
                       
            0x1100,     // R1 = 4096
            0x1000,
            0x088C,     // CALL Heximate
            
            0x1100,     // R1 = 256
            0x0100,
            0x088C,     // CALL Heximate
            
            0x1100,     // R1 = 16
            0x0010,
            0x088C,     // CALL Heximate
    
            0x0A30,     // ADI 0x30 to make a number
            0x3600,     // Store in R6
            0x0B3A,     // SBI  0x3A  - is it bigger than ascii 9
    
         // 0x0080 ---------------------------------------------------------
            
            0x0284,     // BLT 0x84 -  Print decimal digit
            0x0A41,     // ADI 0x41 - make it a hex digit 
            0x0C00,     // putchar R0
            0x0086,     // BRA CRLF
    
            0x2600,     // LD from R6
            0x0C00,     // putchar R0
            0x1000,     // SET R0, CR
            0x000D,
            
            0x0C00,     // putchar R0, CR
            0x0B03,     // SBI 0x03 Set R0, LF   
            0x0C00,     // putchar R0, LF
            0x0003,     // BRA START     
            
            
         // 0x008C ------------------------Heximate--------------------------------
          
            0xB100,     // SUB R1,     :Heximate 
            0x0290,     // BLT 0x90    
            0xE200,     // INC R2
            0x008C,     // BRA 0x08C
    
         // 0x0090 ---------------------------------------------------------    
    
            0x3400,     // Store R0 in R4   temporary store the remainder
            0x2200,     // MOV R2, R0  get the count from R2
            0x0A30,     // ADI 0x30 to make a number
            0x3600,     // ST R0, R6  - temporary save to R6
            
            0x0B3A,     // SBI  0x3A  - is it bigger than ascii 9
            0x0299,     // BLT 0x99 Print decimal digit
            0x0A41,     // ADI 0x41 - make it a hex digit 
            0x0C00,     // putchar R0
    
            0x009B,     // BRA 0x9B   Restore R0
            0x2600,     // Get R0 back from R6
            0x0C00,     // putchar R0 Print it as a decimal digit
            0x2400,     // Get R0 back from R4
    
            0xA100,     // ADD R1 adds DEC value to restore R0       
            0x1200,     // SET R2,0    Reset R2
            0x0000,
            0x0900,     // RET
                 
           
           

  • Finalising the Instruction Set

    monsonite, 11/04/2019 at 13:12, 1 comment

    Recently, I have been exploring the Suite-16 instruction set, by the practical method of writing assembly language to run on the Suite-16 simulator.

    Starting with a very simple routine to output "Hello World!", I have created routines for decimal and hexadecimal entry, decimal output and a very simple command interpreter.

    In writing these routines, it became clear that there were certain deficiencies in the instruction set and over the last 2 weeks I have slowly added useful instructions to make the cpu more versatile.

    This process is now approaching its logical conclusion - partly because I have run out of spare instruction slots, and partly because I don't want to create an instruction set so complicated that I don't stand a chance of implementing it in hardware. 

    Also I feel that after 2 weeks of spare time writing assembly routines it's time to move the project along to its next phase and begin the hardware implementation.

    There are three main areas in which I believe the instruction set can be augmented.  

    The first is making more use of 16-bit immediate operations on the accumulator R0. Currently the ADI and SBI operations are 8-bit immediate operations, where the operand is held in the lower 8 bits of the instruction register.  Extending this to 16 bits will mean that the operand is held in the next location in memory.  This could be done by making the program counter another general purpose register - and this, I believe, is how the MSP430 implements immediate operations.

    With this mechanism in place, ADD, SUB, AND, OR and XOR would benefit from having this 16-bit immediate mode.

    My dealings with the decimal and hex routines have also highlighted the need for an efficient left shift on the accumulator.  

    Ideally I can implement as a bare minimum an ADD R0, R0, which will at least allow a doubling of the accumulator without involving any other register.  The "Times 10" and "Times 16" routines used in decimal and hexadecimal entry would benefit from this instruction saving a few instruction cycles.
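
As a sketch of why doubling is enough, here is how "Times 16" and "Times 10" could be built from nothing but additions of a register to itself, written in C for readability. The function names and the use of a single temporary are my own assumptions. Results wrap modulo 65536, just as a 16-bit accumulator would.

```c
#include <stdint.h>

/* Multiply by 16 using four doublings (four ADD R0, R0 instructions). */
uint16_t times16(uint16_t ac) {
    ac = (uint16_t)(ac + ac);   /*  2x */
    ac = (uint16_t)(ac + ac);   /*  4x */
    ac = (uint16_t)(ac + ac);   /*  8x */
    ac = (uint16_t)(ac + ac);   /* 16x */
    return ac;
}

/* Multiply by 10 as 8x + 2x, saving the 2x value in a temporary register. */
uint16_t times10(uint16_t ac) {
    uint16_t tmp;
    ac  = (uint16_t)(ac + ac);  /* 2x */
    tmp = ac;                   /* keep 2x (a ST to a spare register) */
    ac  = (uint16_t)(ac + ac);  /* 4x */
    ac  = (uint16_t)(ac + ac);  /* 8x */
    return (uint16_t)(ac + tmp);/* 8x + 2x = 10x */
}
```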

    Secondly,  I'm proposing that the 8-bit payload in the lower 8 bits of the instruction register can be used as an address to allow the IN and OUT operations to address up to 256 I/O devices.  This is a placeholder for memory mapped I/O peripherals such as UARTs, timers and GPIO which can be added later.

    Thirdly, the final instruction slot 0x0Fxx which is currently used as NOP. I intend to extend this to allow for microinstructions - inspired by the OPR, "OPeRate" instructions used on the PDP-8. 

    Plagiarising the PDP-8

    The OPR instructions allow operations such as clearing and complementing the accumulator, setting and clearing the carry bit and shift and SWAP operations to be implemented.

    The PDP-8 OPR instructions were implemented with the following individual bit-lines that operated directly on the hardware:

    This scheme gives access to 8 individual control lines which could be sequenced to become active in a specific timeslot which allowed quite complex operations to be performed on the accumulator.
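
A bit-per-operation scheme of this kind might be sketched in C as follows. The bit assignments, the ordering of the timeslots and the carry behaviour of the shifts are all hypothetical. They are chosen only to illustrate how several operations can be combined in one instruction, and they are not the PDP-8 encoding or a finalised Suite-16 one.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical payload bits for an OPR-style instruction at 0x0Fxx. */
enum {
    OPR_CLA = 0x01,  /* clear accumulator       */
    OPR_CLC = 0x02,  /* clear carry             */
    OPR_COM = 0x04,  /* complement accumulator  */
    OPR_STC = 0x08,  /* set carry               */
    OPR_SLA = 0x10,  /* shift left arithmetic   */
    OPR_SRA = 0x20   /* shift right arithmetic  */
};

typedef struct {
    uint16_t ac;     /* accumulator R0 */
    bool     carry;
} AcState;

/* Apply every operation whose bit is set, in a fixed timeslot order:
   clears first, then complement/set, then shifts. */
void opr(AcState *s, uint8_t payload) {
    if (payload & OPR_CLA) s->ac = 0;
    if (payload & OPR_CLC) s->carry = false;
    if (payload & OPR_COM) s->ac = (uint16_t)~s->ac;
    if (payload & OPR_STC) s->carry = true;
    if (payload & OPR_SLA) {
        s->carry = (s->ac & 0x8000u) != 0;            /* bit 15 falls into carry */
        s->ac = (uint16_t)(s->ac << 1);
    }
    if (payload & OPR_SRA) {
        s->carry = (s->ac & 1u) != 0;                 /* bit 0 falls into carry */
        s->ac = (uint16_t)((s->ac >> 1) | (s->ac & 0x8000u)); /* keep sign bit */
    }
}
```

With this ordering, setting both OPR_CLA and OPR_COM in one instruction loads the accumulator with 0xFFFF, which is the sort of small constant such a scheme gives almost for free.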

    An alternative scheme is possible, where the lower 8-bit payload are fully decoded to allow up to 256 microinstructions.  For maximum flexibility this could be done by using the byte to address a micro instruction ROM such as an additional  AT27C1024.  A 4-bit counter can be used on higher address lines to provide a primitive 16 step microsequencer.  That leaves 4 address lines which could be used as inputs to implement a simple external interrupt system.

    This would be very flexible but requiring more hardware, and probably quite limited by the access time of the AT27C1024 ROM.

    The next plog (project log) will start to look at the hardware architecture and how we might implement a fast microinstruction sequencer using a counter, a diode matrix and some 3-8 line decoders.

  • Hexadecimal Entry

    monsonite, 11/02/2019 at 20:29, 0 comments

    It's been a bit of a slow week, and I must admit that I lost focus in the middle of the week with my hexadecimal number entry routine.

    In my opinion, hexadecimal entry is more complex than decimal entry, because the characters 0-9, and A-F are discontinuous in the ASCII table.

    Characters 0-9 need to have 0x30 subtracted, whilst characters A-F need to have 0x37 subtracted. Anything else is not a valid hex digit and can be ignored until a newline character is seen.

    With each incoming character you have to check if it is a legitimate hexadecimal digit, and modify it, either by subtracting 0x30 or 0x37, to get its true numerical value.

    This test and modify is best done using a short subroutine - at the end of the listing

    Once you have the numerical value allocated to the character the rest of the routine is similar to the decimal entry routine, except that you are multiplying by 16 rather than 10.

    There's a further twist in the tail when you detect the end of the valid digits and have to add in the last digit - modifying it accordingly.
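
The logic described above can be sketched in C as follows. This is only an illustration of the test-and-modify idea and the multiply-by-16 accumulation, not a translation of the Suite-16 listing. The function names are hypothetical, and the sketch stops at the first character that is not an upper-case hex digit.

```c
#include <stdint.h>

/* Test and modify: return 0..15 for '0'-'9' or 'A'-'F', or -1 otherwise. */
int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - 0x30;   /* digits:  subtract 0x30 */
    if (c >= 'A' && c <= 'F') return c - 0x37;   /* letters: subtract 0x37 */
    return -1;                                   /* not a hex digit */
}

/* Accumulate hex digits from the text buffer until a non-digit is seen.
   Values above 0xFFFF wrap, as they would in a 16-bit register. */
uint16_t parse_hex16(const char *s) {
    uint16_t total = 0;
    for (; *s != '\0'; s++) {
        int d = hex_value(*s);
        if (d < 0) break;                        /* space, newline, etc. */
        total = (uint16_t)(total * 16 + d);      /* shift in the next digit */
    }
    return total;
}
```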

    This first draft allows hexadecimal numbers up to 0xFFFF to be entered and prints them back in decimal format. 

    You can find the latest listing for the Arduino in my Github Suite-16 repository

    EDIT:  I found some redundant code in the main GETHEX routine and have managed to shorten it from 52 to 34 words.  

    Further optimisation became possible with the test and modify subroutine approach.  

    As the instruction set currently lacks a shift left instruction,  and it's not yet proven that ADD R0, R0 will be implemented in hardware, the routine to multiply the accumulator by 16 is a little cumbersome using eight instructions rather than a possible four.

    EDIT: After proving the ADD R0, R0 instruction and fixing a minor bug the routine is now down to 31 words in length.

    // 0x003C -----------------------------GETHEX----------------------------
    
            // Accepts a hexadecimal number up to FFFF from terminal input buffer
            // converts it to an integer and puts it into register R0
            // It can then be printed out as a decimal using PRINTNUM - for checking integrity
            // R1 is the pointer in the text buffer - which starts at 0x0200
            // R4 is used as a temporary store for the character in the accumulator R0
            // R5 is used in the "Times 16" routine
            // R7 is used to accumulate the powers of 16 when forming the integer in R0
            
            
            0x1100,     // SET R1, 0x0200    text buffer start
            0x0200,
            0x1700,     // Don't forget to clear R7 
            0x0000, 
    
         // 0x0040--------------------------------------------------------------------------------------  
            
            0x4100,     // LD AC, @R1  get first character from buffer  :Getchar
            0x3400,     // Store R0 in R4
            0xE100,     // INC R1
            0x4100,     // LD AC, @R1  get next character - and test to see if it's a number or hex digit or space newline etc
         
            0x0B30,     // Subtract 0x30 Is it bigger than 0x30?
            0x0250,     // BLT 0x50 Quit  No - so must be a space or newline etc
            0x0B17,     // SBI 0x17  is it bigger than 0x47 ascii for "F" ?
            0x0350,     // BGT 0x50 Quit Not a hexadecimal digit
             
            0x0853,     // CALL 0x0053 Restore, Test and Modify R0
            0xA700,     // Add in the accumulating total from R7 - ready to multiply
            0xA000,     // ADD R0, R0  Double R0  2X
            0xA000,     // ADD R0, R0  Double R0  4X
            
            0xA000,     // ADD R0, R0  Double R0  8X
            0xA000,     // ADD R0, R0  Double R0  16X
            0x3700,     // Store R0 in R7   R7 is the accumulating total of all the digits multiplied by powers of 16
            0x0040,     // BRA 0x0040       Get the next digit
    
         // 0x0050-------------------------------------------------------------------------------------- 
             
            0x0853,     // CALL 0x0053      Restore, Test and modify R0
            0xA700,     // Add the accumulated sum from R7 - integer decimal number is now in R0   
            0x0010,     // BRA 0x0010  Print it in decimal     
           
         // 0x0053---------------------------------TEST R0 & MODIFY--------------------------------------
    
            // If R0 = 0-9 subtract 0x30 to form a number 0-9
            // If R0 = A-F subtract 0x37 to form a number 10-15
    
     0x2400, //...

  • An Assembler for Suite-16

    monsonite, 10/31/2019 at 13:01, 0 comments

    I was delighted to receive a Twitter notification from Frank Eggink, one of this project's followers, with news that he had created a table of Suite-16 instructions so that it can be used by TASM - a table driven assembler popular for small micros about 20 years ago.

    His customised table and a link to the download site for TASM32 can be found at his Github repository: Here  

    Many thanks Frank - your contribution is much appreciated!

    I have re-jigged the instruction set slightly in the last few days - and the most recent can be found in this simulator file in my Github

    With the changes to the instruction set, I now have no more empty slots, so the NOP at 0x0F00 seems a bit extravagant. 

    With an 8-bit immediate add to the accumulator, the NOP can be created from ADI 0.

    This frees up the 0x0Fxx slot for my proposed (PDP-8 like) OPR instructions including shifts, clears, complements and small constants.

    The main changes are documented in the text header:

    // A simple simulator for Suite-16 processor
    
    // Add and Subtract Immediate instructions ADI and SBI added at 0x0Axx and 0x0Bxx
    // IN moved to 0x0D00
    // JP@ - Branch to the address held in the accumulator  added at 0x0E00
    
    /* Suite-16 Instructions
    
    Register OPS-
         0n        ---       --     Non-Register Ops
         1n        SET       Rn     Constant  (Set)         Rn = @(PC+1)
         2n        LD        Rn     (Load)                  AC = Rn
         3n        ST        Rn     (Store)                 Rn = AC
         4n        LD        @Rn    (Load Indirect)         AC = @Rn
         5n        ST        @Rn    (Store Indirect)        @Rn = AC
         6n        POP       @Rn    Pop  AC                 AC = @Rn  Rn = Rn - 1
         7n        PUSH      @Rn    Push AC                 @Rn = AC  Rn = Rn + 1 
         8n        AND       Rn     (AND)                   AC = AC & Rn 
         9n        OR        Rn     (OR)                    AC = AC | Rn 
         An        ADD       Rn     (Add)                   AC = AC + Rn
         Bn        SUB       Rn     (Sub)                   AC = AC - Rn
         Cn        INV       Rn     (Invert)                Rn = ~Rn
         Dn        DCR       Rn     (Decrement)             Rn = Rn - 1
         En        INR       Rn     (Increment)             Rn = Rn + 1
         Fn        XOR       Rn     (XOR)                   AC = AC ^ Rn
       
      
        
    Non-register OPS-
         00        BRA    Always                        Target = IR7:0
         01        BGT    AC>0                          Target = IR7:0
         02        BLT    AC<0                          Target = IR7:0
         03        BGE    AC>=0                         Target = IR7:0
         04        BLE    AC<=0                         Target = IR7:0 
         05        BNE    AC!=0                         Target = IR7:0
         06        BEQ    AC=0                          Target = IR7:0     
         07        JMP    16-bit                        Target = @(PC+1)
         08        CALL   16-bit                        Target = @(PC+1)
         09        RET    Return
         0A        ADI    Add 8-bit Immediate           Immediate = IR7:0
         0B        SBI    Subtract 8-bit Immediate      Immediate = IR7:0
         0C        OUT                                  putchar(AC)
         0D        IN                                   AC = getchar()
         0E        JP@                                  BRA (R0)
         0F        NOP                                  AC &= AC
       
       */
    
                            
                        

  • Immediate Instructions

    monsonite, 10/29/2019 at 13:14, 0 comments

    One of the deficiencies with the Suite-16 instruction set was a lack of an immediate addressing mode, where one of the operands is contained in the next word in memory.

    This specifically was becoming a problem when you wanted to check if the contents of the accumulator lay between two bounds - and branch accordingly.  This type of test is frequently found in ascii to hex or decimal conversion routines and string handling, and after coding a few routines it became obvious that the current situation involving other registers was clumsy and inadequate.

    As a compromise I have added two instructions ADI and SBI which allow an 8-bit value to be coded into the payload area and have it added to or subtracted from the accumulator.

    I have coded these two instructions in the spare 0x0Axx  and 0x0Bxx instruction slots to try them out  and see if they make coding easier and less convoluted. If they are useful they will get added to the final instruction set that will be implemented in hardware.

    Here's an example from the Number entry routine where the input character needs to be tested to find out if it falls between ASCII 0x30 and 0x39 and is therefore a decimal digit.  Registers  R2 and R3 are first preloaded with the constants 0x0A and 0x30 so that they are available for the tests.  These preload instructions will not be needed, saving 4 words of memory,  and the SUB R3 and SUB R2 instructions become  SBI 0x30 and SBI 0x0A respectively.

    Whilst this might seem a trivial change in this example, it will be very useful when testing the input buffer for certain known strings - essential for dealing with high-level languages with keywords.

            0x1300,     // SET R3 0x30   Preload R3 with decimal 48
            0x0030,
            0x1200,     // SET R2, 0x0A  Preload R2 with decimal 10
            0x000A,
            
            0x1100,     // SET R1, 0x0200    text buffer start
            0x0200,
            0x4100,     // LD AC, @R1  get first character from buffer
            0x3400,     // Store R0 in R4
            
            0xE100,     // INC R1
            0x4100,     // LD AC, @R1  get next character - and test to see if it is a number
            0xB300,     // Subtract R3 Is it bigger than 0x30?
            0x025A,     // BLT Not a Number
            
            0xB200,     // Subtract R2  0x0A
            0x035A,     // BGE Not a Number
            0x2400,     // Get original character  R0 back from R4
            0xB300,     // Subtract R3 to form a digit
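
For comparison, here is a C sketch of the same bounds test written the way it would look with SBI: two immediate subtractions, each followed by a signed test of the accumulator. The function name and the way the digit is returned are my own choices for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

/* Returns true and stores 0..9 in *digit if ch is ASCII '0'..'9'. */
bool is_decimal_digit(uint16_t ch, uint16_t *digit) {
    int16_t ac = (int16_t)ch;          /* accumulator holds the character  */
    ac = (int16_t)(ac - 0x30);         /* SBI 0x30                         */
    if (ac < 0) return false;          /* BLT: below '0', not a number     */
    ac = (int16_t)(ac - 0x0A);         /* SBI 0x0A                         */
    if (ac >= 0) return false;         /* BGE: above '9', not a number     */
    *digit = (uint16_t)(ch - 0x30);    /* reload character, SBI 0x30       */
    return true;
}
```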

  • Benchmarking Suite-16

    monsonite, 10/28/2019 at 13:46, 0 comments

    Over the last week I have been running Suite-16 assembly language simulated  in about 60 lines of  C++ code.  I have evolved the simulator over that time, and added some new instructions where it became necessary to use them.

    The simulator has been written using the Arduino IDE - so that anyone with an Arduino compatible board can explore the code and learn how a very simple cpu simulator works.

    Originally I had been simulating the Suite-16 cpu on an MSP430 Launchpad board with FRAM.

    I noticed that despite it being a 16-bit processor, the performance was not so good, so I have swapped over to a Nucleo STM32H743 board which has a 400MHz ARM processor.

    I'm still using the Arduino IDE to develop code - because it has a useful timing function micros() which returns the number of microseconds since the program was started. With this I can get fairly accurate timing information from my simulator.

    I have used one of the spare instruction opcodes to allow the instruction count and the elapsed time to be output to the terminal

    By way of a timing benchmark, I have set up a simple loop that loads R0 with 32767 and repeatedly decrements it until it reaches zero. I then print out instruction count and elapsed number of microseconds.
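
Here is a cut-down C sketch of such a benchmark. It is not the actual simulator: it implements only a handful of instructions using the opcode values from the instruction table, the struct and function names are hypothetical, and because the instruction set has no halt, the loop simply stops when the PC reaches a chosen end address. On an Arduino the call could be bracketed with micros() to obtain the elapsed time.

```c
#include <stdint.h>

#define MEM_WORDS 256u

typedef struct {
    uint16_t R[16];          /* R[0] is the accumulator */
    uint16_t PC;
    uint16_t M[MEM_WORDS];
} Suite16;

/* Run until PC reaches end_addr; returns the number of instructions executed. */
unsigned long run(Suite16 *cpu, uint16_t end_addr) {
    unsigned long count = 0;
    while (cpu->PC != end_addr) {
        uint16_t ir = cpu->M[cpu->PC % MEM_WORDS];
        cpu->PC++;
        unsigned op = ir >> 12, n = (ir >> 8) & 0x0F, addr = ir & 0xFF;
        switch (op) {
        case 0x0:                                  /* non-register ops */
            switch (n) {
            case 0x0: cpu->PC = (uint16_t)addr; break;                       /* BRA */
            case 0x5: if (cpu->R[0] != 0) cpu->PC = (uint16_t)addr; break;   /* BNE */
            case 0x6: if (cpu->R[0] == 0) cpu->PC = (uint16_t)addr; break;   /* BEQ */
            default: break;                        /* others omitted here */
            }
            break;
        case 0x1: cpu->R[n] = cpu->M[cpu->PC % MEM_WORDS]; cpu->PC++; break; /* SET */
        case 0x2: cpu->R[0] = cpu->R[n]; break;                              /* LD  */
        case 0x3: cpu->R[n] = cpu->R[0]; break;                              /* ST  */
        case 0xA: cpu->R[0] = (uint16_t)(cpu->R[0] + cpu->R[n]); break;      /* ADD */
        case 0xB: cpu->R[0] = (uint16_t)(cpu->R[0] - cpu->R[n]); break;      /* SUB */
        case 0xD: cpu->R[n] = (uint16_t)(cpu->R[n] - 1); break;              /* DCR */
        case 0xE: cpu->R[n] = (uint16_t)(cpu->R[n] + 1); break;              /* INR */
        default: break;                            /* others omitted here */
        }
        count++;
    }
    return count;
}

/* Load R0 with 32767 and decrement it to zero; returns the instruction count. */
unsigned long countdown_benchmark(void) {
    Suite16 cpu = {0};
    const uint16_t prog[] = {
        0x1000, 0x7FFF,      /* 0x00: SET R0, 32767 */
        0xD000,              /* 0x02: DCR R0        */
        0x0502               /* 0x03: BNE 0x02      */
    };
    for (unsigned i = 0; i < sizeof prog / sizeof prog[0]; i++) cpu.M[i] = prog[i];
    return run(&cpu, 0x0004);
}
```

The loop body is two instructions executed 32767 times, plus the initial SET, so this version executes 65535 simulated instructions in total.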

    Based on the "count down from 32767" loop, my Suite16 simulator is running about 8 million simulated instructions per second.

    That's about 66% of what I'm hoping the TTL cpu to run at.

    Based on the 400MHz clock on the Nucleo board, I can estimate that the simulator in C is taking about 50 ARM instructions to execute a Suite-16 simulated one.

    I tried exactly the same code on the MSP430 which is a nominal 16MHz. Unfortunately the FRAM only works at 8MHz with wait states, so that slows it down considerably to about 75,000 simulated instructions per second.

    So I tried a 16MHz Arduino with an 8-bit AVR ATmega328 and the results were much improved to nearly 139,000 instructions per second.

    The humble AVR is approximately 59 times slower than the ARM, but with a 7uS simulated instruction cycle it is still in the same league as some of the classic minicomputers from the 1960s.

  • Stack Operations

    monsonite, 10/26/2019 at 12:21, 0 comments

    Charles Moore's Forth is based on a 16-bit virtual machine that passes parameters between functions using the Parameter Stack (also known as Data Stack).

    A stack is just a Last In, First Out (LIFO) structure contained in consecutive memory locations. The stack is often placed in the top of memory and grows downwards. So the top of the stack (TOS) is the lowest in memory of all stack items.

    A register or zeropage variable acts as the stack pointer and is pre-decremented when an item is pushed onto the stack, and post-incremented when an item is popped off the stack. The stack pointer always points to the Top of Stack.

    Suite-16 has PUSH and POP operations that may be used with any of the general purpose registers - so multiple independent stacks can be created. The only overhead is the assignment of a register solely for use as a stack pointer for each stack, and a suitable section in memory. The stack pointer register should be initialised to its upper boundary value - for example 0x2000 in RAM.

    The contents of the accumulator R0 are pushed to the memory location addressed by Rn, after Rn has been decremented. Similarly the top member of the stack is popped into the Accumulator and then Rn is incremented.

    The other use of stacks is to hold the return address of subroutines. When a subroutine is called the PC is pushed onto the top of the return stack, and popped back to the PC when a return instruction is executed. With this stacking arrangement it allows for the automatic nesting of subroutines.

    Here's how the PUSH and POP instructions are coded on the simulator:

    case 0x6:   R[0] = M[R[n]]  ; R[n]= R[n]+1   ;  break ; /* POP with post-increment of pointer Rn */
    case 0x7:   R[n]= R[n]-1    ; M[R[n]] = R[0] ;  break ; /* PUSH with pre-decrement of pointer Rn */  

    M[R[n]] is word in RAM pointed to by the stack pointer R[n]    R[0] is our accumulator.
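
To show the two operations outside the simulator's switch statement, here is a small self-contained C sketch with the same pre-decrement and post-increment behaviour. The memory size and the function names are assumptions made for the example. Note that popping from an empty stack whose pointer is at 0x2000 would read past the end of this small array.

```c
#include <stdint.h>

#define MEM_WORDS 0x2000u           /* example RAM size: stacks grow down from 0x2000 */

static uint16_t M[MEM_WORDS];       /* RAM */
static uint16_t R[16];              /* R[0] is the accumulator */

/* PUSH @Rn: pre-decrement the pointer, then store the accumulator. */
void push(unsigned n) {
    R[n] = (uint16_t)(R[n] - 1);
    M[R[n]] = R[0];
}

/* POP @Rn: load the accumulator, then post-increment the pointer. */
void pop(unsigned n) {
    R[0] = M[R[n]];
    R[n] = (uint16_t)(R[n] + 1);
}
```

Because the pointer register is an operand, two stacks can coexist simply by giving each its own register and its own region of memory, for example R14 starting at 0x2000 and R13 starting at 0x1F00.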

    The other instructions that use a stack are the CALL and RETurn:  Here R[15] is dedicated as the Return Stack Pointer RSP.

    case 0x8:  R[15]= R[15]-1 ; M[R[15]] = PC ; PC = addr ; break ; // CALL (zero page) use R15 as RSP 
    case 0x9:  PC = M[R[15]]  ; R[15]= R[15]+1            ; break ; // RET

     A later log will look at the stack manipulation words such as DUP, DROP, SWAP, OVER that are frequently used in Forth.

  • Hop, Skip and Jump

    monsonite, 10/24/2019 at 17:29, 3 comments

    This week I am working towards getting my pet project, SIMPL working on the Suite-16 simulator. 

    I am making good progress with the main routines that handle decimal number entry and decimal number printing. These have been relatively easy to code, and the codesize compares with the equivalent code written in MSP430 assembly language.

    The next thing I need to code up are the three routines that will provide the mechanics of the interpreter.

    Assembling code by hand is not too difficult, but it helps if you keep a modular approach - and each module has only one entry point and one exit point. It takes more time to plan each module, and then test it - than it does to hand assemble. So for the moment I am not overly concerned that I don't have a full assembler.

    Modular code is the approach taken by Forth. You are encouraged to write short routines that only require a few input variables that are taken off the stack, and in turn will calculate some output result that is placed back on the stack. The stack is the all important communicating pipeline between the functional modules.

    Whilst SIMPL is by no means anything like a full Forth, it does follow closely some of the techniques used in the interpreter, but the dictionary that is fundamental to Forth is replaced with a simple jump table. This makes it possible to have a working SIMPL kernel operating in fewer than 1000 bytes of code.

    The Command Interpreter

    Ward Cunningham who wrote Txtzyme, the precursor to SIMPL, described his interpreter as a switch-case statement contained within a loop.

    I now need to devise an efficient switch statement mechanism for Suite-16, as this is central to the whole functioning of the command interpreter.

    The switch statement is given an input value which it translates by a look-up table mechanism to an output value, and this output value is used as a jump address for the program execution.

    Commands

    Whilst there are 96 printable ascii codes to be used as commands, we are unlikely to have to use all of them in the jump table. First we can discount the numerical characters as these are handled separately by the number entry routine.  Capital letters are reserved for User Functions or variables, so they will also be handled differently. That just leaves 26 lower case characters and 34 other symbols.  The jump table has already reduced in size from 96 to 60 entries.  It may be possible to reserve 60 words of the zeropage to accommodate the jump table, leaving 196 words for essential code, user variables and structures such as the data stack and return stack.

    The jump mechanism needs some clarification. With Suite-16 we can embed an 8-bit jump address into the lower byte of the instruction. This, however, is only useful for accessing addresses on the zeropage, and we will need to find an alternative method to access the code words that are more likely to be located outside of page 0.

    The jump table will contain a list of addresses, which are the start addresses for all of the command routines. For example if our accumulator currently holds the ascii character "p" 0x70 and we want to use this to invoke the printnum routine which for example starts at address 0x0100, we need to create a table in memory which at address 0x70 contains the value 0x0100. We can get this address back into the accumulator and then jump to it.
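
The "switch inside a loop" idea with a jump table can be sketched in C using an array of function pointers indexed by the command character. The three commands and the Vm structure below are invented purely for illustration and are not SIMPL's actual command set. On Suite-16 the table would hold 16-bit addresses rather than C function pointers.

```c
#include <stddef.h>

/* Toy interpreter state: a single working value (illustrative only). */
typedef struct { int x; } Vm;

typedef void (*Handler)(Vm *);

static void cmd_inc(Vm *vm) { vm->x += 1; }   /* '+' : increment */
static void cmd_dec(Vm *vm) { vm->x -= 1; }   /* '-' : decrement */
static void cmd_dbl(Vm *vm) { vm->x *= 2; }   /* 'd' : double    */

/* One slot per 7-bit ASCII code; NULL means "no command assigned". */
static Handler jump_table[128];

void init_table(void) {
    jump_table['+'] = cmd_inc;
    jump_table['-'] = cmd_dec;
    jump_table['d'] = cmd_dbl;
}

/* The interpreter: look each character up in the table and jump to it. */
void interpret(Vm *vm, const char *text) {
    for (; *text != '\0'; text++) {
        unsigned char c = (unsigned char)*text;
        if (c < 128 && jump_table[c] != NULL) {
            jump_table[c](vm);
        }
    }
}
```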

    Trampoline Jumps

    Suite-16 is currently only using an 8-bit jump address which is stored in the payload section of the instruction. If we extend this to a 16-bit jump, the target address will be held in the word following the jump instruction. We can use the accumulator to overwrite this target address, so we can effectively jump to an address that is held in the accumulator. This currently will have to be done in a two stage process, sometimes called a Trampoline Jump.
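
A minimal C model of this two stage process is shown below. It assumes the 16-bit JMP is encoded as 0x0700 followed by the target word, as in the instruction table from the assembler log, and that the jump table lives in the same memory at the address given by the character code. The function name and the choice of trampoline address are hypothetical.

```c
#include <stdint.h>

#define OP_JMP 0x0700u   /* JMP: 16-bit target held in the following word */

/* Look up the handler address for the value in the accumulator, patch it
   into the word after a JMP at address tramp, then take that jump.
   Returns the resulting program counter. */
uint16_t trampoline_jump(uint16_t *M, uint16_t tramp, uint16_t ac) {
    ac = M[ac];                      /* stage 1: table lookup, e.g. M[0x70] = 0x0100 */
    M[tramp]     = OP_JMP;           /* the jump instruction itself                  */
    M[tramp + 1] = ac;               /* overwrite its operand with the accumulator   */

    uint16_t pc = tramp;             /* stage 2: branch to the trampoline            */
    uint16_t ir = M[pc++];           /* fetch                                        */
    if ((ir & 0xFF00u) == OP_JMP) {
        pc = M[pc];                  /* JMP loads the PC from the patched word       */
    }
    return pc;
}
```

The obvious cost of this approach is that the code modifies itself, so the trampoline has to live in RAM rather than ROM.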

    Let's assume that the accumulator holds 0x70 the letter p, and we want to jump to address 0x0100 that is held in the lookup table. We...


  • printnum_2

    monsonite, 10/24/2019 at 15:41

    Printnum is the routine that converts a 16-bit binary number to a 5-digit ascii string.

    I covered it a few days ago - but was not happy with the large blocks of code that were repeated 4 times over.

    Now that I have the CALL and RET subroutine mechanism working, I decided to rewrite the routine using one subroutine labelled "Decimate" that is called 4 times, each time with a different decimation factor (10,000, 1000, 100 and 10) stored in R1.  

    The result is much simpler and shorter code - and easier to understand. It is 39 words long, whereas its predecessor was 84 words.

    Here is the new version printnum_2

             // 0x0010 ---------------------------PRINTNUM_2----------------------------------------- 
    
        //  R1 = Decimation Value
        //  R2 = digit 
        //  R3 = 0x30
        //  R4 = temporary storage for accumulator (decimated value)
            
            0x1200,     // SET R2, 0x0000
            0x0000,
            0x1300,     // SET R3, 0x0030
            0x0030,
            
            0x1100,     // R1 =  10,000
            0x2710,
            0x082C,     // CALL decimate
           
            0x1100,     // R1 = 1000
            0x03E8,
            0x082C,     // CALL decimate
            
            0x1100,     // R1 = 100
            0x0064,
            0x082C,     // CALL decimate
            
            0x1100,     // R1 = 10
            0x000A,
            0x082C,     // CALL decimate
            
                    
         // 0x0020 ---------------------------------------------------------
                          
            0xA300,     // ADD R0, R3 to make a number
            0x0C00,     // putchar R0
            0xB300,     // SUB R3 to restore accumulator
            0x1000,     // SET R0, CR
            
            0x000D,
            0x0C00,     // putchar R0 CR
            0x1000,     // Set R0, LF
            0x000A,
            
            0x0C00,     // putchar R0 LF
            0x0000,     // BRA START     
            0x0F00,     // NOP
            0x0F00,     // NOP
            
         // 0x002C ------------------------Decimate--------------------------------
          
            0xB100,     // SUB R1,     :Decimate 
            0x0230,     // BLT 0x30    
            0xE200,     // INC R2
            0x002C,     // BRA 0x2C
    
         // 0x0030 ---------------------------------------------------------    
    
            0x3400,     // Store R0 in R4   
            0x2200,     // MOV R2, R0,  
            0xA300,     // ADD R0, R3 to make a number
            0x0C00,     // putchar R0
             
            0x2400,     // Get R0 back from R4                 
            0xA100,     // ADD R1 adds DEC value to restore R0       
            0x1200,     // SET R2,0    Reset R2
            0x0000,
    
            0x0900,     // RET
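
    For comparison, here is a C sketch of the same idea: one helper called four times with a different decimation factor, then the units digit. It is not a translation of the listing. It compares before subtracting, so it needs no add-back step, and the names `decimate` and `printnum` are just borrowed for illustration.

```c
#include <stdint.h>

/* Produce one decimal digit of *value for the given power of ten by
   repeated subtraction, leaving the remainder in *value. */
static char decimate(uint16_t *value, uint16_t factor)
{
    uint16_t digit = 0;                 /* plays the role of R2 */
    while (*value >= factor) {          /* compare first, so nothing has to be restored */
        *value = (uint16_t)(*value - factor);
        digit++;
    }
    return (char)(digit + 0x30);        /* 0x30 is the constant kept in R3 */
}

/* Convert a 16-bit number to a 5-digit, zero-padded ascii string.
   out must have room for 6 bytes including the terminator. */
static void printnum(uint16_t value, char out[6])
{
    static const uint16_t factors[4] = {10000, 1000, 100, 10};
    for (int i = 0; i < 4; i++)
        out[i] = decimate(&value, factors[i]);
    out[4] = (char)(value + 0x30);      /* whatever remains is the units digit */
    out[5] = '\0';
}
```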

  • Number

    monsonite, 10/24/2019 at 11:58

    Here's the second major routine needed to make working with decimal numerical input possible.

    Get_Number parses through the text input buffer looking for numerical ascii codes that fall in the range 0x30 to 0x39.  

    These correspond to the digits 0 to 9, so the first thing you must do is subtract 0x30 (decimal 48) from the incoming character to turn it into a decimal digit. It is convenient to preload one of the working registers with 0x30 (and keep it there), and in this application I chose R3 to contain 0x30.

    Once you have your first decimal digit in the accumulator, you need to multiply it by 10. Then you take your second digit, add it to the sum in the accumulator and multiply by 10 again. You continue this process until you have added the last digit to the total, and the accumulator will hold the decimal equivalent of the numerical string in the text buffer.

    However, you have to know where to stop multiplying by 10, so that the last digit in the string is just added. To do this you have to continuously look ahead at the next character in the input buffer and determine if it is a numerical character or not.  If it's not a number, you skip the times 10 multiplication and jump to a final addition.

    Here is the resulting code - hand assembled for Suite-16.

         // 0x0090 --------------------------------NUMBER------------------------------------- 
    
            0x1300,     // SET R3 0x30   Preload R3 with decimal 48
            0x0030,
            0x1200,     // SET R2, 0x0A  Preload R2 with decimal 10
            0x000A,
            
            0x1100,     // SET R1, 0x0200    text buffer start
            0x0200,
            0x4100,     // LD AC, @R1  get first character from buffer
            0x3400,     // Store R0 in R4
            
            0xE100,     // INC R1
            0x4100,     // LD AC, @R1  get next character - and test to see if it is a number
            0xB300,     // Subtract R3  Is it bigger than 0x30?
            0x02AA,     // BLT Not a Number
            
            0xB200,     // Subtract R2  0x0A
            0x03AA,     // BGE Not a Number
            0x2400,     // Get original character  R0 back from R4
            0xB300,     // Subtract R3  to form a digit
    
    
         // 0x00A0 --------------------------------------------------------------------- 
                  
            0xA700,     // get the accumulating total  from R7 - ready to multiply
            0x3500,     // Store R0 in R5  Digit is now in R5
            0xA500,     // ADD R5  Accumulator R0 = (2 * digit)  
            0x3600,     // Store R0 in R6  - this is the "Times 10" routine
           
            0xA600,     // ADD R6   4 X
            0xA600,     // ADD R6   6 X        
            0xA600,     // ADD R6   8 X 
            0xA600,     // ADD R6   10 X 
            
            0x3700,     // Store R0 in R7   R7 is accumulation of all the digits multiplied by powers of 10        
            0x0096,     // BRA 0x0096 Get the next digit
            0x0F00,     // Not a number  address = 0xAA
            0x2400,     // Get last digit  R0 back from R4
                   
            0xB300,     // Subtract R3
            0xA700,     // Add the accumulated sum from R7 - decimal number is now in R0
            0x1700,     // Don't forget to clear R7
            0x0000,
    
         // 0x00B0 --------------------------------------------------------------------- 
           
            0x0802,     // CALL 0x0002  PRINTNUM
            0x0000,      // BRA START
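
    The same parsing idea can be sketched in C. Note the assumption: this version multiplies the running total by 10 before adding each new digit, which avoids the look-ahead, so its loop is shaped differently from the listing. The times-10 helper does mirror the listing's add-only approach (double once, then add the doubled value four more times). The function names are hypothetical.

```c
#include <stdint.h>

/* Multiply by 10 using additions only: 2x, then 4x, 6x, 8x, 10x. */
static uint16_t times10(uint16_t x)
{
    uint16_t twice = (uint16_t)(x + x);     /* 2x, as held in R6 */
    uint16_t acc = twice;
    for (int i = 0; i < 4; i++)
        acc = (uint16_t)(acc + twice);      /* 4x, 6x, 8x, 10x */
    return acc;
}

/* Parse leading ascii digits (0x30 to 0x39) from buf into a 16-bit value.
   Stops at the first non-numeric character; wraps modulo 65536 on overflow. */
static uint16_t get_number(const char *buf)
{
    uint16_t total = 0;                     /* running total, like R7 */
    while (*buf >= '0' && *buf <= '9') {
        uint16_t digit = (uint16_t)(*buf - 0x30);
        total = (uint16_t)(times10(total) + digit);
        buf++;
    }
    return total;
}
```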
    

    Incidentally, a functionally similar routine coded in MSP430 assembly language took 25 16-bit words (although in MSP430 ASM it's only 16 instructions). I'm quite happy with this implementation by comparison - as I use 8 words to initialise registers.

    One thing that would be useful in the instruction set - would be immediate operations on the accumulator with the short-constant held in the Payload-byte.

    There are several occasions when handling ascii characters, that you want to test against some value and modify the contents of the accumulator.  Being able to add or subtract an 8-bit literal from the accumulator without having to use another register would be a big advantage.
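
    As a rough illustration of what such instructions might look like in a simulator, here is a C sketch. The opcode values `OP_ADDI` and `OP_SUBI` are invented for this example and are not part of the current instruction set. The only thing taken from the listings is that the low byte of the instruction word carries the payload.

```c
#include <stdint.h>

#define OP_ADDI 0x0A00u   /* hypothetical: add 8-bit literal to accumulator */
#define OP_SUBI 0x0B00u   /* hypothetical: subtract 8-bit literal from accumulator */

/* Apply an immediate instruction to the accumulator and return the new value.
   Any other instruction word leaves the accumulator unchanged. */
static uint16_t exec_immediate(uint16_t ir, uint16_t acc)
{
    uint16_t literal = (uint16_t)(ir & 0x00FFu);   /* payload byte */
    switch (ir & 0xFF00u) {
    case OP_ADDI: return (uint16_t)(acc + literal);
    case OP_SUBI: return (uint16_t)(acc - literal);
    default:      return acc;
    }
}
```

    With these made-up encodings, the word 0x0B30 would turn the ascii character "7" (0x37) into the digit 7 in a single instruction, without tying up R3.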

    Whilst some might think that hand-assembly must be some form of mental masochism, it's actually not that difficult.

    Actually writing the code takes surprisingly little time.  What does take the time is thinking how to write the algorithm efficiently using the instructions and registers available, and the time spent recompiling and testing the code.

    You will see that I am listing my code in little blocks of 4 instructions, with a commented...


Discussions

Marcel van Kervinck wrote 2 days ago:

I'm just watching this great presentation on an alternative 16-bit processor for the 6502. https://www.youtube.com/watch?v=zdJnz6-d060 AcheronVM. There're some good ideas in there I would certainly play with if I were doing vCPU again.


yeti wrote 10/16/2019 at 04:18:

Address and data of the same bit sizes ... maybe B isn't far from there. ;-)


Ken Yap wrote 10/15/2019 at 18:41:

Cool. I remember reading the description of Sweet-16 interpreter in Byte and actually wrote one for the 6800, not the best of host targets. I'll be interested to see how your project goes.


monsonite wrote 10/15/2019 at 20:32:

Thanks Ken.  There are an infinite number of ways to design a cpu - but I thought Steve Wozniak's Sweet-16 provided a good head-start. I'm going to be using a bitslice design based on a 4-bit slice - because all of the best 16 pin TTL ICs are designed around 4-bits. The slice will contain about 16 to 20 ICs - and 4 slices plus the control circuitry will account for about 100 ICs. The aim is to run it at 12.5MHz, which makes it easier to co-exist with VGA video. I'm not setting out to design another VGA computer, but more of a 16-bit coprocessor that can work alongside Marcel van Kervinck's excellent 8-bit Gigatron TTL computer.  We are all standing on the shoulders of Giants.......
