01/21/2020 at 20:20 •
Back in late October, just as I was starting to hand-code some assembly routines for Suite-16, I considered porting my bytecode interpreter SIMPL across from MSP430 assembly language to that of the Suite-16.
Unfortunately, at that time the instruction set was still very much in a state of flux, and hand assembly was somewhat time-consuming. Revisiting this task now that we have an assembler and a hexloader in our toolchain armoury makes the job so much easier, and I have transcribed the SIMPL framework in a long afternoon sprint of about 6 hours of coding. The code is heavily commented, so it documents the workings of the SIMPL interpreter as I went along.
What is SIMPL? At its most basic level it is an interpreter consisting of a switch statement contained in a loop. After all - that is how most simple processors and virtual machines are simulated. To make the job easier in assembly language - the switch statement is replaced with a jump table, with one 16-bit entry for each of the 96 printable ASCII characters. In essence, when a command character is read from the input buffer, it is used to index into the jump table and pick up a 16-bit address for the code-block associated with that command.
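As a rough illustration of the loop-plus-jump-table idea, here is a minimal sketch in C. It is not the Suite-16 assembly source: the handler names, the tiny 16-entry data stack and the use of the double-quote character for DUP are all assumptions made for the example. The table has 96 slots covering character codes 0x20 to 0x7F, and digits are collected into a number and pushed rather than dispatched.

```c
#include <stdint.h>

#define FIRST_PRINTABLE 0x20   /* space */
#define TABLE_SIZE      96     /* codes 0x20..0x7F, one slot per command */

typedef void (*handler_t)(void);

/* A tiny data stack, purely for demonstration (hypothetical size). */
static int16_t stack[16];
static int sp = 0;

static void push(int16_t v) { if (sp < 16) stack[sp++] = v; }

static void op_add(void) {
    if (sp >= 2) { stack[sp - 2] = (int16_t)(stack[sp - 2] + stack[sp - 1]); sp--; }
}
static void op_dup(void) { if (sp >= 1) push(stack[sp - 1]); }
static void op_nop(void) { }

static handler_t jump_table[TABLE_SIZE];

/* Fill every slot with a do-nothing handler, then install the commands. */
static void init_table(void) {
    for (int i = 0; i < TABLE_SIZE; i++) jump_table[i] = op_nop;
    jump_table['+' - FIRST_PRINTABLE] = op_add;
    jump_table['"' - FIRST_PRINTABLE] = op_dup;   /* assumed symbol for DUP */
}

/* The interpreter: read a character, index the table, run the handler. */
static void interpret(const char *buf) {
    while (*buf != '\0') {
        unsigned char c = (unsigned char)*buf;
        if (c >= '0' && c <= '9') {
            /* Number entry is handled separately from command dispatch. */
            int16_t n = 0;
            while (*buf >= '0' && *buf <= '9') {
                n = (int16_t)(n * 10 + (*buf - '0'));
                buf++;
            }
            push(n);
        } else {
            if (c >= FIRST_PRINTABLE && c <= 0x7F) jump_table[c - FIRST_PRINTABLE]();
            buf++;
        }
    }
}
```

After calling init_table(), interpret("2 3+") leaves 5 on the stack. Adding a new command is just a matter of writing a handler and storing its address in the slot for the chosen character.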
SIMPL is stack based, so numbers are put onto the stack and operated on there. It is a tiny-Forth-like utility language without the complexity of the dictionary and text string matching that is needed in a full-blown Forth.
So the commands include familiar mathematical and logical symbols such as + - * and /, which obviously relate to arithmetical operations.
Then there are the stack operations - familiar to those using Forth, such as DUP, DROP, SWAP and OVER. In addition there are commands associated with decimal and hexadecimal number entry and also output routines to a serial terminal for decimal, hexadecimal and hex-dump formats.
The user has 26 commands available - that can be user defined and customised. These are represented by the uppercase characters A to Z. The language is extensible just by assigning a user command to a snippet of instructions.
SIMPL is very compact. The minimal interpreter framework with all the input and output utility routines is under 256 program words. The Jump table is a further 96 program words. To this you must add the action routines associated with all of the potential 96 commands - but these are often very small - each only a few instructions long.
SIMPL is easy to tailor to your own application, as rudimentary or complex as you wish. If you want commands for floating point math routines then these can be added. Generally I use it for exercising new hardware - or in this case thrashing out any discrepancies in the Suite-16 instruction set.
I have placed the SIMPL framework in my github repository here: - it is as yet untried and probably buggy - but gives an idea what can be done with a couple of hundred words of assembly language.
01/13/2020 at 16:20 •
In mid-November, I traveled out to Northern California, to attend Forth Day at Stanford University - and to meet one of my Computing Heroes, Charles H. Moore - the inventor of Forth. Returning in late November, I got stuck in a bit of a rut, plus a lot of other things were putting pressure on my free time - which meant that the Suite-16 project was put on the back-burner with no further progress. My project-colleague, Frank, is currently touring Australia and Tasmania for the duration of January, and so that gives me a two-week window of opportunity to take stock of the project so far and plan out the goals for 2020.
It was always my intention that Suite-16 would exist as a C-simulator, a computer implemented in standard TTL and as a verilog implementation to run on an FPGA as a soft-cpu. In discussions with Frank, we have decided that now we have a working simulator, the next step is to convert this into a soft-cpu developed and running on opensource FPGA hardware. When this task is done, I will have had a lot more experience writing in verilog, and Frank and I will have stable target hardware platforms with which we can do real system development. The TTL processor will follow on later, as it will be a lot of hard work, but at least the architecture will have been thoroughly explored and documented by that time.
The FPGA family I have chosen to use is the Lattice ICE 40 series. I first encountered these in early 2015 when Clifford Wolf announced his "Project IceStorm" reverse engineered, opensource FPGA toolchain - allowing the Lattice parts to be programmed using a manageable, open source toolchain. One of the first soft-cpus to benefit from this announcement was James Bowman's J1 Forth CPU. Video here: James had previously used Xilinx parts for earlier implementations of his J1, but by May 2015 had proven that it would easily fit onto an ICE 40 HX1K - which is a 1K LUT FPGA used on the $25 Lattice IceStick development board. James and I have collaborated on a couple of projects, having first met at the Open Hardware Summit in New York in September 2011.
In May 2016 a friend, Alan Wood, and I decided to develop an opensource FPGA dev-board based on the Lattice ICE40 HX4K. The result was the myStorm BlackIce, which is now into its fifth generation. Hardware development took about 14 weeks from idea to the first 250 boards arriving from the manufacturers in Shenzhen, China. We debuted at the 2016 OSHCamp (Open Source Hardware Camp) held annually in Hebden Bridge, West Yorkshire, UK.
In November 2016, I met up with James again at Forth Day in Palo Alto, where he had implemented his J1 cpu on one of our BlackIce boards, and was serving his presentation as a series of jpgs from it. The BlackIce hardware had been fully exercised and proven robust and reliable - and makes an ideal platform for soft cpus or SoCs.
The 2nd generation BlackIce II board has a Lattice ICE 40 HX4K FPGA and 256K words of 16-bit, 10 ns SRAM. Programming (configuration) is done over a USB connection, using an STM32L433 ARM Cortex M4 to act as the programming interface. We could have just used an FTDI FT2232H as a USB to SPI converter (like everyone else does) - but frankly at $3.50 the FTDI device is too expensive and the STM32 offers so much more flexibility for less money. Once the FPGA is programmed, the STM32 can be used to provide slave peripherals for the FPGA (such as ADC, DAC, UART, I2C, SPI etc). Later generations of myStorm boards have all adopted this ARM/FPGA symbiosis.
Implementing a soft-cpu in verilog
verilog, like vhdl, is a popular hardware description language or HDL. verilog has its roots in C, whilst vhdl arose out of the US Department of Defense and is structured like Ada. Both are equally used, but verilog has my vote, because I know nothing about vhdl syntax.
When James Bowman transcribed his J1 cpu to verilog, he implemented it as a very neat switch-case (casez) statement in fewer than 110 lines of verilog in his J1 Github repository.
Using James's code as a useful tutorial example, it's fairly easy to see how an instruction set implemented as a switch-case statement in C translates cleanly to verilog. His code is well commented and easily readable; nobody likes obfuscated code (apart from obfuscated code masochists).
This code is just the cpu, and the interface to memory implemented in internal BRAM. In a practical working system we will also require a module of code to define a serial communications UART, an external SRAM interface and if we are feeling adventurous, a VGA colour graphics interface and external frame buffer. For a complete system a PS/2 keyboard interface would also be useful - and this can be added as a PMOD - external hardware module.
11/05/2019 at 22:50 •
PRINTHEX is probably the last of the utility routines that I needed to write in order to get a simple hex loader to run.
It accepts a 16-bit integer value from the accumulator R0 and prints it out as a 4-digit hexadecimal number. Leading zeros are not suppressed.
It is based on the decimal number print routine PRINTNUM, but with the added complication that the hex character sequence is not contiguous in the ascii table.
This is likely to be the last of the hand-assembled routines, because my motivation is from now on to use the TASM32 assembler - kindly customised for the Suite-16 instruction set by Frank Eggink.
Having a working hex loader with hex dump and simple monitor commands will be the next goal!
Here's just the PRINTHEX assembly code - it fits nicely into 48 words of memory:
// 0x0070 ---------------------------PRINTHEX-----------------------------------------
// Prints out contents of R0 as a 4 digit hexadecimal number to the terminal
// Leading zeroes are not suppressed
// R1 = Heximation Value
// R2 = digit
// R3 = 0x30
// R4 = temporary storage for accumulator (Heximated value)
// R6 = temporary store for output character
0x1200, // SET R2, 0x0000
0x0000,
0x1300, // SET R3, 0x0030
0x0030,
0x1100, // R1 = 4096
0x1000,
0x088C, // CALL Heximate
0x1100, // R1 = 256
0x0100,
0x088C, // CALL Heximate
0x1100, // R1 = 16
0x0010,
0x088C, // CALL Heximate
0x0A30, // ADI 0x30 to make a number
0x3600, // Store in R6
0x0B3A, // SBI 0x3A - is it bigger than ascii 9
// 0x0080 ---------------------------------------------------------
0x0284, // BLT 0x84 - Print decimal digit
0x0A41, // ADDI 0x41 - make it a hex digit
0x0C00, // putchar R0
0x0086, // BRA CRLF
0x2600, // LD from R6
0x0C00, // putchar R0
0x1000, // SET R0, CR
0x000D,
0x0C00, // putchar R0, CR
0x0B03, // SBI 0x03 Set R0, LF
0x0C00, // putchar R0, LF
0x0003, // BRA START
// 0x008C ------------------------Heximate--------------------------------
0xB100, // SUB R1, :Heximate
0x0290, // BLT 0x90
0xE200, // INC R2
0x008C, // BRA 0x08C
// 0x0090 ---------------------------------------------------------
0x3400, // Store R0 in R4 temporary store the remainder
0x2200, // MOV R2, R0 get the count from R2
0x0A30, // ADI 0x30 to make a number
0x3600, // ST R0, R6 - temporary save to R6
0x0B3A, // SBI 0x3A - is it bigger than ascii 9
0x0299, // BLT 0x99 Print decimal digit
0x0A41, // ADI 0x41 - make it a hex digit
0x0C00, // putchar R0
0x009B, // BRA 0x9B Restore R0
0x2600, // Get R0 back from R6
0x0C00, // putchar R0 Print it as a decimal digit
0x2400, // Get R0 back from R4
0xA100, // ADD R1 adds DEC value to restore R0
0x1200, // SET R2,0 Reset R2
0x0000,
0x0900, // RET
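For readers who find the hand-assembled words hard to follow, the same idea can be sketched in C. This is only an approximation of the approach, not a translation of the listing: the function names are made up for the example, and the output goes into a buffer rather than to putchar. Each digit is found by counting how many times a power of 16 can be subtracted, and the gap between '9' and 'A' in the ASCII table is bridged by adding 7.

```c
#include <stdint.h>

/* Convert a value 0..15 to its ASCII hex character.
   '0'..'9' are 0x30..0x39 but 'A' is 0x41, so values above 9 need 7 more. */
static char hex_char(unsigned digit) {
    char c = (char)(digit + 0x30);
    if (c > '9') c = (char)(c + 7);
    return c;
}

/* Write value as four hex digits, leading zeros included, into out. */
static void format_hex16(uint16_t value, char out[5]) {
    static const uint16_t powers[3] = { 4096, 256, 16 };
    for (int i = 0; i < 3; i++) {
        unsigned digit = 0;
        /* Repeated subtraction stands in for a divide instruction. */
        while (value >= powers[i]) {
            value = (uint16_t)(value - powers[i]);
            digit++;
        }
        out[i] = hex_char(digit);
    }
    out[3] = hex_char(value);   /* whatever is left is the units digit */
    out[4] = '\0';
}
```

Calling format_hex16(0x1A2F, buf) fills buf with the characters 1, A, 2, F. Repeated subtraction costs at most 15 passes per digit, which is acceptable on a machine with no divide or shift-right instruction.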
11/04/2019 at 13:12 •
Recently, I have been exploring the Suite-16 instruction set, by the practical method of writing assembly language to run on the Suite-16 simulator.
Starting with a very simple routine to output "Hello World!", I have created routines for decimal and hexadecimal entry, decimal output and a very simple command interpreter.
In writing these routines, it became clear that there were certain deficiencies in the instruction set and over the last 2 weeks I have slowly added useful instructions to make the cpu more versatile.
This process is now approaching its logical conclusion - partly because I have run out of spare instruction slots, and partly because I don't want to create such a complicated instruction set that I don't stand a chance of implementing it in hardware.
Also I feel that after 2 weeks of spare time writing assembly routines it's time to move the project along to its next phase and begin the hardware implementation.
There are three main areas in which I believe the instruction set can be augmented.
The first is making more use of 16-bit immediate operations on the accumulator R0. To date, the ADI and SBI operations are 8-bit immediate operations where the operand is held in the lower 8 bits of the instruction register. Extending this to 16-bit will mean that the operand will be held in the next location in memory. This could be done by making the program counter another general purpose register - and this I believe is how the MSP430 implements immediate operations.
With this mechanism in place, ADD, SUB, AND, OR and XOR would benefit from having this 16-bit immediate mode.
My dealings with the decimal and hex routines have also highlighted the need for an efficient left shift on the accumulator.
Ideally I can implement as a bare minimum an ADD R0, R0, which will at least allow a doubling of the accumulator without involving any other register. The "Times 10" and "Times 16" routines used in decimal and hexadecimal entry would benefit from this instruction saving a few instruction cycles.
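To show why a self-add is enough, here is a small C sketch of "Times 16" and "Times 10" built only from doubling and one extra add. The function names are hypothetical, and the 16-bit wrap-around is modelled with uint16_t to mirror the accumulator width.

```c
#include <stdint.h>

/* Models ADD R0, R0: doubling the accumulator is a left shift by one. */
static uint16_t dbl(uint16_t x) { return (uint16_t)(x + x); }

/* Times 16 is four doublings in a row. */
static uint16_t times16(uint16_t x) { return dbl(dbl(dbl(dbl(x)))); }

/* Times 10 is 8x + 2x: three doublings, keeping a copy of 2x on the way. */
static uint16_t times10(uint16_t x) {
    uint16_t x2 = dbl(x);
    uint16_t x8 = dbl(dbl(x2));
    return (uint16_t)(x8 + x2);
}
```

Note that times10 needs one spare register to hold the 2x value, whereas times16 needs none.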
Secondly, I'm proposing that the 8-bit payload in the lower 8 bits of the instruction register can be used as an address to allow the IN and OUT operations to address up to 256 I/O devices. This is a placeholder for memory mapped I/O peripherals such as UARTs, timers and GPIO which can be added later.
Thirdly, there is the final instruction slot 0x0Fxx, which is currently used as NOP. I intend to extend this to allow for microinstructions - inspired by the OPR, "OPeRate" instructions used on the PDP-8.
Plagiarising the PDP-8
The OPR instructions allow operations such as clearing and complementing the accumulator, setting and clearing the carry bit and shift and SWAP operations to be implemented.
The PDP-8 OPR instructions were implemented with the following individual bit-lines that operated directly on the hardware:
This scheme gives access to 8 individual control lines which could be sequenced to become active in a specific timeslot which allowed quite complex operations to be performed on the accumulator.
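As a point of reference, the bit-per-operation idea can be sketched in C using the PDP-8's Group 1 operate bits (CLA, CLL, CMA, CML, RAR, RAL, rotate-twice and IAC) acting on a 12-bit accumulator and a 1-bit link. This is a simplified model and not a proposal for the Suite-16 encoding: the struct and function names are invented for the example, and setting both rotate bits at once is simply executed as right then left, which is not what real hardware does.

```c
#include <stdint.h>

/* PDP-8 Group 1 operate bits, written in octal as in DEC documentation. */
enum {
    OPR_CLA = 0200, OPR_CLL = 0100, OPR_CMA = 0040, OPR_CML = 0020,
    OPR_RAR = 0010, OPR_RAL = 0004, OPR_TWICE = 0002, OPR_IAC = 0001
};

typedef struct { uint16_t ac; uint16_t link; } pdp8_state;  /* 12-bit AC, 1-bit L */

/* Rotate the 13-bit quantity L:AC one place left. */
static void rotate_left(pdp8_state *s) {
    uint32_t v = ((uint32_t)s->link << 12) | s->ac;
    v = ((v << 1) | (v >> 12)) & 017777;
    s->link = (uint16_t)((v >> 12) & 1);
    s->ac = (uint16_t)(v & 07777);
}

/* Rotate the 13-bit quantity L:AC one place right. */
static void rotate_right(pdp8_state *s) {
    uint32_t v = ((uint32_t)s->link << 12) | s->ac;
    v = ((v >> 1) | (v << 12)) & 017777;
    s->link = (uint16_t)((v >> 12) & 1);
    s->ac = (uint16_t)(v & 07777);
}

/* Apply every selected operation in its fixed timeslot order:
   clears first, then complements, then increment, then rotates. */
static void opr_group1(pdp8_state *s, unsigned bits) {
    if (bits & OPR_CLA) s->ac = 0;
    if (bits & OPR_CLL) s->link = 0;
    if (bits & OPR_CMA) s->ac ^= 07777;
    if (bits & OPR_CML) s->link ^= 1;
    if (bits & OPR_IAC) {
        s->ac = (uint16_t)((s->ac + 1) & 07777);
        if (s->ac == 0) s->link ^= 1;          /* carry out complements the link */
    }
    int n = (bits & OPR_TWICE) ? 2 : 1;
    if (bits & OPR_RAR) for (int i = 0; i < n; i++) rotate_right(s);
    if (bits & OPR_RAL) for (int i = 0; i < n; i++) rotate_left(s);
}
```

Because the operations are ordered, combinations do useful work in a single instruction: OPR_CMA | OPR_IAC negates the accumulator, and OPR_CLA | OPR_IAC loads the constant 1.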
An alternative scheme is possible, where the lower 8-bit payload is fully decoded to allow up to 256 microinstructions. For maximum flexibility this could be done by using the byte to address a microinstruction ROM such as an additional AT27C1024. A 4-bit counter can be used on higher address lines to provide a primitive 16 step microsequencer. That leaves 4 address lines which could be used as inputs to implement a simple external interrupt system.
This would be very flexible, but it would require more hardware and would probably be quite limited by the access time of the AT27C1024 ROM.
The next plog (project log) will start to look at the hardware architecture and how we might implement a fast microinstruction sequencer using a counter, a diode matrix and some 3-8 line decoders.
11/02/2019 at 20:29 •
It's been a bit of a slow week, and I must admit that I lost focus in the middle of the week with my hexadecimal number entry routine.
In my opinion, hexadecimal entry is more complex than decimal entry, because the characters 0-9, and A-F are discontinuous in the ASCII table.
Characters 0-9 need to have 0x30 subtracted, whilst characters A-F need to have 0x37 subtracted. Anything else is not a valid hex digit and can be ignored until a newline character is seen.
With each incoming character you have to check if it is a legitimate hexadecimal digit, and modify it by subtracting either 0x30 or 0x37 to get its true numerical value.
This test and modify is best done using a short subroutine, which appears at the end of the listing.
Once you have the numerical value allocated to the character the rest of the routine is similar to the decimal entry routine, except that you are multiplying by 16 rather than 10.
There's a further twist in the tail when you detect the end of the valid digits and have to add in the last digit - modifying it accordingly.
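In C the whole scheme fits in a few lines, which may help when reading the hand-assembled version. This is a sketch under assumptions rather than a translation of GETHEX: the function names are hypothetical, only uppercase A-F is accepted, and parsing simply stops at the first character that is not a hex digit.

```c
#include <stdint.h>

/* Test and modify: returns 0..15 for a valid hex digit, or -1 otherwise. */
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - 0x30;   /* '0'..'9' -> 0..9   */
    if (c >= 'A' && c <= 'F') return c - 0x37;   /* 'A'..'F' -> 10..15 */
    return -1;
}

/* Accumulate digits: multiply the running total by 16, then add the new digit. */
static uint16_t parse_hex16(const char *buf) {
    uint16_t total = 0;
    for (; *buf != '\0'; buf++) {
        int v = hex_value(*buf);
        if (v < 0) break;                        /* space, newline or other terminator */
        total = (uint16_t)(total * 16 + v);
    }
    return total;
}
```

Multiplying before adding means the last digit needs no special treatment, which avoids the extra "add in the last digit" step that the assembly routine has to perform after the loop.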
This first draft allows hexadecimal numbers up to 0xFFFF to be entered and prints them back in decimal format.
You can find the latest listing for the Arduino in my Github Suite-16 repository.
EDIT: I found some redundant code in the main GETHEX routine and have managed to shorten it from 52 to 34 words.
Further optimisation became possible with the test and modify subroutine approach.
As the instruction set currently lacks a shift left instruction, and it's not yet proven that ADD R0, R0 will be implemented in hardware, the routine to multiply the accumulator by 16 is a little cumbersome using eight instructions rather than a possible four.
EDIT: After proving the ADD R0, R0 instruction and fixing a minor bug the routine is now down to 31 words in length.
// 0x003C -----------------------------GETHEX----------------------------
// Accepts a hexadecimal number up to FFFF from terminal input buffer
// converts it to an integer and puts it into register R0
// It can then be printed out as a decimal using PRINTNUM - for checking integrity
// R1 is the pointer in the text buffer - which starts at 0x0200
// R4 is used as a temporary store for the character in the accumulator R0
// R5 is used in the "Times 16" routine
// R7 is used to accumulate the powers of 16 when forming the integer in R0
0x1100, // SET R1, 0x0200 text buffer start
0x0200,
0x1700, // Don't forget to clear R7
0x0000,
// 0x0040--------------------------------------------------------------------------------------
0x4100, // LD AC, @R1 get first character from buffer :Getchar
0x3400, // Store R0 in R4
0xE100, // INC R1
0x4100, // LD AC, @R1 get next character - and test to see if it's a number or hex digit or space newline etc
0x0B30, // Subtract 0x30 Is it bigger than 0x30?
0x0250, // BLT 0x50 Quit No - so must be a space or newline etc
0x0B17, // SBI 0x17 is it bigger than 0x47 ascii for "F" ?
0x0350, // BGT 0x50 Quit Not a hexadecimal digit
0x0853, // CALL 0x0053 Restore, Test and Modify R0
0xA700, // Add in the accumulating total from R7 - ready to multiply
0xA000, // ADD R0, R0 Double R0 2X
0xA000, // ADD R0, R0 Double R0 4X
0xA000, // ADD R0, R0 Double R0 8X
0xA000, // ADD R0, R0 Double R0 16X
0x3700, // Store R0 in R7 R7 is the accumulating total of all the digits multiplied by powers of 16
0x0040, // BRA 0x0040 Get the next digit
// 0x0050--------------------------------------------------------------------------------------
0x0853, // CALL 0x0053 Restore, Test and modify R0
0xA700, // Add the accumulated sum from R7 - integer decimal number is now in R0
0x0010, // BRA 0x0010 Print it in decimal
// 0x0053---------------------------------TEST R0 & MODIFY--------------------------------------
// If R0 = 0-9 subtract 0x30 to form a number 0-9
// If R0 = A-F subtract 0x37 to form a number 10-15
0x2400, // Get R0 back from R4 - we now have a hex character in the range 0-F and need to convert it to a value 0-15
0x0B40, // Subtract 0x40 Is it bigger than 0x40? Then subtract 0x37 else subtract 0x30
0x0258, // BLT Not A-F so subtract 30 and return
0x0A09, // ADI 0x09 (restores and corrects R0 to correct numerical value)
0x005A, // BRA Return
0x2400, // LD R0, R4 Get the character back in R0 - we know it's 0-9
0x0B30, // Subtract 0x30
0x0900, // RET
0x0F00, // NOP
0x0F00, // NOP
0x0F00, // NOP
0x0F00, // NOP
0x0F00, // NOP
// 0x0060 -------------------------------------------------------------------------------------
10/31/2019 at 13:01 •
I was delighted to receive a Twitter notification from Frank Eggink, one of this project's followers, with news that he had created a table of Suite-16 instructions so that it can be used by TASM - a table driven assembler popular for small micros about 20 years ago.
His customised table and a link to the download site for TASM32 can be found at his Github repository: Here
Many thanks Frank - your contribution is much appreciated!
I have re-jigged the instruction set slightly in the last few days - and the most recent version can be found in this simulator file in my Github.
With the changes to the instruction set, I now have no more empty slots, so the NOP at 0x0F00 seems a bit extravagant.
With an 8-bit immediate add to the accumulator, the NOP can be created from ADI 0.
This frees up the 0x0Fxx slot for my proposed (PDP-8 like) OPR instructions including shifts, clears, complements and small constants.
The main changes are documented in the text header:
// A simple simulator for Suite-16 processor
// Add and Subtract Immediate instructions ADI and SBI added at 0x0Axx and 0x0Bxx
// IN moved to 0x0D00
// JP@ - Branch to the address held in the accumulator added at 0x0E00

/* Suite-16 Instructions

Register OPS-
     0n   ---        --                    Non-Register Ops
     1n   SET Rn     Constant (Set)        Rn = @(PC+1)
     2n   LD Rn      (Load)                AC = Rn
     3n   ST Rn      (Store)               Rn = AC
     4n   LD @Rn     (Load Indirect)       AC = @Rn
     5n   ST @Rn     (Store Indirect)      @Rn = AC
     6n   POP @Rn    Pop AC                AC = @Rn   Rn = Rn - 1
     7n   PUSH @Rn   Push AC               @Rn = AC   Rn = Rn + 1
     8n   AND Rn     (AND)                 AC = AC & Rn
     9n   OR Rn      (OR)                  AC = AC | Rn
     An   ADD Rn     (Add)                 AC = AC + Rn
     Bn   SUB Rn     (Sub)                 AC = AC - Rn
     Cn   INV Rn     (Invert)              Rn = ~Rn
     Dn   DCR Rn     (Decrement)           Rn = Rn - 1
     En   INR Rn     (Increment)           Rn = Rn + 1
     Fn   XOR Rn     (XOR)                 AC = AC ^ Rn

Non-register OPS-
     00   BRA    Always                     Target = IR7:0
     01   BGT    AC>0                       Target = IR7:0
     02   BLT    AC<0                       Target = IR7:0
     03   BGE    AC>=0                      Target = IR7:0
     04   BLE    AC<=0                      Target = IR7:0
     05   BNE    AC!=0                      Target = IR7:0
     06   BEQ    AC=0                       Target = IR7:0
     07   JMP    16-bit                     Target = @(PC+1)
     08   CALL   16-bit                     Target = @(PC+1)
     09   RET    Return
     0A   ADI    Add 8-bit Immediate        Immediate = IR7:0
     0B   SBI    Subtract 8-bit Immediate   Immediate = IR7:0
     0C   OUT    putchar(AC)
     0D   IN     AC = getchar()
     0E   JP@    BRA (R0)
     0F   NOP    AC &= AC
*/
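The table implies a simple field layout, which can be sketched in C as follows. This is inferred from the hand-assembled words in these logs (for example 0x1200 for SET R2 and 0x0A30 for ADI 0x30) rather than copied from the simulator source, so the struct and function names are assumptions. The top nibble selects the register operation, the next nibble is either the register number n or, when the top nibble is zero, the non-register operation, and the low byte is the 8-bit payload.

```c
#include <stdint.h>

typedef struct {
    unsigned op;       /* bits 15..12: register operation, 0 = non-register op */
    unsigned n;        /* bits 11..8 : register number, or non-register opcode */
    unsigned payload;  /* bits 7..0  : branch target or 8-bit immediate        */
} decoded_t;

static decoded_t decode(uint16_t ir) {
    decoded_t d;
    d.op = (ir >> 12) & 0xF;
    d.n = (ir >> 8) & 0xF;
    d.payload = ir & 0xFF;
    return d;
}

/* Mnemonics in table order; LD@ and ST@ stand for the indirect forms. */
static const char *const REG_OPS[16] = {
    "", "SET", "LD", "ST", "LD@", "ST@", "POP", "PUSH",
    "AND", "OR", "ADD", "SUB", "INV", "DCR", "INR", "XOR"
};
static const char *const NON_REG_OPS[16] = {
    "BRA", "BGT", "BLT", "BGE", "BLE", "BNE", "BEQ", "JMP",
    "CALL", "RET", "ADI", "SBI", "OUT", "IN", "JP@", "NOP"
};

/* Look up the mnemonic for an instruction word. */
static const char *mnemonic(uint16_t ir) {
    decoded_t d = decode(ir);
    return d.op == 0 ? NON_REG_OPS[d.n] : REG_OPS[d.op];
}
```

With this layout, decode(0x0A30) gives op 0, n 0xA and payload 0x30, which the second table names as ADI.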
10/29/2019 at 13:14 •
One of the deficiencies with the Suite-16 instruction set was a lack of an immediate addressing mode, where one of the operands is contained in the next word in memory.
This specifically was becoming a problem when you wanted to check if the contents of the accumulator lay between two bounds - and branch accordingly. This type of test is frequently found in ascii to hex or decimal conversion routines and string handling, and after coding a few routines it became obvious that the current situation involving other registers was clumsy and inadequate.
As a compromise I have added two instructions ADI and SBI which allow an 8-bit value to be coded into the payload area and have it added to or subtracted from the accumulator.
I have coded these two instructions in the spare 0x0Axx and 0x0Bxx instruction slots to try them out and see if they make coding easier and less convoluted. If they are useful they will get added to the final instruction set that will be implemented in hardware.
Here's an example from the Number entry routine where the input character needs to be tested to find out if it falls between ASCII 0x30 and 0x39 and is therefore a decimal digit. Registers R2 and R3 are first preloaded with the constants 0x0A and 0x30 so that they are available for the tests. These preload instructions will not be needed, saving 4 words of memory, and the SUB R3 and SUB R2 instructions become SBI 0x30 and SBI 0x0A respectively.
Whilst this might seem a trivial change in this example, it will be very useful when testing the input buffer for certain known strings - essential for dealing with high-level languages with keywords.
0x1300, // SET R3 0x30 Preload R3 with decimal 48
0x0030,
0x1200, // SET R2, 0x0A Preload R2 with decimal 10
0x000A,
0x1100, // SET R1, 0x0200 text buffer start
0x0200,
0x4100, // LD AC, @R1 get first character from buffer
0x3400, // Store R0 in R4
0xE100, // INC R1
0x4100, // LD AC, @R1 get next character - and test to see if it is a number
0xB300, // Subtract R3 Is it bigger than 0x30?
0x025A, // BLT Not a Number
0xB200, // Subtract R2 0x0A
0x035A, // BGE Not a Number
0x2400, // Get original character R0 back from R4
0xB300, // Subtract R3 to form a digit
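The two-subtraction bounds test may be easier to see in C. The sketch below mirrors the instruction sequence using immediate subtracts in place of the register subtracts; the function name is hypothetical and the accumulator is modelled as a signed 16-bit value so that the BLT and BGE conditions read naturally.

```c
#include <stdint.h>

/* Returns the digit value 0..9, or -1 if c is not an ASCII decimal digit. */
static int decimal_digit(char c) {
    int16_t ac = (int16_t)c;        /* accumulator holds the character   */
    ac = (int16_t)(ac - 0x30);      /* SBI 0x30                          */
    if (ac < 0) return -1;          /* BLT: below '0', not a number      */
    ac = (int16_t)(ac - 0x0A);      /* SBI 0x0A                          */
    if (ac >= 0) return -1;         /* BGE: above '9', not a number      */
    return c - 0x30;                /* reload the character, form digit  */
}
```

The second subtraction destroys the digit value, which is why the assembly routine keeps a copy of the original character in R4 and subtracts 0x30 again at the end.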
10/28/2019 at 13:46 •
Over the last week I have been running Suite-16 assembly language simulated in about 60 lines of C++ code. I have evolved the simulator over that time, and added some new instructions where it became necessary to use them.
The simulator has been written using the Arduino IDE - so that anyone with an Arduino compatible board can explore the code and learn how a very simple cpu simulator works.
Originally I had been simulating the Suite-16 cpu on an MSP430 Launchpad board with FRAM.
I noticed that despite it being a 16-bit processor, the performance was not so good, so I have swapped over to a Nucleo STM32H743 board which has a 400MHz ARM processor.
I'm still using the Arduino IDE to develop code - because it has a useful timing function micros() which returns the number of microseconds since the program was started. With this I can get fairly accurate timing information from my simulator.
I have used one of the spare instruction opcodes to allow the instruction count and the elapsed time to be output to the terminal.
By way of a timing benchmark, I have set up a simple loop that loads R0 with 32767 and repeatedly decrements it until it reaches zero. I then print out instruction count and elapsed number of microseconds.
Based on the "count down from 32767" loop, my Suite16 simulator is running about 8 million simulated instructions per second.
That's about 66% of what I'm hoping the TTL cpu to run at.
Based on the 400MHz clock on the Nucleo board, I can estimate that the simulator in C is taking about 50 ARM instructions to execute a Suite-16 simulated one.
I tried exactly the same code on the MSP430 which is a nominal 16MHz. Unfortunately the FRAM only works at 8MHz with wait states, so that slows it down considerably to about 75,000 simulated instructions per second.
So I tried a 16MHz Arduino with an 8-bit AVR ATmega328 and the results were much improved to nearly 139,000 instructions per second.
The humble AVR is approximately 59 times slower than the ARM, but with a 7 µs simulated instruction cycle it is still in the same league as some of the classic minicomputers from the 1960s.
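The arithmetic behind these figures is simple enough to sketch in C. The helper names are made up for the example, and the "ARM instructions per simulated instruction" estimate assumes roughly one ARM instruction per clock cycle, which is only an approximation.

```c
/* Simulated instructions per second from an instruction count and the
   elapsed time in microseconds, as reported by a timer such as micros(). */
static double instructions_per_second(unsigned long count, unsigned long elapsed_us) {
    return (double)count * 1e6 / (double)elapsed_us;
}

/* Host clock cycles spent on each simulated instruction. */
static double host_cycles_per_instruction(double host_hz, double sim_ips) {
    return host_hz / sim_ips;
}

/* Duration of one simulated instruction in microseconds. */
static double microseconds_per_instruction(double sim_ips) {
    return 1e6 / sim_ips;
}
```

For the 400MHz ARM at about 8 million simulated instructions per second, host_cycles_per_instruction(400e6, 8e6) gives 50. For the AVR at about 139,000 instructions per second, microseconds_per_instruction(139000.0) gives roughly 7.2.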
10/26/2019 at 12:21 •
Charles Moore's Forth is based on a 16-bit virtual machine that passes parameters between functions using the Parameter Stack (also known as Data Stack).
A stack is just a Last In, First Out (LIFO) structure contained in consecutive memory locations. The stack is often placed in the top of memory and grows downwards. So the top of the stack (TOS) is the lowest in memory of all stack items.
A register or zeropage variable acts as the stack pointer and is pre-decremented when an item is pushed onto the stack, and post-incremented when an item is popped off the stack. The stack pointer always points to the Top of Stack.
Suite-16 has PUSH and POP operations that may be used with any of the general purpose registers - so multiple independent stacks can be created. The only overhead is the assignment of a register solely for use as the stack pointer of one stack, plus a suitable section of memory. The stack pointer register should be initialised to its upper boundary value - for example 0x2000 in RAM.
The contents of the accumulator R0 are pushed to the memory location addressed by Rn, after Rn has been decremented. Similarly the top member of the stack is popped into the Accumulator and then Rn is incremented.
The other use of stacks is to hold the return address of subroutines. When a subroutine is called the PC is pushed onto the top of the return stack, and popped back to the PC when a return instruction is executed. With this stacking arrangement it allows for the automatic nesting of subroutines.
Here's how the PUSH and POP instructions are coded on the simulator:
case 0x6: R[0] = M[R[n]] ; R[n]= R[n]+1 ; break ; /* POP with post-increment of pointer Rn */
case 0x7: R[n]= R[n]-1 ; M[R[n]] = R[0] ; break ; /* PUSH with pre-decrement of pointer Rn */
M[R[n]] is the word in RAM pointed to by the stack pointer R[n]. R[0] is our accumulator.
The other instructions that use a stack are the CALL and RETurn. Here R[15] is dedicated as the Return Stack Pointer RSP.
case 0x8: R[15]= R[15]-1 ; M[R[15]] = PC ; PC = addr ; break ; // CALL (zero page) use R15 as RSP
case 0x9: PC = M[R[15]] ; R[15]= R[15]+1 ; break ; // RET
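To see the four operations working together, here is a small self-contained C sketch. It assumes that R[0] is the accumulator and R[15] the return stack pointer, as the comments in the simulator fragments indicate, and the memory size and function names are chosen only for the example.

```c
#include <stdint.h>

#define MEM_WORDS 0x2000

static uint16_t M[MEM_WORDS];   /* simulated RAM                         */
static uint16_t R[16];          /* R[0] = accumulator, R[15] = RSP       */
static uint16_t PC;

/* PUSH: pre-decrement the pointer Rn, then store the accumulator. */
static void push(unsigned n) { R[n] = (uint16_t)(R[n] - 1); M[R[n]] = R[0]; }

/* POP: load the accumulator, then post-increment the pointer Rn. */
static void pop(unsigned n) { R[0] = M[R[n]]; R[n] = (uint16_t)(R[n] + 1); }

/* CALL: save the return address on the return stack and jump. */
static void call(uint16_t addr) {
    R[15] = (uint16_t)(R[15] - 1);
    M[R[15]] = PC;
    PC = addr;
}

/* RET: recover the most recently saved return address. */
static void ret(void) { PC = M[R[15]]; R[15] = (uint16_t)(R[15] + 1); }
```

With R[1] initialised to 0x2000, pushing 11 and then 22 and popping twice returns 22 and then 11, leaving R[1] back at 0x2000. Because each CALL saves its return address on the same stack, a subroutine can call another subroutine and both returns unwind in the right order.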
A later log will look at the stack manipulation words such as DUP, DROP, SWAP, OVER that are frequently used in Forth.
10/24/2019 at 17:29 •
This week I am working towards getting my pet project, SIMPL, working on the Suite-16 simulator.
I am making good progress with the main routines that handle decimal number entry and decimal number printing. These have been relatively easy to code, and the codesize compares with the equivalent code written in MSP430 assembly language.
The next thing I need to code up are the three routines that will provide the mechanics of the interpreter.
Assembling code by hand is not too difficult, but it helps if you keep a modular approach - and each module has only one entry point and one exit point. It takes more time to plan each module, and then test it - than it does to hand assemble. So for the moment I am not overly concerned that I don't have a full assembler.
Modular code is the approach taken by Forth. You are encouraged to write short routines that only require a few input variables that are taken off the stack, and in turn will calculate some output result that is placed back on the stack. The stack is the all important communicating pipeline between the functional modules.
Whilst SIMPL is by no means anything like a full Forth, it does follow closely some of the techniques used in the Forth interpreter, but the dictionary that is fundamental to Forth is replaced with a simple jump table. This makes it possible to have a working SIMPL kernel operating in fewer than 1000 bytes of code.
The Command Interpreter
Ward Cunningham who wrote Txtzyme, the precursor to SIMPL, described his interpreter as a switch-case statement contained within a loop.
I now need to devise an efficient switch statement mechanism for Suite-16, as this is central to the whole functioning of the command interpreter.
The switch statement is given an input value which it translates by a look-up table mechanism to an output value, and this output value is used as a jump address for the program execution.
Whilst there are 96 printable ASCII codes to be used as commands, we are unlikely to have to use all of them in the jump table. First we can discount the numerical characters as these are handled separately by the number entry routine. Capital letters are reserved for User Functions or variables, so they will also be handled differently. That just leaves 26 lower case characters and 34 other symbols. The jump table has already reduced in size from 96 to 60 entries. It may be possible to reserve 60 words of the zeropage to accommodate the jump table, leaving 196 words for essential code, user variables and structures such as the data stack and return stack.
The jump mechanism needs some clarification. With Suite-16 we can embed an 8-bit jump address into the lower byte of the instruction. This however is only useful for accessing addresses on the zeropage, and we will need to find an alternative method to access the code words that are more likely to be located outside of page 0.
The jump table will contain a list of addresses, which are the start addresses for all of the command routines. For example if our accumulator currently holds the ASCII character "p" 0x70 and we want to use this to invoke the printnum routine which for example starts at address 0x0100, we need to create a table in memory which at address 0x70 contains the value 0x0100. We can get this address back into the accumulator and then jump to it.
Suite-16 is currently only using an 8-bit jump address which is stored in the payload section of the instruction. If we extend this to a 16-bit jump, the target address will be held in the word following the jump instruction. We can use the accumulator to overwrite this target address, so we can effectively jump to an address that is held in the accumulator. This currently will have to be done in a two stage process, sometimes called a Trampoline Jump.
Let's assume that the accumulator holds 0x70 the letter p, and we want to jump to address 0x0100 that is held in the lookup table. We can use the indirect register addressing mode to access the table, using register R1 as a pointer. Our trampoline will be placed at locations 0x80 and 0x81
ST R0, R1 // R1 now contains 0x70
LD R0, @R1 // R0 contains 0x0100
SET R1, 0x81 // The trampoline's target address location
ST R0, @R1 // store 0x0100 at location 0x81
JMP 0x80 // Jump to 0x80 where the trampoline jump instruction is located
This method is quite clunky and it takes 6 instructions to direct the program flow to the printnum routine.
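The trampoline can be modelled in a few lines of C to make the self-modifying step explicit. This is a sketch under assumptions: the memory array, the function name and the use of 0x0700 as the 16-bit JMP opcode are chosen for the example, with the jump instruction sitting at 0x80 and its target word at 0x81 as described above.

```c
#include <stdint.h>

#define JMP16 0x0700            /* assumed opcode: 16-bit jump, target in next word */

static uint16_t M[0x200];       /* simulated memory: table entries plus trampoline  */

/* Follow the five-instruction sequence, then execute the trampoline itself.
   Returns the program counter after the jump has been taken. */
static uint16_t trampoline_dispatch(uint16_t ac) {
    uint16_t r1 = ac & 0xFF;    /* ST R0, R1   : R1 = command character        */
    ac = M[r1];                 /* LD R0, @R1  : fetch the routine's address   */
    r1 = 0x81;                  /* SET R1, 0x81: point at the target word      */
    M[r1] = ac;                 /* ST R0, @R1  : patch the jump target         */
    uint16_t pc = 0x80;         /* JMP 0x80    : enter the trampoline          */
    if (M[pc] == JMP16) pc = M[pc + 1];   /* the patched 16-bit jump executes  */
    return pc;
}
```

With M[0x80] holding the jump opcode and M[0x70] holding 0x0100, trampoline_dispatch(0x70) returns 0x0100. The cost is that the jump target lives in writable memory, so this approach cannot be used if the code is held in ROM.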
It would be better if there was an easy way of doing a direct jump based on the contents of the accumulator, but with very little additional hardware overhead.
Fortunately we already have the means to modify the bottom 8-bits of the program counter as it is used by our call and branching methods. It should be relatively straightforward to add a new instruction in the form of JMP @R0.
Our look-up code then becomes much simpler:
ST R0, R1 // R1 now contains 0x70
LD R0, @R1 // R0 contains 0x0100
JMP @R0 // Program jumps to address 0x0100