• ### Updating the Stack Manipulation

03/31/2021 at 12:53 0 comments

Inspired by Sandor Schneider's "STABLE" project I decided that it was time to update the code that handles the stack manipulations, particularly important for arithmetic, logical and comparison operations.

I found that a 4 level stack based on registers was not only limited but difficult to manage without a lot of time wasted shuffling registers.

Using Sandor's approach, there is an array in memory st[ ] which is accessed by a stack pointer variable s.  The top item on the stack is st[s] and the next item on the stack is st[s-1].

Handling the stack now needs only two operations on the stack pointer: incrementing it (s++) before an item is pushed, and decrementing it (s--) after an item has been consumed.

Here are the arithmetical instructions recoded to use this technique:

```
//---------------------------------------------------------------------
//  Arithmetical, Logical and Comparison operations:

case '+':  st[s-1]+=st[s]; s-- ;        break;    // ADD
case '-':  st[s-1]-=st[s]; s-- ;        break;    // SUB
case '*':  st[s-1]*=st[s]; s-- ;        break;    // MUL
case '/':  st[s-1]/=st[s]; s-- ;        break;    // DIV
case '_':  st[s]=-st[s];                break;    // NEG

case '&':  st[s-1]&=st[s]; s-- ;        break;    // AND
case '|':  st[s-1]|=st[s]; s-- ;        break;    // OR
case '^':  st[s-1]^=st[s]; s-- ;        break;    // XOR
case '~':  st[s]= ~st[s];               break;    // NOT

case '<':  if(st[s]> st[s-1]){st[s]=-1;}else{st[s]=0;}  break;    // LT
case '>':  if(st[s]< st[s-1]){st[s]=-1;}else{st[s]=0;}  break;    // GT
case '=':  if(st[s]==st[s-1]){st[s]=-1;}else{st[s]=0;}  break;    // EQ
```
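To see the scheme working end to end, here is a minimal sketch in C of an evaluator built around the same st[ ] and s. The function name simpl_eval, the depth STACK_DEPTH and the guard checks are assumptions made for illustration and are not taken from the SIMPL source.

```c
#include <stdint.h>

#define STACK_DEPTH 32              /* assumed depth, not from the SIMPL source */

static int32_t st[STACK_DEPTH];     /* the stack array */
static int s;                       /* index of the top item; st[0] is a spare slot */

/* Hypothetical helper: evaluate the digits, spaces and + - * / _ held in p,
   then return the item left on top of the stack. */
int32_t simpl_eval(const char *p)
{
    s = 0;
    st[0] = 0;
    while (*p) {
        char ch = *p++;
        if (ch >= '0' && ch <= '9') {           /* collect a decimal number */
            int32_t n = ch - '0';
            while (*p >= '0' && *p <= '9')
                n = n * 10 + (*p++ - '0');
            if (s < STACK_DEPTH - 1)
                st[++s] = n;                    /* increment s, then store: PUSH */
            continue;
        }
        if (s < 1)
            continue;                           /* nothing on the stack yet */
        switch (ch) {
        case '+': st[s-1] += st[s]; s--; break;                   /* ADD */
        case '-': st[s-1] -= st[s]; s--; break;                   /* SUB */
        case '*': st[s-1] *= st[s]; s--; break;                   /* MUL */
        case '/': if (st[s] != 0) st[s-1] /= st[s]; s--; break;   /* DIV */
        case '_': st[s] = -st[s]; break;                          /* NEG */
        default:  break;                        /* spaces just separate numbers */
        }
    }
    return st[s];
}
```

With this sketch, simpl_eval("2 3 4*+") holds three operands at once and leaves 14 on top of the stack, with no register shuffling.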

SIMPL continues to be a world of exploration.

Running on the 600MHz Teensy 4.0, it is very fast:

• 50 million empty loops per second
• 9.2 million 32-bit additions or subtractions per second

• ### SIMPL - Implemented on a high performance Teensy 4.0

03/28/2021 at 23:16 0 comments

SIMPL implemented on the Teensy 4.0

The Teensy 4.0 is a small and very fast ARM Cortex M7 based microcontroller development board in a compact, breadboard-friendly package of 24 pins on a 0.6" pitch.

Like any other Arduino IDE compatible microcontroller it can be programmed with SIMPL using just the Arduino IDE.

Compared to the ATmega328, which was used on the Duemilanove and the UNO, Teensy 4.0 is about 300 times faster.

It can do 32-bit math, it has a dual execution unit which almost doubles the rate of executing instructions and it has a large RAM.

All of these things make it an ideal and simple-to-use development board for "Breadboarding in Code".

SIMPL is written in about 300 lines of standard Arduino C++.

It implements a number of useful commands and it can be extended using the : and ; colon definition.

The 600MHz clock used on the Teensy 4 allows very rapid code execution.

An empty loop executes in 18.334 ns. This means that if you have time to kill, you can do 54.5 million empty loops per second. Each loop takes 11 cycles of the 600MHz oscillator.
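These figures can be cross-checked with a little arithmetic. The sketch below, in plain C with hypothetical function names, derives the cycles per loop and the loops per second from the measured loop time.

```c
/* Cycles used by one loop = loop time (seconds) x clock frequency (Hz). */
double cycles_per_loop(double loop_seconds, double clock_hz)
{
    return loop_seconds * clock_hz;
}

/* Loops completed in one second = 1 / loop time (seconds). */
double loops_per_second(double loop_seconds)
{
    return 1.0 / loop_seconds;
}
```

For a loop time of 18.334e-9 seconds and a clock of 600e6 Hz, the first function gives about 11.0 cycles and the second about 54.5 million loops per second.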

Even though SIMPL is written in C++, the high speed of the Teensy 4 means that a lot of processing can be done - very quickly.

If you want to toggle a port pin using h and l - the Teensy 4 will allow a 7.5MHz toggle frequency. With direct writing to the I/O registers, instead of using the Arduino digitalWrite function, this frequency could be increased many times over.

• ### SIMPL on the Arduino

01/23/2021 at 16:28 2 comments

Whilst I have discussed some of the wider aspects of SIMPL, I thought that it might be beneficial to take it back to its roots, as a simple sketch running on an Arduino Uno.

SIMPL was inspired by Ward Cunningham's Txtzyme, and his compact interpreter remains an essential part of its evolution.

Ward Cunningham's Txtzyme made good use of the lowercase alpha characters as a series of inbuilt commands designed to allow the hardware peripherals to be exercised.

This has been well documented in Ward's Txtzyme Github so won't be repeated here. Suffice it to say that SIMPL has incorporated these commands and uses the same convention of lowercase alpha characters for inbuilt hardware functions.

Deconstructing SIMPL

SIMPL can be broken down into a few basic routines:

txtRead      Read a character from RAM

txtChk       Compare against colon character

txtEval      Look up the character and execute associated code block

These 3 routines are enclosed in a tight loop and provide the Read-Evaluate-Print Loop.

In Arduino the code is as follows:

```
void loop()                // This is the endless while loop which implements the interpreter
{
  txtRead(buf, 64);        // Get the next "instruction" character from the buffer
  txtChk(buf);             // check if it is a : character for beginning a colon definition
  txtEval(buf);            // evaluate and execute the instruction
}
```
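As a rough illustration of the first of these routines, the sketch below assumes that, as in Txtzyme, the read routine fills the 64-byte buffer with one line of printable characters before it is evaluated. The names txt_read, char_source, demo_input and demo_next are hypothetical, and the string-backed source simply stands in for the UART so that the code can be tried on a PC.

```c
#include <stddef.h>

/* Hypothetical character source: returns the next character, or -1 when
   nothing is left. On the Arduino the UART receive routine plays this role. */
typedef int (*char_source)(void);

/* Fill buf with one line of printable characters (at most n-1 of them)
   and terminate it with 0. Returns the number of characters stored. */
size_t txt_read(char *buf, size_t n, char_source next)
{
    size_t i = 0;
    if (n == 0)
        return 0;
    while (i < n - 1) {
        int ch = next();
        if (ch < 0 || ch == '\r' || ch == '\n')
            break;                      /* end of input or end of line */
        if (ch >= ' ' && ch <= '~')
            buf[i++] = (char)ch;        /* keep printable ascii only */
    }
    buf[i] = 0;
    return i;
}

/* String-backed source for desk testing (an assumption, not SIMPL code). */
static const char *demo_input = "";

static int demo_next(void)
{
    return *demo_input ? (unsigned char)*demo_input++ : -1;
}
```

Passing the source in as a function pointer keeps the line reader independent of where the characters come from.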

Supporting these routines are four others for handling the serial input and output and numerical conversion:

u_getchar       Get a character from the UART

u_putchar       Send a character to the UART

number           Read in a decimal numeric string and convert it into a 16-bit integer stored into a variable x

printnum16      Print out the 16-bit decimal integer stored in x via the UART

With just these 7 routines you have the fundamental building blocks to build the SIMPL kernel, and with this kernel create a framework from which the remainder of the application can be built.
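The two numerical conversion routines are small enough to sketch in full. The version below is an illustration in portable C, not the code from the repository: the names parse_number and format_number are assumptions, and the result is written to a character array rather than sent to the UART.

```c
#include <stdint.h>

/* Convert the run of decimal digits at *pp into an integer and
   advance *pp past the digits. */
uint32_t parse_number(const char **pp)
{
    uint32_t x = 0;
    while (**pp >= '0' && **pp <= '9') {
        x = x * 10 + (uint32_t)(**pp - '0');
        (*pp)++;
    }
    return x;
}

/* Write x as decimal digits into out (at least 11 bytes for 32 bits)
   and return the number of digits written. */
int format_number(uint32_t x, char *out)
{
    char tmp[10];                   /* digits are produced in reverse order */
    int n = 0, i;
    do {
        tmp[n++] = (char)('0' + x % 10);
        x /= 10;
    } while (x);
    for (i = 0; i < n; i++)
        out[i] = tmp[n - 1 - i];
    out[n] = 0;
    return n;
}
```

On the microcontroller each digit would be passed to u_putchar instead of being stored.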

To keep the kernel codesize to a minimum, I have avoided the Arduino serial library. Instead I have used very compact UART code borrowed from AVR Freaks.

As such, the 16-bit kernel fits into 1500 bytes.  With a change to printnum16 it can be modified to accept 32-bit integers. This pushes the code up to 1688 bytes.

I have placed the 32-bit SIMPL kernel code in my SIMPL  Github repository.

Extending the Kernel

At this point the kernel is very basic, purposefully with very few functions.

• You can enter a 32-bit decimal number, which is stored in a variable x  eg.   1234567890 enter
• You can enter a 2nd decimal number separated by a space and it will be stored in variable y eg.  123456789 54321 enter
• You can print out the value of x as a 32-bit decimal number using the p command  9876543210p
• You can assign a 32-bit number to any of the user functions A-Z, using the colon command  :A2468013579 and print it later using the p command  Ap
• You can use the ? command to list all of the User Commands A-Z and see if any value or code has been allocated to them.

Maths Operations

The obvious extension to the kernel is to add the maths functions, addition, subtraction, multiplication and division.

It is necessary to provide the means for a second parameter, y.  Txtzyme only allows one parameter x.

Providing a second variable to hold  the y parameter makes arithmetic and logic operations possible.

When a number is entered, it is automatically placed in the x variable. Inserting an ascii space transfers the first number from x into y and allows a second number to be entered.

123 456    this puts 123 into x, transfers it to y and then places 456 into x.

We can then add the operator to make addition possible:

123 456+p   ADD 123 to 456 and print it out as a decimal number.

We can now add the four common maths operations to the code.

```
case '+':      x = x+y;      break;     //  ADD
case '-':      x = x-y;      break;     //  SUB
case '*':      x = x*y;      break;     //  MUL
case '/':      x = x/y;      break;     //  DIV
```

This effectively forms the basis of a very simple 4 function calculator, but the printnum routine will have to be modified to handle negative numbers.  Adding these 4 functions pushes the codesize up to 1940 bytes.
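The two-parameter mechanism can be tried out on a PC with the following sketch. It is written in plain C under stated assumptions: the function name calc is hypothetical, the operand order follows the case statements above (x op y, with x holding the most recent number), and the divide-by-zero guard is an addition.

```c
#include <stdint.h>

static int32_t x, y;    /* the two parameters described in the text */

/* Hypothetical helper: run a line of digits, spaces and + - * /
   and return the value left in x. */
int32_t calc(const char *p)
{
    x = y = 0;
    while (*p) {
        char ch = *p++;
        if (ch >= '0' && ch <= '9') {       /* a number always lands in x */
            x = ch - '0';
            while (*p >= '0' && *p <= '9')
                x = x * 10 + (*p++ - '0');
            continue;
        }
        switch (ch) {
        case ' ': y = x;                 break;   /* space: transfer x to y */
        case '+': x = x + y;             break;   /* ADD */
        case '-': x = x - y;             break;   /* SUB */
        case '*': x = x * y;             break;   /* MUL */
        case '/': if (y != 0) x = x / y; break;   /* DIV, guarded */
        default:                         break;
        }
    }
    return x;
}
```

Here calc("123 456+") returns 579, and after calc("123 456") the variables hold y = 123 and x = 456.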

So building upon the basic numerical input and output routines we add the maths functions and this makes simple calculations possible.

We can also add the logic operations:  AND, OR, XOR and INV:

```
case '&':      x = x&y;      break;     //  AND
case '|':      x = x|y;      break;     //  OR
case '^':      x = x^y;      break;     //  XOR
case '~':      x = ~x ;      break;     //  INV
```

Looping

One of the principal functions borrowed from Txtzyme is the looping structure.

This introduces a new parameter k which is equal to the loop index counter. k is used to initiate the loop function and is decremented each time the interpreter executes the loop.

The loop terminates  when k=0.  The code to be executed in the loop is contained within curly braces {...........}. If the loop counter k is set to zero, the code within the braces will never be executed - and this can be used to create inline comments, eg.    0{This is a comment}

Adding the looping control code increases the codesize to 2124 bytes

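```
// Looping and program control group

case '{':
  k = x;                                 // the loop count comes from x
  loop = buf;                            // remember where the loop body starts
  while ((ch = *buf++) && ch != '}') {   // skip forward past the closing brace
  }
  // no break: fall through so that '}' decides whether to run the body

case '}':
  if (k) {                               // any iterations left?
    k--;
    buf = loop;                          // jump back to the start of the body
  }
  break;

case 'k':      x = k;      break;        // read the loop counter into x
```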
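The fall-through from '{' into '}' is easy to misread, so here is a self-contained sketch of the same control flow in C that can be traced on a PC. The function run_loop_demo and the '.' command, which only counts how often it is executed, are hypothetical, and the scan for the closing brace is written so that it stops at the end of the string.

```c
#include <stdint.h>

/* Hypothetical demo: interpret digits, '{', '}' and '.', and return how
   many times '.' was executed. */
unsigned run_loop_demo(const char *buf)
{
    uint32_t x = 0, k = 0;
    const char *loop = buf;
    unsigned hits = 0;
    char ch;

    while ((ch = *buf++) != 0) {
        if (ch >= '0' && ch <= '9') {           /* number goes into x */
            x = (uint32_t)(ch - '0');
            while (*buf >= '0' && *buf <= '9')
                x = x * 10 + (uint32_t)(*buf++ - '0');
            continue;
        }
        switch (ch) {
        case '.':
            hits++;
            break;
        case '{':
            k = x;                              /* loop count */
            loop = buf;                         /* start of the loop body */
            while (*buf && *buf != '}')         /* skip to the closing brace */
                buf++;
            if (*buf == '}')
                buf++;
            /* fall through */
        case '}':
            if (k) {
                k--;
                buf = loop;                     /* run the body again */
            }
            break;
        default:
            break;
        }
    }
    return hits;
}
```

Tracing run_loop_demo("10{.}") gives 10 executions of the body, while "0{This is a comment}." skips the braces entirely and executes only the final '.'.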

SIMPL follows a subtly different approach to Txtzyme, whilst retaining basic functional compatibility.  SIMPL has been designed to provide an executable symbolic language for virtual machines, as an alternative to their native assembly language.

With arithmetic commands and the loop structure that allows conditional flow there is the basis of a simple virtual machine instruction set.

In the next log we will look at the options for implementing a virtual machine.

• ### SIMPL as an Instruction Set

01/16/2021 at 22:26 0 comments

As stated earlier, SIMPL will run on a microcontroller or other processor device using an interpreter which forms the basis of a Virtual Machine, which I have chosen to call a SIMPL Machine.

The SIMPL Machine uses printable ascii characters to create a concise, human readable language.

For convenience the printable ascii characters are divided into 4 subsets:

• Numbers 0-9: A numerical string will be converted to a 16-bit integer in the range 0-65535
• Lowercase a-z: Lowercase alpha characters are generally used to call ROM based functions for printing and control of I/O
• Uppercase A-Z: Uppercase alpha characters are generally used for user defined commands, variables and registers
• Symbols + - * / etc.: These 33 symbols are used to define the primitive instructions and structures for the SIMPL Machine language.

It is the symbol and punctuation characters which will be used to define the Instruction Set, along the lines of the MISC machines developed by CH Ting and CH Moore.
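The four subsets map directly onto a small classification routine. The sketch below is an illustration in C with hypothetical names (simpl_class, simpl_classify, simpl_count_symbols); it assumes the space character is counted among the symbols, which is what brings the total to 33.

```c
typedef enum {
    SIMPL_NUMBER,        /* 0-9 */
    SIMPL_LOWER,         /* a-z: ROM based functions */
    SIMPL_UPPER,         /* A-Z: user defined commands */
    SIMPL_SYMBOL,        /* everything else that is printable */
    SIMPL_NONPRINTABLE   /* control codes and anything above '~' */
} simpl_class;

simpl_class simpl_classify(char ch)
{
    if (ch >= '0' && ch <= '9') return SIMPL_NUMBER;
    if (ch >= 'a' && ch <= 'z') return SIMPL_LOWER;
    if (ch >= 'A' && ch <= 'Z') return SIMPL_UPPER;
    if (ch >= ' ' && ch <= '~') return SIMPL_SYMBOL;
    return SIMPL_NONPRINTABLE;
}

/* Count the symbol characters in the printable ascii range 32..126. */
int simpl_count_symbols(void)
{
    int n = 0;
    for (int c = ' '; c <= '~'; c++)
        if (simpl_classify((char)c) == SIMPL_SYMBOL)
            n++;
    return n;
}
```

Of the 95 printable characters, 10 are digits and 52 are letters, which leaves the 33 symbols.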

The SIMPL instruction set is concise because each command is a single ascii character.

It takes the form of human readable shorthand which removes the need to memorise hex codes or processor specific instruction mnemonics. Traditional assembly language frequently uses mnemonics that are 3 characters long: ADD, SUB, AND, XOR.

SIMPL replaces these with single characters, which minimises the text and greatly reduces the overhead of parsing multi-character text.

Note: Although each instruction is represented by a single character, if a more traditional listing were required it would be an easy programming exercise to substitute a conventional mnemonic for each character.

Source code is much more compact than traditional assembly language.  It can be loaded directly into RAM - either by typing or using the Send File feature of most terminal emulator programs.

SIMPL could become a lingua franca for microcontrollers or microprocessors.

Provided that the SIMPL VM has been coded for the target cpu, whether AVR, ARM, Z80 or 6502 etc., the same SIMPL source code will run on any of the machines.  It forms a universal assembly language and could be used to replace cpu specific mnemonics and assembly listings.

Instruction Primitives.

These are allocated to the 33 printable ascii symbols.  They provide arithmetic, logic and comparison operations as well as memory and register access, looping and program flow control.

Using the concept of a Minimal Instruction Set Computer  (MISC) a complete machine can be created with fewer than 32 primitive instructions.

There have been several historical machines that illustrate this concept - such as the PDP-8, the MSP430, Marcel van Kervinck's Gigatron TTL computer and several of the Forth cpus by Charles H. Moore.

The SIMPL VM is based on a stack machine with a 4 level circular stack. Most operations operate on the top 2 elements of the stack.
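How the circular stack is implemented is not shown here, but one common way of doing it, sketched below in C, is to mask the stack index so that it wraps around. The names ring, top, ring_push and ring_pop are assumptions for illustration only.

```c
#include <stdint.h>

#define LEVELS 4                    /* must be a power of two for the mask */

static uint16_t ring[LEVELS];       /* the 4 stack cells */
static unsigned top;                /* index of the top item, always 0..3 */

/* Push: advance the index with wrap-around, then store. */
void ring_push(uint16_t v)
{
    top = (top + 1) & (LEVELS - 1);
    ring[top] = v;
}

/* Pop: read the top item, then step the index back with wrap-around. */
uint16_t ring_pop(void)
{
    uint16_t v = ring[top];
    top = (top - 1) & (LEVELS - 1);
    return v;
}
```

A circular stack can neither overflow nor underflow: a fifth push silently overwrites the oldest item, and a fifth pop returns to the most recently written cell.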

Arithmetical and Logical Operations

+       ADD

-        SUB

*        MUL (Left Shift)

/        DIV   (Right Shift)

&       AND

|         OR

^        XOR

~        NOT

`         INCREMENT

Memory Access

@       FETCH

!         STORE

Stack Manipulation

"         DUP

'          DROP

\$         SWAP

%        OVER

SPACE - used to PUSH consecutive numbers onto the stack

Conditional / Comparison

<         LESS

>         GREATER

=         EQUAL

Input / Output

.        EMIT CHARACTER

?       INPUT CHARACTER

_       TEXT STRING

#       LITERAL

\        COMMENT

The remainder of the symbol instructions are used to control program flow and create structures

{        BEGIN LOOP

}        END LOOP

[        BEGIN CASE

]        END CASE

(        BEGIN ARRAY

)        END ARRAY

:         BEGIN COLON DEFINITION

;         END COLON DEFINITION

,        ARRAY SEPARATOR

• ### More about the SIMPL Project

01/15/2021 at 15:06 0 comments

SIMPL is the acronym for Serial Interpreted Minimal Programming Language.

SIMPL is an extensible language allowing it to grow to suit the requirements of the application.

In this project:

• I will explain the programming concepts and the minimalist philosophy behind SIMPL.
• I will show how the kernel is built from a few basic routines
• There will be example code given for implementations on Arduino and MSP430
• I will investigate how SIMPL can be ported to custom CPUs, existing as a simulation or as a soft-core implemented on an FPGA
• I will explore the concept of the SIMPL Machine - a CPU architecture optimised for running SIMPL as its native instruction set.
• I will show how the SIMPL framework can be used as the basis for several applications.

A 2 minute Introduction to SIMPL.

The following brief example shows how SIMPL allows you to flash an LED on an Arduino or similar. The Arduino is loaded with the SIMPL kernel and connects via the serial terminal, allowing the commands at the start of each line below to be typed.

13 d            First we identify that we want to use the LED attached to digital pin 13 using the d command.

1 o              The o command allows us to send either a LOW or HIGH to the selected pin: 1o turns the LED on, 0o turns it off.

1000 m      We use the m command to initiate a millisecond delay, in this case 1000 ms.

10{...........}   We create a loop using curly braces; in this case the contents of the braces will be repeated 10 times.

Now we put it all together as a microscript:

13 d 10 {1o 1000m 0o 1000m}   This will flash the LED 10 times, ON for 1 second and OFF for 1 second.

13d10{1o1000m0o1000m}         But the spaces were only for clarity and can be omitted.

We can edit the delay periods to create different on and off times:

13d10{1o200m0o100m}            200 ms on and 100 ms off.

Or we can alternatively use the microsecond delay command u to generate audio tones into a small speaker:

13d1000{1o500u0o500u}    This will generate 1 second of 1kHz tone.

But SIMPL is extensible. If we like the 1kHz beep, we can allocate a user command to it, for example B for Beep.

We do this using colon : and semicolon ; to define our new command:

:B13d1000{1o500u0o500u};

Every time B is typed a tone will be generated.

BBBB    Generate 4 seconds of tone

Now create a higher tone and call it C

:C13d1200{1o400u0o400u};

CBCBCBCB  Generate an alternating siren sound.
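One way to picture what a colon definition does is sketched below in C: the text between the name and the semicolon is copied into a slot reserved for that letter, and typing the letter later hands the stored text back to the interpreter. The names define_command and lookup_command and the slot size DEF_LEN are assumptions, not taken from the SIMPL source.

```c
#include <string.h>

#define DEF_LEN 48                  /* assumed bytes reserved per user command */

static char defs[26][DEF_LEN];      /* one slot for each of A..Z */

/* Handle a line of the form ":Xbody;" by storing body under the letter X.
   Returns 1 on success, 0 if the line is not a valid definition. */
int define_command(const char *line)
{
    if (line[0] != ':' || line[1] < 'A' || line[1] > 'Z')
        return 0;
    const char *body = line + 2;
    const char *end = strchr(body, ';');
    size_t len = end ? (size_t)(end - body) : strlen(body);
    if (len >= DEF_LEN)
        return 0;                   /* definition too long for its slot */
    char *slot = defs[line[1] - 'A'];
    memcpy(slot, body, len);
    slot[len] = 0;
    return 1;
}

/* Return the stored text for a user command, or "" if the name is invalid. */
const char *lookup_command(char name)
{
    if (name < 'A' || name > 'Z')
        return "";
    return defs[name - 'A'];
}
```

After define_command(":B13d1000{1o500u0o500u};"), lookup_command('B') returns the tone microscript, ready to be evaluated like any other line of text.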

We have created a sound effect using a few numerical parameters and a few ascii characters as commands:  d  o  m  u  {  } :  and  ;

Three conventions are used:

Lowercase characters a-z are used for pre-programmed commands that are generally coded into the Flash ROM

Uppercase characters A-Z are generally used for User Defined commands (microscripts) or variables that are stored in RAM

Punctuation characters and other symbols are used for arithmetic and logical operations and program flow such as looping.

How SIMPL Works

A small program flashed into ROM provides the serial interface and command interpreter.

Every ascii character read in, causes the interpreter to jump to a unique block of code which executes the command action before returning to the interpreter to read in the next character.

This very simple interpreter has amazing flexibility and can be tailored to suit the requirements of a wide variety of applications such as CNC (plotters, 3D printers, routers, laser cutters), robotics, hardware control etc.

The microscripts can be extended and concatenated to create a fully interactive, self-hosted programming environment - an alternative to assembly language, or higher level languages such as BASIC or Forth.

About SIMPL

SIMPL is a minimal serial interface to a microcontroller to allow a self-hosted interactive programming environment. The microcontroller only needs a UART or bit-banged serial interface. It accepts serial commands and responds with serial output or direct control of the microcontroller's peripherals.

SIMPL has been designed to use the minimum of processor resources in its implementation. If written in assembly language, only 2K bytes of Flash ROM and 512 bytes of RAM are needed for a minimum system.

It has been ported to ATmega328 via Arduino and MSP430 using Energia. Assembly language versions have been produced for various MSP430 devices using TI Code Composer tools. More recently it has been used to provide an interactive shell for novel cpu designs that are simulated on ARM Cortex M7 devices such as the Nucleo STM32H743 evaluation board and the Teensy 4.x boards.

SIMPL has a very small footprint, and can be quickly ported to new devices. Porting SIMPL to a new device in native assembly language is a great way to learn the ISA of the new device. SIMPL is a toolkit with which to explore computing machines, both new and old.

SIMPL provides a learning exercise to understand the fundamentals of computer languages and virtual machines.

My involvement with SIMPL started in 2013 and it has been evolving ever since. Versions have been written for Arduino, MSP430 and ARM, both in C and assembly language.  Writing in hand-assembled code will produce a solution that needs only 25% to 50% of the ROM space required by a program written in C and compiled with GCC.

Further information about SIMPL may be found in my Github Repository SIMPL

Describing and documenting any software project from first principles is difficult at the best of times.

This is a first attempt to document the SIMPL project, to describe its concepts, its philosophy and implementation. Languages evolve over a period of time. Some ideas win whilst others fail. A language is only a snapshot illustrating current thinking.

Beginnings.

SIMPL was inspired by the Txtzyme nano-interpreter by Ward Cunningham.  Ward created a small interpreter, written in around 100 lines of C code for use with Arduino or Teensy.

Txtzyme allowed direct manipulation of the microcontroller and peripherals using a compact set of just 13 commands entered via a serial interface. The commands could be accompanied with a single 16-bit number that was used as a control parameter. Txtzyme sequentially executed the commands stored in the serial input buffer until it encountered CR/LF characters. Control was then returned to the user.

SIMPL extends the concept of Txtzyme in 5 main directions.

1.    It extends the number of commands to include all printable ascii characters, giving much greater functionality.
2.    It allows the use of additional numerical parameters so that arithmetic, comparison and logic operations can be performed.
3.    It allows new command definitions to be created by concatenating existing commands and assigning them a new name.
4.    The txtEval function can be pointed to any area in memory, not just the input buffer, allowing the VM code to be executed from RAM
5.    Simple listing and editing features have been added.

The SIMPL Virtual Machine.

The SIMPL VM implements a 16-bit machine with a 64K word addressing space. It is based on the principles of a bytecode interpreter, where printable ascii characters are used as the bytecodes. This makes the language printable and viewable on a serial terminal. Wherever possible, ascii characters are chosen that have a strong mnemonic value attached to them. For example, the arithmetic operators:

+    ADD

-     SUB

*     MUL

/     DIV

It uses two main routines contained within a loop to implement the Read-Eval-Print Loop (REPL) interpreter.

In addition:

It implements a serial interface using serial input  getchar()  and serial output  putchar() routines.

It provides an ascii to integer routine to process numerical character strings and convert them to 16-bit integers.

It uses an integer to ascii conversion routine printnum() which allows decimal integers to be sent to the serial terminal.

Optional  decimal to hex and hex to decimal routines allow for hex input and hex-dump output.

• ### Implementing the SIMPL Machine.

01/14/2021 at 22:19 0 comments

The previous log, The SIMPL Machine, looked at how the J1 Forth CPU simulation could be used as the basis of a cpu targeted to execute the SIMPL language.

Here are some notes regarding the implementation on the Teensy 4.0

A switch-case structure will translate the SIMPL ascii character commands into 16-bit instructions to feed the simulated J1 cpu.

I took the SIMPL text interpreter framework and mapped the J1 instructions into it, focusing initially on the stack, ALU and memory operations listed below:

```
SIMPL               Operation                J1 hex code
"                   DUP                      6081
'                   DROP                     6103
$                   SWAP                     6180
%                   OVER                     6181
+                   ADD                      6203
&                   AND                      6303
|                   OR                       6403
^                   XOR                      6503
~                   INV                      6600
@                   FETCH                    6C00
!                   STORE                    6123
```
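As a sketch of how the switch-case translation might look, the function below maps some of the rows of the table to their J1 instruction words. The function name simpl_to_j1 is hypothetical, and returning the J1 ALU no-op 0x6000 for unmapped characters is an assumption; the remaining rows would follow the same pattern.

```c
#include <stdint.h>

/* Translate a SIMPL command character into a 16-bit J1 instruction word. */
uint16_t simpl_to_j1(char ch)
{
    switch (ch) {
    case '"': return 0x6081;    /* DUP  */
    case '$': return 0x6180;    /* SWAP */
    case '%': return 0x6181;    /* OVER */
    case '+': return 0x6203;    /* ADD  */
    case '&': return 0x6303;    /* AND  */
    case '|': return 0x6403;    /* OR   */
    case '^': return 0x6503;    /* XOR  */
    case '~': return 0x6600;    /* INV  */
    default:  return 0x6000;    /* assumed fallback: ALU no-op, T unchanged */
    }
}
```

Each word returned here would then be handed to the simulated cpu in place of an instruction fetched from memory.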
• ### The SIMPL Machine

01/14/2021 at 12:28 0 comments

In the previous log it was identified that SIMPL would be hosted as a virtual machine running from ROM on the chosen microcontroller.

In this log I explore the practicalities of creating a simulated stack machine running on a Teensy 4.0 and programmed using the Arduino IDE.

The aim is to run SIMPL on a virtual machine with a minimum instruction set with fewer than 32 primitive instructions, in order to keep the complexity down.

As SIMPL is based around a 16-bit wordsize and a 16-bit address space, it will be a better fit to a machine that has a native 16-bit architecture. For this reason, much of the early exploration of SIMPL has been done on the MSP430 range of 16-bit microcontrollers, rather than the 8-bit AVR devices.

I stated in the last log that the SIMPL machine would be based on a stack architecture.

Unfortunately there are very few stack machines available, as almost every modern microcontroller has a register based design. A large set of registers is almost essential for the efficient implementation of a high-level language such as C.

With no stack machines readily available, we need to create our own stack structures in software, either on an existing microcontroller,  as a simulation, or on a soft core design on an FPGA.

This provides 3 options which I wish to explore in turn.

1. Use an MSP430 16-bit microcontroller to implement the SIMPL machine
2. Simulate the SIMPL machine on a high performance 600MHz ARM Cortex M7 using a \$20 Teensy 4.0
3. Implement the SIMPL machine as a soft core on an FPGA using verilog.

Option 1 will be done using a low cost Launchpad development board. Fortunately Dr. Chen-Hanson Ting has written extensively about implementing his eForth system on an MSP430, so the mechanics of a stack machine have already been defined.

Option 2 makes use of the low cost Teensy 4.0 board as a target machine. The Teensy 4 with its 600MHz clock can readily simulate many of the early microprocessors at many times their original operating speed.

Option 3. The Teensy 4 may also be used to simulate experimental cpus with custom instruction sets. One of these is James Bowman's J1 Forth cpu which might make a suitable candidate for the SIMPL machine, as it has already been implemented and proven in verilog on a Lattice iCE40 FPGA.

As the MSP430 implementation has been covered elsewhere, and the MSP430 performance is somewhat limited by modern ARM standards, I intend to explore Options 2 and 3, with a software simulation of the J1 Forth cpu providing experience that will be directly relevant to the FPGA implementation.

The J1 Forth CPU

In 2010 James Bowman created a simple 16-bit stack machine that was targeted at executing the Forth language.  Since then it has been implemented in verilog and also as custom silicon that has found its way into a number of graphics controller ICs by FTDI and their Singaporean silicon fabricators Bridgetek Pte.

The J1 is described in this J1 2010 Euroforth Paper

The J1 has a very compact instruction set and a minimal stack machine architecture, and can be described in fewer than 100 lines of C code or verilog. This makes it easy to understand and easy to implement on an FPGA using opensource tools.

The J1 instruction word is 16-bits wide and the individual bit fields operate directly on the hardware without an intermediate layer of instruction decoding.

However, memorising 16-bit instructions is not easy, and an assembler or high level language is essential for program development. This is where I believe that SIMPL can be employed to advantage, as a human readable pseudo-code that allows interactive programming of the J1 - as an alternative to assembly language or Forth.

The J1 Architecture.

Having read James Bowman’s J1 Paper several times over, I managed to build up a simplified model of his instruction set and architecture.

J1 uses 16 bit long instruction words – where each word is divided into different length fields.

The following 3 images – taken from James’s 2010 EuroForth paper – explain the architecture:

Looking at Table II above we see that it describes the J1 ALU operations. T is the Top element of the stack, and N is the second or Next element of the stack.

The ALU operations are applied to T and N.

Bits 15, 14 and 13  define the Instruction Class – and there are 5 classes of instruction

1 x x   Literal

000   Jump

001    Conditional Jump

010    Call

011     ALU instruction

Bit 12 if  set in ALU mode provides the return from subroutine mechanism by loading the top of the return address stack R into the PC.

Bits 11, 10, 9 and 8  define a 4 bit ALU opcode – allowing up to 16 arithmetical, logical and memory transfer instructions.

Bits 7, 6, 5 and 4 are used to control the data multiplexers, so that data can be routed around the cpu according to which of these bits are set.  Here lies a little anomaly with the J1, in that Bit 4 is not used, and it would seem more logical to use it to provide the return function bit, which is currently done in bit 12.

Bits 3 and 2 define how the return stack pointer is manipulated in an instruction. It can be incremented or decremented by setting these bits.

Bits 1 and 0 define the manipulation of the data stack pointer  – it has a range of  +1,0,-1,-2 depending on the setting of these bits. To pop off the stack you subtract 1 from the stack pointer, to push on the stack, you add one to the stack pointer.

As the various control fields of the J1 instruction exercise different parts of hardware – they can operate in parallel – so for example a return or exit from subroutine can be had for free.
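The field layout described above can be made concrete with a small decoder. The sketch below is an illustration in C following the bit positions given in the text; the struct j1_fields and the function j1_decode are hypothetical names, and the 2-bit stack adjustments use the sign-extension table { 0, 1, -2, -1 }.

```c
#include <stdint.h>

typedef struct {
    int      is_literal;    /* bit 15 set: the other 15 bits are a literal */
    unsigned insn_class;    /* bits 15-13: 0 jump, 1 cond. jump, 2 call, 3 ALU */
    int      ret;           /* bit 12: R -> PC, meaningful in ALU instructions */
    unsigned alu_op;        /* bits 11-8: one of 16 ALU operations */
    int      t_to_n;        /* bit 7: T -> N */
    int      t_to_r;        /* bit 6: T -> R */
    int      n_to_mem;      /* bit 5: N -> [T] */
    int      rstack_delta;  /* bits 3-2: return stack pointer change */
    int      dstack_delta;  /* bits 1-0: data stack pointer change */
} j1_fields;

/* Split a 16-bit J1 instruction word into its control fields. */
j1_fields j1_decode(uint16_t insn)
{
    static const int sx[4] = { 0, 1, -2, -1 };  /* 2-bit sign extension */
    j1_fields f;
    f.is_literal   = (insn & 0x8000) != 0;
    f.insn_class   = (unsigned)insn >> 13;
    f.ret          = (insn >> 12) & 1;
    f.alu_op       = (insn >> 8) & 0xf;
    f.t_to_n       = (insn >> 7) & 1;
    f.t_to_r       = (insn >> 6) & 1;
    f.n_to_mem     = (insn >> 5) & 1;
    f.rstack_delta = sx[(insn >> 2) & 3];
    f.dstack_delta = sx[insn & 3];
    return f;
}
```

Decoding 0x6203, for example, gives an ALU class instruction with opcode 2 (add) and a data stack change of -1, because two operands are replaced by one result.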

### Modifying the J1 Instruction Set.

Whilst the J1 instruction set is neat and compact – the unused bit anomaly in Bit 4 is a bit of a sticking point with me.

If we put the “return bit” from Bit 12 into the Bit 4 field, this would free up the Bit 12 field.

The instruction word will then break down into four 4-bit fields, which makes it much easier to express as a hexadecimal number.

Bits 15, 14, 13, 12   Instruction Class

Bits 11, 10, 9, 8      ALU Operation

Bits 7, 6, 5, 4        Register transfers      T->N, T->R, N->[T], R->PC

Bits 3, 2, 1, 0        Stack Pointer modifications

Bit 12 becomes an augmented  version of the Jump instruction. It provides a mechanism to create look-up tables based on the ascii value of the SIMPL command character.
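Read literally, moving the return bit from Bit 12 to Bit 4 is a simple remapping of existing ALU instructions. The sketch below shows that remapping in C; the function name move_return_bit is hypothetical, and it assumes that only ALU class instructions are affected and that everything else passes through unchanged.

```c
#include <stdint.h>

/* Convert an original J1 instruction word to the modified layout by moving
   the return flag of ALU instructions from bit 12 to the unused bit 4. */
uint16_t move_return_bit(uint16_t insn)
{
    if ((insn >> 13) != 3)
        return insn;                /* literals, jumps and calls are untouched */
    if (insn & 0x1000)              /* return bit set in the original layout */
        insn = (uint16_t)((insn & ~0x1000u) | 0x0010u);
    return insn;
}
```

An ALU word with the return bit set, such as 0x700C, becomes 0x601C, so every ALU instruction then begins with the hex digit 6 and the return flag sits in the register transfer nibble.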

The J1 Simulator

The J1 cpu may be simulated in about 90 lines of C code.

```
// J1 CPU Simulator by Samawati 27-3-2015
// https://github.com/samawati/j1eforth/blob/master/j1.c

static unsigned short t;
static unsigned short s;
static unsigned short d[0x20]; /* data stack */
static unsigned short r[0x20]; /* return stack */
static unsigned short pc;    /* program counter, counts cells */
static unsigned char dsp, rsp; /* point to top entry */
static unsigned short* memory; /* ram */
static int sx[4] = { 0, 1, -2, -1 }; /* 2-bit sign extension */

static void push(int v) // push v on the data stack
{
dsp = 0x1f & (dsp + 1);
d[dsp] = t;
t = v;
}

static int pop(void) // pop value from the data stack and return it
{
int v = t;
t = d[dsp];
dsp = 0x1f & (dsp - 1);
return v;
}

static void execute(int entrypoint)
{
int _pc, _t;
int insn = 0x4000 | entrypoint; // first insn: "call entrypoint"
pcapdev_init();
do {
_pc = pc + 1;
if (insn & 0x8000) { // literal
push(insn & 0x7fff);
} else {
int target = insn & 0x1fff;
switch (insn >> 13) {
case 0: // jump
_pc = target;
break;
case 1: // conditional jump
if (pop() == 0)
_pc = target;
break;
case 2: // call
rsp = 31 & (rsp + 1);
r[rsp] = _pc << 1;
_pc = target;
break;
case 3: // alu
if (insn & 0x1000) {/* r->pc */
_pc = r[rsp] >> 1;
}
s = d[dsp];
switch ((insn >> 8) & 0xf) {
case 0:   _t = t; break; /* noop */
case 1:   _t = s; break; /* copy */
case 2:   _t = t+s; break; /* + */
case 3:   _t = t&s; break; /* and */
case 4:   _t = t|s; break; /* or */
case 5:   _t = t^s; break; /* xor */
case 6:   _t = ~t; break; /* invert */
case 7:   _t = -(t==s); break; /* = */
case 8:   _t = -((signed short)s < (signed short)t); break; /* < */
case 9:   _t = s>>t; break; /* rshift */
case 0xa:  _t = t-1; break; /* 1- */
case 0xb:  _t = r[rsp];  break; /* r@ */
case 0xc:  _t = (t==0xf008)?eth_poll():(t==0xf001)?1:(t==0xf000)?getch():memory[t>>1]; break; /* @ */
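case 0xd:  _t = s<<t; break; /* lshift */
case 0xe:  _t = (rsp<<8) + dsp; break; /* dsp */
case 0xf:  _t = -(s<t); break; /* u< */
}
dsp = 31 & (dsp + sx[insn & 3]); /* dstack+- */
rsp = 31 & (rsp + sx[(insn >> 2) & 3]); /* rstack+- */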
if (insn & 0x80) /* t->s */
d[dsp] = t;
if (insn & 0x40) /* t->r */
r[rsp] = t;
if (insn & 0x20) /* s->[t] */
(t==0xf008)?eth_transmit(): (t==0xf002)?(rsp=0):(t==0xf000)?putch(s):(memory[t>>1]=s); /* ! */
t = _t;
break;
}
}
pc = _pc;
insn = memory[pc];
} while (1);
}
/* end of cpu */
```

We can now combine this simulated J1 cpu with the SIMPL interpreter.  Each of the 16-bit wide primitive J1 instructions will be allocated to one of the SIMPL ascii character commands.

A crude version of this was hacked together in April of 2017 for the Arduino DUE or any of the larger memory Arduinos.

The intention now is to get this running on a 600MHz Teensy 4.0, which offers more memory and much greater speed.

There will be the overheads of the J1 simulation and also the SIMPL virtual machine, but with the performance of the Teensy 4.0 it will be a good platform to explore the possibilities of a dedicated SIMPL machine.

• ### SIMPL: A subset of Forth

01/13/2021 at 15:19 0 comments

As SIMPL evolved it became clear that the Txtzyme command interpreter with only 13 basic commands for exercising peripheral hardware could be developed further into a virtual machine with its own tailored instruction set.

From an early stage in development it was decided that the VM instructions would be single character printable ascii codes. This was done initially to make the language more human readable, and was well suited to the serial terminal interface.

The extensions to Txtzyme allowed SIMPL commands to be executed from RAM, and each of these commands had an accompanying command action stored as inline code in the Flash ROM.

Holding the instructions in RAM, where each one executes a block of code contained in ROM, effectively released the microcontroller from the constraints of the Harvard architecture by providing a Von Neumann virtual machine.

SIMPL and Forth

SIMPL has been greatly influenced by the Forth language developed by Charles H. Moore between 1958 and 1970. During the 1970s it was ported onto a wide range of minicomputers and microcomputers. It is a compact and extensible stack-based language.

The stack based machine is the obvious candidate for executing the Forth language as it matches the needs of the Forth machine model.

As SIMPL is a subset of Forth, it would seem sensible to make the SIMPL VM a stack based architecture.

Traditionally Forth was based around a 16-bit wordsize and a 16-bit addressing range that was commonplace amongst mini computers in the 1970s.

Chuck Moore and Dr. C.H. Ting worked together in the 1990s to create a minimal instruction set computer (MISC) which could efficiently execute Forth primitive instructions. Moore and Ting identified that Forth could be effectively synthesised from 32 or fewer primitive instructions.
Chen-Hanson Ting developed an eForth model, which was a formal specification for a Forth, implemented from a set of 32 primitives.

Chuck Moore took on the hardware design challenge of MISC Forth processors, and developed a series of designs characterised by small instruction sets and minimal hardware.

A full sized Forth implementation would typically be around 6K bytes.

SIMPL proposes the use of a stack-based virtual machine, implementing a small number of primitive instructions which allow an extensible language to be created. It is suggested that the SIMPL machine can be implemented in about 1 to 2K bytes of Flash memory depending on the target microcontroller.

Forth on an 8-Bit AVR

A tiny Forth with 28 primitives may be implemented in about 2K bytes of AVR assembly language. This implementation is by T. Nakagawa of Japan.

To implement Forth on an 8-bit AVR microcontroller requires an average of about 10 AVR assembly language instructions to implement each primitive.  There is also the overhead of the interpreter. This slows the execution speed down by a factor of 10 or 20. On the 20MHz AVR this corresponds to about 1 million Forth instructions per second (FIPS).