-
Suite-16 Reloaded
3 days ago • 0 comments
Direct progress on Suite-16 fizzled out in March of 2020, when most of the world experienced the start of the COVID-19 pandemic, and the various ensuing lockdowns.
Whilst the long periods of isolation and spare time at home may have appeared an ideal opportunity to work on long term personal projects, I chose to park up Suite-16 and engage in other studies.
Now, 5 years on, I find myself laid up with a shoulder injury, unable to drive and with time on my hands to rethink and reboot the Suite-16 project.
In 2021, I teamed up with John Hardy and Craig Jones and created a formal implementation of the SIMPL language, re-written from scratch, renamed MINT (Minimum INTerpreter), and ported to the Z80, in about 1700 bytes of code.
MINT has also been ported to the 6502 by Alvaro G S Barcellos:
https://github.com/agsb/6502.MINT
There is also a generic C++ version, called CMINT, by Jason C J Tay:
https://github.com/trozodejamon/CMINT/tree/main
The Z80 version is by John Hardy, Craig Jones and Ken Boak, and is intended for the TEC-1 Z80 SBC and RC2014 Z80 boards:
https://github.com/orgMINT/MINT
A Facebook group for MINT Computing:
https://www.facebook.com/groups/278238447530031
Most of my day to day writings are on the Minimalist Computing Group on Facebook:
https://www.facebook.com/groups/minimalistcomputing
In the next log, I will put forward a generic framework for a simplified MINT interpreter.
-
SIMPL as a hardware "bring up" language.
10/28/2020 at 14:13 • 0 comments
Back in January, I described an interpreted language that I have been developing for the Suite-16 TTL computer.
SIMPL is an acronym for Serially Interpreted Minimal Programming Language.
It started life back in 2013 as a serial command interface to allow a microcontroller to respond to single-character commands sent to it over a serial terminal connection. At that time I was working on precision motion control systems, and during development it was useful to have a simple command interface to allow control over the motion hardware.
Commands were given easy-to-remember uppercase alpha characters: U Up, D Down, L Left, R Right etc. Each command was preceded by a decimal number which, for the motion system, was the distance to move in millimetres.
100 U would move the platform up by 100mm.
This simple command shell was great for debugging the hardware, and rapidly evolved additional functionality to make it more useful.
Over a period of months, the test program evolved many more commands and started to become a bit unwieldy.
Further inspiration came in May of 2013, when I was introduced to Ward Cunningham's Txtzyme. This was a short C program, written to run on the Arduino, which introduced me to a more formal structure for a command interpreter based around a switch-case running within a loop.
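As an illustration only, here is a minimal C sketch of that switch-case-in-a-loop structure, applied to the motion commands described above. It is not Txtzyme's source, nor the original motion shell: the Platform type and the run_commands() name are invented for this example, and "moving" is reduced to updating a pair of coordinates.

```c
/* Hypothetical platform state: position in millimetres. */
typedef struct {
    long x;
    long y;
} Platform;

/* Interpret a string of single-character commands, each optionally
   preceded by a decimal number, e.g. "100 U" or "25R". */
void run_commands(Platform *p, const char *s)
{
    long n = 0;                          /* number accumulated so far */

    for (; *s != '\0'; s++) {
        char c = *s;

        if (c >= '0' && c <= '9') {      /* build up the decimal argument */
            n = n * 10 + (c - '0');
            continue;
        }

        switch (c) {                     /* one case per command letter */
        case 'U': p->y += n; n = 0; break;
        case 'D': p->y -= n; n = 0; break;
        case 'L': p->x -= n; n = 0; break;
        case 'R': p->x += n; n = 0; break;
        default:  break;                 /* spaces and unknown characters are skipped */
        }
    }
}
```

Calling run_commands() with the string "100 U" on a zeroed Platform leaves y at 100. Adding a new command is just a matter of adding another case.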
As a language, it is a means of communicating with a processor and of automating repetitive machine tasks. It has specifically been designed to be small and uncomplicated, requiring very few memory resources. Versions of it have been ported to other processors in less than 1000 bytes of program space.
Although the central interpreter routines and command despatcher are small in size, the basic framework is extensible and adaptable to a range of applications, the first of which is a tool-kit to allow easier programming and testing of hardware.
In this log I describe some of the desirable features and how they are implemented, to make this a useful tool for working with new processors.
Background.
Working with any new processor is never an easy task, but having worked on a few over the years, you tend to learn a few techniques and build a tool-kit to help make the job easier. Once you have reached the stage that you can blink an LED or make a musical tone from an output pin, you have fought half the battle.
Almost every modern microcontroller is now programmed in C, or some scripting language. To make this possible you need the vast resources of a C-compiler and a large tool-chain running on a modern laptop. The Arduino Project has brought new skills to millions of new coders, who have then gone on to do great things with embedded or hobbyist programming. However, even the Arduino IDE has become bloated over the years, and it makes the coder highly reliant on vast libraries created by others.
This log hopes to illustrate some of the early methods of coding - dating back nearly 50 years, and how some of these techniques can be applied even today.
Assembly Language.
Writing in the native assembly language of any computer is not the easiest of tasks: it is time-consuming to write and debug, and prone to mistakes. This often discourages people from attempting assembly language, or at least encourages them to write as little of it as possible. However, assembly language is the bedrock, or foundation layer, of everything we do, and at some point it is necessary to gain an understanding of its importance to the whole software stack that is built on top of it.
So if ultimately you cannot avoid assembly language, there are means of minimising your exposure to it. You need the bare bones of assembly language routines which will allow you to interact with your new processor, in an abstracted way that simplifies the tasks of low-level programming.
Early Days
In the early days of microprocessors, there were few programming tools available, and many machines were initially programmed solely in machine code, by toggling the contents into memory using the switches on the front panel. This was incredibly time-consuming and error-prone, so when serial terminals became more widely available, microcomputer manufacturers often provided a "serial monitor" program. This allowed the contents of memory to be examined and modified, and programs to be run from RAM.
One of the more memorable monitors was "WozMon", which Steve Wozniak supplied for the Apple 1. It was compact and fitted into 256 bytes of PROM. You still had to do your own hand assembly using pencil and paper, but typing and viewing hex on a screen is a lot easier than toggling front panel switches.
The next evolutionary step was to provide an assembler, self-hosted on the machine. This made assembly language programming a lot quicker, especially where labels and symbols could be used to define sub-routines and allow relocatable code to be written. However, a fully symbolic 2-pass assembler was at that time a fairly complex piece of software, and CPU manufacturers often only bundled them with their expensive tool chains and development systems.
Early hobbyists could seldom afford these tools, and so efforts were made to find alternatives. Out of this period of late 1970's computer history came a range of interactive languages, tailored towards the resource limited microcomputers, including TinyBASIC, VTL-2, Mouse and Forth. VTL-2 (Very Tiny Language) was small enough to fit into 768 bytes.
Basic Machine Interaction
Interacting on a one-to-one basis with a computer is, in my opinion, one of the purest forms of programming. It is a conversation between man and machine, where you issue commands, and the computer as your powerful servant follows your commands without questioning.
To command your machine, you need to give it concise, unambiguous and accurate instructions which it will execute. These are traditionally given using a text-based language which is either interpreted directly or compiled into the computer's memory. Compiled code runs faster, but it has the disadvantage and overheads of the "edit compile test" cycle. Interpreted code is slower in execution time but has the advantage that the process is interactive and you can quickly test and reiterate until you achieve the desired functionality.
The earliest electronic computers (1946-1952) were very simple machines and could not incorporate large instruction sets. The input and output was done by telegraphy paper tape and teletype, with limited character sets based on a 5-bit code. This led to small instruction sets with program commands restricted to uppercase instructions. The instructions were chosen to have a strong mnemonic value - such as A for ADD and S for SUB.
In the light of these simpler methods, I wanted to move away from native assembly language, and provide a rudimentary, interpreted language, based on single character commands, which can form the basis of a toolkit to make the job of code development somewhat easier.
Virtual Machines (VMs)
These tiny interpreted languages were frequently designed around a Virtual Machine, able to parse and interpret instructions stored in RAM. Using a VM running a higher level language makes the task of programming easier, but at the expense of speed of execution. For this reason the VM needs to be efficient in fetching instructions from memory and executing them in the native machine language. Efficiency often means simplicity, so the interpreter must use the fastest coding techniques available to achieve this.
One approach is to use the minimum possible instruction set to implement the VM. This keeps the amount of native machine language required to a few hundred bytes, which makes coding it simpler, and it also makes it easier and quicker to transfer the VM from one processor ISA to another.
So what operations will form the primitive instructions of the VM? How many will be needed, and typically what operations are essential to run the mechanics of the VM?
More complex instructions can always be synthesised by combining sequences of the primitive instructions.
How do we make the VM language easier to work with?
I have found that I struggle to remember the mnemonics and syntax when I move from one assembly language to another, as one manufacturer chooses one convention over another. As processors get more complex, the quantity and complexity of each mnemonic statement increases. It has almost got to the point where assembly language for the largest processors has become virtually unreadable.
To simplify this ever increasing complexity, I propose the use of single printable ascii characters to represent each VM primitive instruction. As literate humans, we have learned to efficiently recognise a wide range of symbols, punctuation marks, and alpha-numerical characters.
For example, we all recognise the arithmetical symbols from our basic math and algebra lessons:
    +    ADD
    -    SUB
    *    MUL
    /    DIV
Then we can include the logical symbols:
    &    AND
    |    OR
    ^    XOR
    ~    NOT
With just these 8 symbols we have covered almost all of the instructions performed by the ALU.
We can then add the comparison operators:
    <    LESS THAN
    =    EQUAL TO
    >    GREATER THAN
The memory operation symbols are borrowed from the Forth language:
    @    FETCH
    !    STORE
And because the VM is going to be based around a stack machine architecture, we need the stack manipulation operations:
    "    DUP
    '    DROP
    $    SWAP
    %    OVER
There are only a few remaining printable symbols and these are used for program flow control, allocating variables and defining structures:
    SPACE
    #    LIT
    (    BEGIN
    )    END
    ,    PUSH
    .    POP PRINT
    :    CALL
    ;    RETURN
    ?    KEY INPUT
    [    OPEN ARRAY
    ]    CLOSE ARRAY
    \    COMMENT
    _    TEXT STRING
    `    VAR
    {    SWITCH-CASE OPEN
    }    SWITCH-CASE CLOSE
We have defined the main constituents of a short-hand, but human-readable, language suitable for a stack-based VM. It has its roots in Forth, but with a much reduced word-set, and because every operation is a single ASCII character the text interpreter is greatly simplified. Any character received in the input buffer is decoded using a jump table which supplies the code execution address.
For example, if the interpreter finds a + symbol, which is ASCII character 0x2B, we use the value 0x2B to index into a jump table, where there will be located a 16-bit start address of the code block that will handle the + operation. In this case it will ADD the top two numbers that it finds on the stack, and put the result on the top of the stack.
The text interpreter will then fetch the next symbol from memory, and perform a similar execution process. For each of the characters used in the tiny language, there will be a code block associated with it.
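To make the dispatch mechanism concrete, here is a minimal sketch in C. It is an illustration under stated assumptions, not the Suite-16 or MSP430 code: function pointers stand in for the 16-bit code addresses, and the Vm type, the stack depth of 16 and all the function names are invented for the example. Only four of the symbols from the tables above are wired up.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_DEPTH 16               /* assumed depth, for illustration */
#define TABLE_SIZE  96               /* one slot per character 0x20..0x7F */

typedef struct {
    int16_t stack[STACK_DEPTH];
    int     sp;                      /* number of items currently on the stack */
} Vm;

typedef void (*Handler)(Vm *vm);     /* stands in for a 16-bit code address */

static void push(Vm *vm, int16_t v)
{
    if (vm->sp < STACK_DEPTH) vm->stack[vm->sp++] = v;
}

static int16_t pop(Vm *vm)
{
    return (vm->sp > 0) ? vm->stack[--vm->sp] : 0;
}

static void op_add(Vm *vm)  { int16_t b = pop(vm), a = pop(vm); push(vm, (int16_t)(a + b)); }
static void op_sub(Vm *vm)  { int16_t b = pop(vm), a = pop(vm); push(vm, (int16_t)(a - b)); }
static void op_dup(Vm *vm)  { int16_t a = pop(vm); push(vm, a); push(vm, a); }
static void op_swap(Vm *vm) { int16_t b = pop(vm), a = pop(vm); push(vm, b); push(vm, a); }
static void op_nop(Vm *vm)  { (void)vm; }

static Handler jump_table[TABLE_SIZE];

/* Fill every slot with a do-nothing handler, then install the real ones. */
void vm_init(Vm *vm)
{
    vm->sp = 0;
    for (size_t i = 0; i < TABLE_SIZE; i++) jump_table[i] = op_nop;
    jump_table['+' - 0x20] = op_add;
    jump_table['-' - 0x20] = op_sub;
    jump_table['"' - 0x20] = op_dup;
    jump_table['$' - 0x20] = op_swap;
}

/* Decode one character: its ASCII value, less 0x20, indexes the table. */
void vm_step(Vm *vm, char c)
{
    unsigned idx = (unsigned char)c - 0x20u;
    if (idx < TABLE_SIZE && jump_table[idx] != NULL) jump_table[idx](vm);
}
```

With 2 and 3 on the stack, vm_step(&vm, '+') looks up slot 0x2B - 0x20 and runs op_add, leaving 5 on top. Control characters below 0x20 fall outside the table and are ignored.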
Numbers and Variables.
Having dealt with most of the printable symbols, there are three main classes of characters left: numerals 0-9, lowercase alpha a-z and uppercase alpha A-Z.
The numerical characters are handled by a code routine called NUMBER. It takes each character in turn, until it finds a space or non-numeric character, and creates a 16-bit number which it places on the stack. It is the equivalent of an ASCII-to-binary conversion. There is an equivalent numerical output routine, PRINTNUM, which takes a number off the stack and prints it as a decimal number to the terminal. Further routines can be added to handle hexadecimal notation.
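The behaviour described for NUMBER can be sketched in a few lines of C. This is a hedged illustration rather than the actual routine: parse_number() is a made-up name, and instead of pushing onto the VM stack it hands the value back through a pointer so that it can be tried on its own.

```c
#include <stdint.h>

/* Consume decimal digits from s, building a 16-bit value (which wraps
   on overflow). Returns a pointer to the first non-digit character. */
const char *parse_number(const char *s, uint16_t *out)
{
    uint16_t n = 0;

    while (*s >= '0' && *s <= '9') {
        n = (uint16_t)(n * 10u + (uint16_t)(*s - '0'));   /* times 10, add digit */
        s++;
    }
    *out = n;
    return s;
}
```

Given the text "1234+", it stores 1234 and returns a pointer to the + character, which the interpreter would then dispatch as the next command.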
Having dealt with the numbers, we come to the alpha characters, which have traditionally been used in assembly language to denote sub-routine addresses in the form of labels, numerical constants and variables. Often the various registers of microprocessors are given a shortform name consisting of an uppercase letter. Being able to substitute a single character to represent a variable or constant has always been a powerful process in symbolic programming. In the case of tinyBASIC it was thus limited to 26 uppercase variables.
Subroutine labels are often denoted in lowercase. In SIMPL we will continue this tradition. Lowercase alpha characters will be assigned to some of the common tasks that would be required from a hardware bring-up language. With modern microcontrollers we want to exercise the GPIO ports, sample the ADC, and define the timing of delays or square waves in terms of microseconds and milliseconds. The basic kernel of SIMPL can be extended by allocating the necessary code routines to these characters.
For a hardware bring up language, here is an example of the typical routines one would want to perform. I have suggested single character names - with a mnemonic value.
a    set address for hex dump
b    set the number base
c    clear a range of memory
d    dump as hex
e    edit an address
f    fill memory range
g    Go - run code from an address
h    set an output high
i    define an input port
j    jump to a procedure address
k    a loop index variable
l    set an output low
m    milliseconds delay
n    output a binary number on a port
o    define output port
p    print in decimal
q    print in hex
r    read register
s    sample ADC
t    toggle port line
u    microseconds delay
v    assign a variable
w    write register
x    define x-axis position
y    define y-axis position
z    sleep until keypress or other event
a through to f are the typical commands that you would have on a hex editor or monitor program
h through to o are for exercising GPIO and defining loops and timing - useful for wave synthesis
p and q are for printing
r and w are for directly accessing a CPU register
x and y are useful for graphics routines or 2D motion control - such as CNC
Uppercase Commands.
These are what gives the language its flexibility and extensibility. They provide the mechanism for the programmer to create their own functions, and save them to memory.
For example, the following code will print a string of text to the screen. Everything contained between the underscore characters is treated as a text string and is printed directly to the screen:
_This is a test message_
We can now assign this snippet of code to a user function, let's use M for message
:M_this is a test message_;
Each user function must be defined starting with a colon : The interpreter recognises the colon as the start of a user definition, defined by the next character, in this case M. It uses the ascii value of the M to create a unique address to store the remainder of the code snippet, until it reaches the trailing semicolon ;
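One simple way to turn the letter into a storage address is to give each of the 26 uppercase letters a fixed-size slot, so that the slot number is just the ASCII value less that of 'A'. The C sketch below shows that idea only. The 48-byte slot size and the names define_word() and lookup_word() are assumptions made for the example, and it does not cope with a semicolon appearing inside a text string.

```c
#include <stddef.h>

#define SLOT_SIZE 48                       /* assumed bytes per definition */

static char definitions[26][SLOT_SIZE];    /* one slot for each of A..Z */

/* Store a definition of the form ":Xbody;". Returns a pointer just past
   the ';' (or to the end of the input), or NULL if src is not a
   definition. Bodies longer than the slot are truncated. */
const char *define_word(const char *src)
{
    if (src[0] != ':' || src[1] < 'A' || src[1] > 'Z') return NULL;

    char  *slot = definitions[src[1] - 'A'];   /* letter selects the slot */
    size_t n = 0;

    src += 2;
    while (*src != '\0' && *src != ';') {
        if (n < SLOT_SIZE - 1) slot[n++] = *src;
        src++;
    }
    slot[n] = '\0';
    return (*src == ';') ? src + 1 : src;
}

/* Return the stored body for a letter, or NULL for anything else. */
const char *lookup_word(char name)
{
    if (name < 'A' || name > 'Z') return NULL;
    return definitions[name - 'A'];
}
```

After define_word(":M_hi_;"), lookup_word('M') returns the text _hi_, which the interpreter can then execute character by character exactly as if it had been typed.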
To execute this function, you only have to type M.
User functions will often incorporate the in-built functions - such as the millisecond and microsecond delays m and u. We can incorporate these into a loop structure and use it to blink a LED, or generate audio tones.
The following code defines a scale of musical notes A,B,C,D,E,F,G
\ Some fixed length "Musical Tones"
:A40(1o1106u0o1106u); \ A 440.00 Hz
:B45(1o986u0o986u); \ B 493.88 Hz
:C51(1o929u0o929u); \ C 523.25 Hz
:D57(1o825u0o825u); \ D 587.33 Hz
:E64(1o733u0o733u); \ E 659.26 Hz
:F72(1o690u0o691u); \ F 698.46 Hz
:G81(1o613u0o613u); \ G 783.99 Hz
The backslash is used to state that everything following it until the newline character is a comment.
The brackets (parentheses) are the means by which we define a loop. Everything contained within the brackets will be repeated n times, where n is the number that immediately precedes the open bracket.
Let's look at the definition for musical note A, assuming we have a small speaker connected to one pin of an output port.
1o will set an output port high and 0o will set it low.
So we set the output port high for 1106 microseconds and then set it low for 1106 microseconds.
We repeat this procedure 40 times, which will produce a short tone of 440Hz.
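The arithmetic behind the delay and repeat values can be checked with a couple of small C helpers. They are illustrative only, and the function names are invented here. The ideal half-period for 440Hz works out at 1136 microseconds, about 30 microseconds longer than the 1106 used in the definition. Presumably the shorter figure allows for the time the interpreter itself takes on each half cycle, but that is my inference from the numbers rather than something measured here.

```c
/* Ideal half-period, in microseconds, of a square wave at freq_hz,
   rounded to the nearest microsecond. */
unsigned long half_period_us(double freq_hz)
{
    return (unsigned long)(1000000.0 / (2.0 * freq_hz) + 0.5);
}

/* Number of whole cycles needed to sound freq_hz for duration_ms,
   rounded to the nearest cycle. */
unsigned long cycles_for(double freq_hz, unsigned long duration_ms)
{
    return (unsigned long)(freq_hz * (double)duration_ms / 1000.0 + 0.5);
}
```

For example, cycles_for(440.0, 91) gives 40, so the 40 repeats in the definition of A correspond to a tone lasting roughly a tenth of a second.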
If we want to play 3 notes, all we need to do is type ABC, and if we want to play this sequence 5 times we can put it in a loop, and give it a new definition T, for "tune":
:T5(ABC);
If we don't like the tune - we can quickly edit the definition for T.
-
SIMPL Revisited
01/21/2020 at 20:20 • 2 comments
Back in late October, just as I was starting to hand-code some assembly routines for Suite-16, I considered porting my bytecode interpreter SIMPL across from MSP430 assembly language to that of the Suite-16.
Unfortunately at that time, the instruction set was very much in a state of flux and still evolving, and hand assembly was somewhat time-consuming. Revisiting this task now that we have an assembler and a hexloader in our toolchain armoury makes the job so much easier, and I have transcribed the SIMPL framework in a long afternoon sprint of about 6 hours coding. The code is heavily commented, documenting the workings of the SIMPL interpreter as I went along.
What is SIMPL?
At its most basic level it is an interpreter consisting of a switch statement contained in a loop.
After all - that is how most simple processors and virtual machines are simulated. To make the job easier in assembly language - the switch statement is replaced with a jump table, with one 16-bit entry for each of the 96 printable ascii characters. In essence, when a command character is read from the input buffer, it is used to index into the jump table and pick up a 16-bit address for the code-block associated with that command.
SIMPL is stack based, so numbers are put onto the stack and operated on from there.
It is a tiny-Forth-like utility language without the complexity of the dictionary and text string matching that is needed in a full-blown Forth.
So the commands include familiar mathematical and logical symbols such as + - * and /, which obviously relate to arithmetical operations.
Then there are the stack operations - familiar to those using Forth, such as DUP, DROP, SWAP and OVER. In addition there are commands associated with decimal and hexadecimal number entry and also output routines to a serial terminal for decimal, hexadecimal and hex-dump formats.
The user has 26 commands available - that can be user defined and customised. These are represented by the uppercase characters A to Z. The language is extensible just by assigning a user command to a snippet of instructions.
SIMPL provides a minimal framework, consisting of serial input and output, number conversion, text interpreter, command allocator plus a range of arithmetical, logical, stack and memory operations. From this collection of built in routines, elementary applications can be assembled.
For example, if you were developing an application for a hex monitor, to view the contents of memory on screen in hex, edit it and save it back to memory you would assign characters for all the commands that you would need.
A set the Address
C Clear memory
D Dump memory
E Execute the routine
Alternatively if you were developing an application to control a CNC machine, plotter, 3D printer, robot etc you would probably use the X, Y, and Z characters that define the co-ordinates you wish the machine head to move. The SIMPL interpreter can read these in one line at a time from an ascii text file to control the motion of the machine, or to draw graphics on a screen. This is similar to the manner in which G-Code is used to control CNC machine tools, or Gerber files are used to control a photo plotter for pcb manufacturing. These simple control tasks evolved in the 1960s, when processing power was very limited, so the application itself had to be kept non-complicated.
SIMPL is very compact.
The minimal interpreter framework with all the input and output utility routines is under 256 program words. The jump table is a further 96 program words. To this you must add the action routines associated with all of the potential 96 commands - but these are often very small - each only a few instructions long.
SIMPL is easy to tailor to your own application, as rudimentary or complex as you wish. If you want commands for floating point math routines then these can be added. Generally I use it for exercising new hardware - or in this case thrashing out any discrepancies in the Suite-16 instruction set.
I have placed the SIMPL framework in my GitHub repository. It is as yet untried and probably buggy, but gives an idea of what can be done with a couple of hundred words of assembly language.
-
A New Year - and 2020 Vision.
01/13/2020 at 16:20 • 2 comments
In mid-November, I traveled out to Northern California, to attend Forth Day at Stanford University - and to meet one of my Computing Heroes, Charles H. Moore - the inventor of Forth. Returning in late November, I got stuck in a bit of a rut, plus a lot of other things were putting pressure on my free time - which meant that the Suite-16 project was put on the back-burner with no further progress. My project-colleague, Frank, is currently touring Australia and Tasmania for the duration of January, and so that gives me a two-week window of opportunity to take stock of the project so far and plan out the goals for 2020.
It was always my intention that Suite-16 would exist as a C-simulator, a computer implemented in standard TTL and as a verilog implementation to run on an FPGA as a soft-cpu. In discussions with Frank, we have decided that now we have a working simulator, the next step is to convert this into a soft-cpu developed and running on opensource FPGA hardware. When this task is done, I will have had a lot more experience writing in verilog, and Frank and I will have stable target hardware platforms with which we can do real system development. The TTL processor will follow on later, as it will be a lot of hard work, but at least the architecture will have been thoroughly explored and documented by that time.
The FPGA family I have chosen to use is the Lattice ICE 40 series. I first encountered these in early 2015 when Clifford Wolf announced his "Project IceStorm" reverse engineered, opensource FPGA toolchain - allowing the Lattice parts to be programmed using a manageable, open source toolchain. One of the first soft-cpus to benefit from this announcement was James Bowman's J1 Forth CPU. James had previously used Xilinx parts for earlier implementations of his J1, but by May 2015 had proven that it would easily fit onto an ICE 40 HX1K - which is a 1K LUT FPGA used on the $25 Lattice IceStick development board. James and I have collaborated on a couple of projects, having first met at the Open Hardware Summit in New York in September 2011.
In May 2016 a friend, Alan Wood, and I decided to develop an opensource FPGA dev-board based on the Lattice ICE40 HX4K. The result was the myStorm BlackIce, which is now into its fifth generation. Hardware development took about 14 weeks from idea to the first 250 boards arriving from the manufacturers in Shenzhen, China. We debuted at the 2016 OSHCamp (Open Source Hardware Camp) held annually in Hebden Bridge, West Yorkshire, UK.
In November 2016, I met up with James again at Forth Day in Palo Alto, where he had implemented his J1 cpu on one of our BlackIce boards, and was serving his presentation as a series of jpgs from it. The BlackIce hardware had been fully exercised and proven robust and reliable - and makes an ideal platform for soft cpus or SoCs.
The 2nd generation BlackIce II board has a Lattice ICE 40 HX4K FPGA and 256K words of 16-bit, 10 ns SRAM. Programming (configuration) is done over a USB connection, using an STM32L433 ARM Cortex M4 to act as the programming interface. We could have just used an FTDI FT2232H as a USB to SPI converter (like everyone else does) - but frankly at $3.50 the FTDI device is too expensive and the STM32 offers so much more flexibility for less money. Once the FPGA is programmed, the STM32 can be used to provide slave peripherals for the FPGA (such as ADC, DAC, UART, I2C, SPI etc). Later generations of myStorm boards have all adopted this ARM/FPGA symbiosis.
Implementing a soft-cpu in verilog
verilog, like vhdl, is a popular hardware description language or HDL. verilog has its roots in C, whilst vhdl arose out of the US Department of Defense and is structured like Ada. Both are equally used, but verilog has my vote, because I know nothing about vhdl syntax.
When James Bowman transcribed his J1 cpu to verilog, he implemented it as a very neat switch-case (casez) statement in fewer than 110 lines of verilog in his J1 Github repository.
Using James's code as a useful tutorial example, it's fairly easy to see how an instruction set implemented as a switch-case statement in C translates cleanly to verilog. His code is well commented and easily readable. Nobody likes obfuscated code (apart from obfuscated code masochists).
This code is just the cpu, and the interface to memory implemented in internal BRAM. In a practical working system we will also require a module of code to define a serial communications UART, an external SRAM interface and if we are feeling adventurous, a VGA colour graphics interface and external frame buffer. For a complete system a PS/2 keyboard interface would also be useful - and this can be added as a PMOD - external hardware module.
-
PRINTHEX
11/05/2019 at 22:50 • 0 comments
PRINTHEX is probably the last of the utility routines that I needed to write in order to get a simple hex loader to run.
It accepts a 16-bit integer value from the accumulator R0 and prints it out as a 4-digit hexadecimal number. Leading zeros are not suppressed.
It is based on the decimal number print routine PRINTNUM, but with the added complication that the hex character sequence is not contiguous in the ascii table.
This is likely to be the last of the hand-assembled routines, because my motivation is from now on to use the TASM32 assembler - kindly customised for the Suite-16 instruction set by Frank Eggink.
Having a working hex loader with hex dump and simple monitor commands will be the next goal!
Here's just the PRINTHEX assembly code - it fits nicely into 48 words of memory:
// 0x0070 ---------------------------PRINTHEX-----------------------------------------
// Prints out contents of R0 as a 4 digit hexadecimal number to the terminal
// Leading zeroes are not suppressed
// R1 = Heximation Value
// R2 = digit
// R3 = 0x30
// R4 = temporary storage for accumulator (Heximated value)
// R6 = temporary store for output character
0x1200, // SET R2, 0x0000
0x0000,
0x1300, // SET R3, 0x0030
0x0030,
0x1100, // R1 = 4096
0x1000,
0x088C, // CALL Heximate
0x1100, // R1 = 256
0x0100,
0x088C, // CALL Heximate
0x1100, // R1 = 16
0x0010,
0x088C, // CALL Heximate
0x0A30, // ADI 0x30 to make a number
0x3600, // Store in R6
0x0B3A, // SBI 0x3A - is it bigger than ascii 9
// 0x0080 ---------------------------------------------------------
0x0284, // BLT 0x84 - Print decimal digit
0x0A41, // ADDI 0x41 - make it a hex digit
0x0C00, // putchar R0
0x0086, // BRA CRLF
0x2600, // LD from R6
0x0C00, // putchar R0
0x1000, // SET R0, CR
0x000D,
0x0C00, // putchar R0, CR
0x0B03, // SBI 0x03 Set R0, LF
0x0C00, // putchar R0, LF
0x0003, // BRA START
// 0x008C ------------------------Heximate--------------------------------
0xB100, // SUB R1, :Heximate
0x0290, // BLT 0x90
0xE200, // INC R2
0x008C, // BRA 0x08C
// 0x0090 ---------------------------------------------------------
0x3400, // Store R0 in R4 temporary store the remainder
0x2200, // MOV R2, R0 get the count from R2
0x0A30, // ADI 0x30 to make a number
0x3600, // ST R0, R6 - temporary save to R6
0x0B3A, // SBI 0x3A - is it bigger than ascii 9
0x0299, // BLT 0x99 Print decimal digit
0x0A41, // ADI 0x41 - make it a hex digit
0x0C00, // putchar R0
0x009B, // BRA 0x9B Restore R0
0x2600, // Get R0 back from R6
0x0C00, // putchar R0 Print it as a decimal digit
0x2400, // Get R0 back from R4
0xA100, // ADD R1 adds DEC value to restore R0
0x1200, // SET R2,0 Reset R2
0x0000,
0x0900, // RET
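For readers who would rather follow the algorithm than the hex words, here is a rough C paraphrase of the same idea. It is a sketch, not a translation of the listing: the names hex_char() and format_hex16() are invented, it compares before subtracting rather than subtracting until the result goes negative and then adding back, and it writes into a buffer instead of calling putchar.

```c
#include <stdint.h>

/* Convert a value 0..15 to its ASCII hex character. As in the listing,
   0x30 is added first, and anything that lands beyond '9' is moved up
   to the 'A'..'F' range. */
static char hex_char(unsigned digit)
{
    unsigned c = digit + 0x30u;
    if (c < 0x3Au) return (char)c;         /* '0'..'9' */
    return (char)(c - 0x3Au + 0x41u);      /* 'A'..'F' */
}

/* Write value as 4 hex digits plus a terminating NUL into out.
   Leading zeros are not suppressed. */
void format_hex16(uint16_t value, char out[5])
{
    static const uint16_t weights[3] = { 4096, 256, 16 };

    for (int i = 0; i < 3; i++) {
        unsigned digit = 0;
        while (value >= weights[i]) {      /* count how many times the weight fits */
            value = (uint16_t)(value - weights[i]);
            digit++;
        }
        out[i] = hex_char(digit);
    }
    out[3] = hex_char(value);              /* what is left is the units digit */
    out[4] = '\0';
}
```

Repeated subtraction is used because it needs no divide or shift instruction, which suits a small instruction set.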
-
Finalising the Instruction Set
11/04/2019 at 13:12 • 1 comment
Recently, I have been exploring the Suite-16 instruction set, by the practical method of writing assembly language to run on the Suite-16 simulator.
Starting with a very simple routine to output "Hello World!", I have created routines for decimal and hexadecimal entry, decimal output and a very simple command interpreter.
In writing these routines, it became clear that there were certain deficiencies in the instruction set and over the last 2 weeks I have slowly added useful instructions to make the cpu more versatile.
This process is now approaching its logical conclusion - partly because I have run out of spare instruction slots, and partly because I don't want to create such a complicated instruction set that I don't stand a chance of implementing it in hardware.
Also I feel that after 2 weeks of spare time writing assembly routines it's time to move the project along to its next phase and begin the hardware implementation.
There are three main areas in which I believe the instruction set can be augmented.
The first is making more use of 16-bit immediate operations on the accumulator R0. At present, the ADI and SBI operations are 8-bit immediate operations where the operand is held in the lower 8 bits of the instruction register. Extending this to 16-bit will mean that the operand will be held in the next location in memory. This could be done by making the program counter another general purpose register - and this I believe is how the MSP430 implements immediate operations.
With this mechanism in place, ADD, SUB, AND, OR and XOR would benefit from having this 16-bit immediate mode.
My dealings with the decimal and hex routines have also highlighted the need for an efficient left shift on the accumulator.
Ideally I can implement as a bare minimum an ADD R0, R0, which will at least allow a doubling of the accumulator without involving any other register. The "Times 10" and "Times 16" routines used in decimal and hexadecimal entry would benefit from this instruction saving a few instruction cycles.
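To show why a plain doubling instruction is enough, here is a small C sketch in which every multiplication is built from additions of a value to itself, each standing in for one ADD R0, R0. The function names are invented for illustration, and the results wrap at 16 bits just as the accumulator would.

```c
#include <stdint.h>

/* x * 10 as (x * 8) + (x * 2), using only doubling and one final add. */
uint16_t times10(uint16_t x)
{
    uint16_t x2 = (uint16_t)(x + x);       /* x * 2 */
    uint16_t x4 = (uint16_t)(x2 + x2);     /* x * 4 */
    uint16_t x8 = (uint16_t)(x4 + x4);     /* x * 8 */
    return (uint16_t)(x8 + x2);            /* x * 8 + x * 2 */
}

/* x * 16 as four successive doublings. */
uint16_t times16(uint16_t x)
{
    for (int i = 0; i < 4; i++) x = (uint16_t)(x + x);
    return x;
}
```

Note that times10 needs somewhere to keep the x * 2 intermediate, so on real hardware it would still use one other register or memory location. Times 16 needs nothing but the accumulator.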
Secondly, I'm proposing that the 8-bit payload in the lower 8 bits of the instruction register can be used as an address to allow the IN and OUT operations to address up to 256 I/O devices. This is a placeholder for memory mapped I/O peripherals such as UARTs, timers and GPIO which can be added later.
Thirdly, there is the final instruction slot 0x0Fxx, which is currently used as NOP. I intend to extend this to allow for microinstructions - inspired by the OPR, "OPeRate" instructions used on the PDP-8.
Plagiarising the PDP-8
The OPR instructions allow operations such as clearing and complementing the accumulator, setting and clearing the carry bit and shift and SWAP operations to be implemented.
The PDP-8 OPR instructions were implemented with the following individual bit-lines that operated directly on the hardware:
This scheme gives access to 8 individual control lines which could be sequenced to become active in a specific timeslot which allowed quite complex operations to be performed on the accumulator.
An alternative scheme is possible, where the lower 8-bit payload is fully decoded to allow up to 256 microinstructions. For maximum flexibility this could be done by using the byte to address a microinstruction ROM such as an additional AT27C1024. A 4-bit counter can be used on the higher address lines to provide a primitive 16-step microsequencer. That leaves 4 address lines which could be used as inputs to implement a simple external interrupt system.
This would be very flexible but would require more hardware, and would probably be quite limited by the access time of the AT27C1024 ROM.
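The address formation for such a ROM can be sketched in C++ as below. The bit assignment is an assumption made for illustration (payload on A7..A0, step counter on A11..A8, interrupt inputs on A15..A12); the text above only fixes how many lines each source uses, not which ones.

```cpp
#include <cstdint>

// Hypothetical address layout for a 64K x 16 microinstruction ROM:
//   A7..A0   = 8-bit payload from the instruction register
//   A11..A8  = 4-bit step counter (16 microsteps)
//   A15..A12 = four external interrupt inputs
inline uint16_t microrom_address(uint8_t payload, uint8_t step, uint8_t irq_lines) {
    return static_cast<uint16_t>(((irq_lines & 0x0F) << 12) |
                                 ((step & 0x0F) << 8) |
                                 payload);
}
```

With this layout each of the 256 microinstructions owns 16 consecutive steps, and each combination of interrupt inputs selects a separate 4K-word bank of the ROM.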
The next plog (project log) will start to look at the hardware architecture and how we might implement a fast microinstruction sequencer using a counter, a diode matrix and some 3-8 line decoders.
-
Hexadecimal Entry
11/02/2019 at 20:29 • 0 comments
It's been a bit of a slow week, and I must admit that I lost focus in the middle of the week with my hexadecimal number entry routine.
In my opinion, hexadecimal entry is more complex than decimal entry, because the characters 0-9, and A-F are discontinuous in the ASCII table.
Characters 0-9 need to have 0x30 subtracted, whilst characters A-F need to have 0x37 subtracted. Anything else is not a valid hex digit and can be ignored until a newline character is seen.
With each incoming character you have to check if it is a legitimate hexadecimal digit, and modify it, either by subtracting 0x30 or 0x37, to get its true numerical value.
This test and modify is best done using a short subroutine, which appears at the end of the listing.
Once you have the numerical value allocated to the character the rest of the routine is similar to the decimal entry routine, except that you are multiplying by 16 rather than 10.
There's a further twist in the tail when you detect the end of the valid digits and have to add in the last digit - modifying it accordingly.
This first draft allows hexadecimal numbers up to 0xFFFF to be entered and prints them back in decimal format.
You can find the latest listing for the Arduino in my Github Suite-16 repository.
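For readers who prefer to see the logic in a high-level language first, here is a minimal C++ sketch of the same idea. It is not a translation of the Suite-16 listing: the function names are made up for illustration, it accepts upper-case A-F only, and it simply stops at the first character that is not a hex digit.

```cpp
#include <cstdint>
#include <string>

// "Test and modify": returns 0-15 for a valid upper-case hex digit,
// or -1 for anything else.
inline int hex_digit_value(char c) {
    if (c >= '0' && c <= '9') return c - 0x30;  // '0'..'9' -> 0..9
    if (c >= 'A' && c <= 'F') return c - 0x37;  // 'A'..'F' -> 10..15
    return -1;
}

// Accumulates digits from the buffer, multiplying the running total by 16
// before adding each new digit. Stops at the first non-hex character.
inline uint16_t get_hex(const std::string& buffer) {
    uint16_t total = 0;
    for (char c : buffer) {
        int value = hex_digit_value(c);
        if (value < 0) break;
        total = static_cast<uint16_t>(total * 16 + value);
    }
    return total;
}
```

For example, `get_hex("1A2B\n")` accumulates 0x1, 0x1A, 0x1A2 and finally 0x1A2B before the newline ends the loop.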
EDIT: I found some redundant code in the main GETHEX routine and have managed to shorten it from 52 to 34 words.
Further optimisation became possible with the test and modify subroutine approach.
As the instruction set currently lacks a shift left instruction, and it's not yet proven that ADD R0, R0 will be implemented in hardware, the routine to multiply the accumulator by 16 is a little cumbersome using eight instructions rather than a possible four.
EDIT: After proving the ADD R0, R0 instruction and fixing a minor bug the routine is now down to 31 words in length.
// 0x003C -----------------------------GETHEX----------------------------
// Accepts a hexadecimal number up to FFFF from terminal input buffer
// converts it to an integer and puts it into register R0
// It can then be printed out as a decimal using PRINTNUM - for checking integrity
// R1 is the pointer in the text buffer - which starts at 0x0200
// R4 is used as a temporary store for the character in the accumulator R0
// R5 is used in the "Times 16" routine
// R7 is used to accumulate the powers of 16 when forming the integer in R0
0x1100, // SET R1, 0x0200 text buffer start
0x0200,
0x1700, // Don't forget to clear R7
0x0000,
// 0x0040--------------------------------------------------------------------------------------
0x4100, // LD AC, @R1 get first character from buffer :Getchar
0x3400, // Store R0 in R4
0xE100, // INC R1
0x4100, // LD AC, @R1 get next character - and test to see if it's a number or hex digit or space newline etc
0x0B30, // Subtract 0x30 Is it bigger than 0x30?
0x0250, // BLT 0x50 Quit No - so must be a space or newline etc
0x0B17, // SBI 0x17 is it bigger than 0x47 ascii for "F" ?
0x0350, // BGT 0x50 Quit Not a hexadecimal digit
0x0853, // CALL 0x0053 Restore, Test and Modify R0
0xA700, // Add in the accumulating total from R7 - ready to multiply
0xA000, // ADD R0, R0 Double R0 2X
0xA000, // ADD R0, R0 Double R0 4X
0xA000, // ADD R0, R0 Double R0 8X
0xA000, // ADD R0, R0 Double R0 16X
0x3700, // Store R0 in R7 R7 is the accumulating total of all the digits multiplied by powers of 16
0x0040, // BRA 0x0040 Get the next digit
// 0x0050--------------------------------------------------------------------------------------
0x0853, // CALL 0x0053 Restore, Test and modify R0
0xA700, // Add the accumulated sum from R7 - integer decimal number is now in R0
0x0010, // BRA 0x0010 Print it in decimal
// 0x0053---------------------------------TEST R0 & MODIFY--------------------------------------
// If R0 = 0-9 subtract 0x30 to form a number 0-9
// If R0 = A-F subtract 0x37 to form a number 10-15
0x2400, // Get R0 back from R4 - we now have a hex character in the range 0-F and need to convert it to a value 0-15
0x0B40, // Subtract 0x40 Is it bigger than 0x40? Then subtract 0x37 else subtract 0x30
0x0258, // BLT Not A-F so subtract 30 and return
0x0A09, // ADI 0x09 (restores and corrects R0 to correct numerical value)
0x005A, // BRA Return
0x2400, // LD R0, R4 Get the character back in R0 - we know it's 0-9
0x0B30, // Subtract 0x30
0x0900, // RET
0x0F00, // NOP
0x0F00, // NOP
0x0F00, // NOP
0x0F00, // NOP
0x0F00, // NOP
// 0x0060 -------------------------------------------------------------------------------------
-
An Assembler for Suite-16
10/31/2019 at 13:01 • 0 comments
I was delighted to receive a Twitter notification from Frank Eggink, one of this project's followers, with news that he had created a table of Suite-16 instructions so that it can be used by TASM - a table driven assembler popular for small micros about 20 years ago.
His customised table and a link to the download site for TASM32 can be found at his Github repository: Here
Many thanks Frank - your contribution is much appreciated!
I have re-jigged the instruction set slightly in the last few days - and the most recent version can be found in this simulator file in my Github.
With the changes to the instruction set, I now have no more empty slots, so the NOP at 0x0F00 seems a bit extravagant.
With an 8-bit immediate add to the accumulator, the NOP can be created from ADI 0.
This frees up the 0x0Fxx slot for my proposed (PDP-8 like) OPR instructions including shifts, clears, complements and small constants.
The main changes are documented in the text header:
// A simple simulator for Suite-16 processor
// Add and Subtract Immediate instructions ADI and SBI added at 0x0Axx and 0x0Bxx
// IN moved to 0x0D00
// JP@ - Branch to the address held in the accumulator added at 0x0E00

/* Suite-16 Instructions

Register OPS-
   0n   ---    --     Non-Register Ops
   1n   SET    Rn     Constant (Set)      Rn = @(PC+1)
   2n   LD     Rn     (Load)              AC = Rn
   3n   ST     Rn     (Store)             Rn = AC
   4n   LD     @Rn    (Load Indirect)     AC = @Rn
   5n   ST     @Rn    (Store Indirect)    @Rn = AC
   6n   POP    @Rn    Pop AC              AC = @Rn   Rn = Rn - 1
   7n   PUSH   @Rn    Push AC             @Rn = AC   Rn = Rn + 1
   8n   AND    Rn     (AND)               AC = AC & Rn
   9n   OR     Rn     (OR)                AC = AC | Rn
   An   ADD    Rn     (Add)               AC = AC + Rn
   Bn   SUB    Rn     (Sub)               AC = AC - Rn
   Cn   INV    Rn     (Invert)            Rn = ~Rn
   Dn   DCR    Rn     (Decrement)         Rn = Rn - 1
   En   INR    Rn     (Increment)         Rn = Rn + 1
   Fn   XOR    Rn     (XOR)               AC = AC ^ Rn

Non-register OPS-
   00   BRA    Always                     Target = IR7:0
   01   BGT    AC>0                       Target = IR7:0
   02   BLT    AC<0                       Target = IR7:0
   03   BGE    AC>=0                      Target = IR7:0
   04   BLE    AC<=0                      Target = IR7:0
   05   BNE    AC!=0                      Target = IR7:0
   06   BEQ    AC=0                       Target = IR7:0
   07   JMP    16-bit                     Target = @(PC+1)
   08   CALL   16-bit                     Target = @(PC+1)
   09   RET    Return
   0A   ADI    Add 8-bit Immediate        Immediate = IR7:0
   0B   SBI    Subtract 8-bit Immediate   Immediate = IR7:0
   0C   OUT                               putchar(AC)
   0D   IN                                AC = getchar()
   0E   JP@                               BRA (R0)
   0F   NOP                               AC &= AC
*/
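The table implies a simple three-field instruction word, which can be sketched in C++ as follows. The `Decoded` struct and `decode` function are hypothetical names, and the field split (4-bit operation, 4-bit register or sub-opcode, 8-bit payload) is inferred from encodings such as 0xB300 for SUB R3 and 0x0B30 for SBI 0x30 rather than copied from the simulator source.

```cpp
#include <cstdint>

// The three fields of a 16-bit Suite-16 instruction word, as inferred
// from the instruction table.
struct Decoded {
    uint8_t op;       // bits 15..12: register operation, 0 = non-register group
    uint8_t n;        // bits 11..8 : register number, or sub-opcode when op == 0
    uint8_t payload;  // bits 7..0  : branch target or 8-bit immediate
};

inline Decoded decode(uint16_t ir) {
    return Decoded{static_cast<uint8_t>(ir >> 12),
                   static_cast<uint8_t>((ir >> 8) & 0x0F),
                   static_cast<uint8_t>(ir & 0xFF)};
}
```

Decoding 0x0B30 this way gives op 0 (non-register group), sub-opcode 0xB (SBI) and payload 0x30, while 0xB300 gives op 0xB (SUB) on register 3.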
-
Immediate Instructions
10/29/2019 at 13:14 • 0 comments
One of the deficiencies of the Suite-16 instruction set was the lack of an immediate addressing mode, where one of the operands is contained in the next word in memory.
This specifically was becoming a problem when you wanted to check if the contents of the accumulator lay between two bounds - and branch accordingly. This type of test is frequently found in ascii to hex or decimal conversion routines and string handling, and after coding a few routines it became obvious that the current situation involving other registers was clumsy and inadequate.
As a compromise I have added two instructions ADI and SBI which allow an 8-bit value to be coded into the payload area and have it added to or subtracted from the accumulator.
I have coded these two instructions in the spare 0x0Axx and 0x0Bxx instruction slots to try them out and see if they make coding easier and less convoluted. If they are useful they will get added to the final instruction set that will be implemented in hardware.
Here's an example from the number entry routine, where the input character needs to be tested to find out if it falls between ASCII 0x30 and 0x39 and is therefore a decimal digit. In the original version, registers R2 and R3 are first preloaded with the constants 0x0A and 0x30 so that they are available for the tests. With ADI and SBI, these preload instructions are no longer needed, saving 4 words of memory, and the SUB R3 and SUB R2 instructions become SBI 0x30 and SBI 0x0A respectively.
Whilst this might seem a trivial change in this example, it will be very useful when testing the input buffer for certain known strings - essential for dealing with high-level languages with keywords.
0x1300, // SET R3 0x30 Preload R3 with decimal 48
0x0030,
0x1200, // SET R2, 0x0A Preload R2 with decimal 10
0x000A,
0x1100, // SET R1, 0x0200 text buffer start
0x0200,
0x4100, // LD AC, @R1 get first character from buffer
0x3400, // Store R0 in R4
0xE100, // INC R1
0x4100, // LD AC, @R1 get next character - and test to see if it is a number
0xB300, // Subtract R3 Is it bigger than 0x30?
0x025A, // BLT Not a Number
0xB200, // Subtract R2 0x0A
0x035A, // BGE Not a Number
0x2400, // Get original character R0 back from R4
0xB300, // Subtract R3 to form a digit
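The subtract-and-branch range test can be mirrored step by step in C++. This is only a sketch of the technique, with a hypothetical function name, using a signed 16-bit variable to stand in for the accumulator; each line is annotated with the SBI-based instruction it corresponds to.

```cpp
#include <cstdint>

// Returns the digit value 0-9, or -1 if c is not an ASCII decimal digit.
inline int decimal_digit(char c) {
    int16_t ac = static_cast<int16_t>(c);  // character in the accumulator
    ac = static_cast<int16_t>(ac - 0x30);  // SBI 0x30
    if (ac < 0) return -1;                 // BLT: below '0', not a number
    ac = static_cast<int16_t>(ac - 0x0A);  // SBI 0x0A
    if (ac >= 0) return -1;                // BGE: above '9', not a number
    return c - 0x30;                       // restore the character, SBI 0x30
}
```

The two subtractions destroy the accumulator, which is why the assembly version keeps a copy of the character in R4 and restores it before forming the digit.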
-
Benchmarking Suite-16
10/28/2019 at 13:46 • 0 comments
Over the last week I have been running Suite-16 assembly language simulated in about 60 lines of C++ code. I have evolved the simulator over that time, and added some new instructions where it became necessary to use them.
The simulator has been written using the Arduino IDE - so that anyone with an Arduino compatible board can explore the code and learn how a very simple cpu simulator works.
Originally I had been simulating the Suite-16 cpu on an MSP430 Launchpad board with FRAM.
I noticed that despite it being a 16-bit processor, the performance was not so good, so I have swapped over to a Nucleo STM32H743 board which has a 400MHz ARM processor.
I'm still using the Arduino IDE to develop code - because it has a useful timing function micros() which returns the number of microseconds since the program was started. With this I can get fairly accurate timing information from my simulator.
I have used one of the spare instruction opcodes to allow the instruction count and the elapsed time to be output to the terminal.
By way of a timing benchmark, I have set up a simple loop that loads R0 with 32767 and repeatedly decrements it until it reaches zero. I then print out instruction count and elapsed number of microseconds.
Based on the "count down from 32767" loop, my Suite16 simulator is running about 8 million simulated instructions per second.
That's about 66% of what I'm hoping the TTL cpu to run at.
Based on the 400MHz clock on the Nucleo board, I can estimate that the simulator in C is taking about 50 ARM instructions to execute a Suite-16 simulated one.
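The arithmetic behind these estimates is simple enough to sketch in C++. The function names are made up for illustration; the inputs would be the instruction count and the micros() difference printed by the simulator. Strictly speaking the second figure is host clock cycles per simulated instruction, which only equals ARM instructions if the host retires about one instruction per cycle.

```cpp
#include <cstdint>

// Simulated instructions per second, from an instruction count and an
// elapsed time in microseconds (for example, a difference of two
// micros() readings).
inline double instructions_per_second(uint32_t instruction_count, uint32_t elapsed_us) {
    return instruction_count * 1.0e6 / elapsed_us;
}

// Host clock cycles spent per simulated instruction.
inline double host_cycles_per_instruction(double host_clock_hz, double simulated_ips) {
    return host_clock_hz / simulated_ips;
}
```

With a 400MHz host and 8 million simulated instructions per second, `host_cycles_per_instruction(400.0e6, 8.0e6)` works out at 50.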
I tried exactly the same code on the MSP430 which is a nominal 16MHz. Unfortunately the FRAM only works at 8MHz with wait states, so that slows it down considerably to about 75,000 simulated instructions per second.
So I tried a 16MHz Arduino with an 8-bit AVR ATmega328 and the results were much improved to nearly 139,000 instructions per second.
The humble AVR is approximately 59 times slower than the ARM, but with a 7µs simulated instruction cycle it is still in the same league as some of the classic minicomputers from the 1960s.
Update 31-3-2021.
I am now running the simulator on a 600MHz Teensy 4.0 dev board.
An empty loop executes at around 50 million iterations per second and an addition, subtraction or logic operation can be performed around 9.2 million times per second.
I have decided that the ISA of Suite-16 is a very good match for a stack-based bytecode language called STABLE by Sandor Schneider.