Close

Software

A project log for A SUBLEQ CPU

Yet Another Minimalistic One Instruction CPU SUBLEQ means SUBtraction jump on Less than or EQual to zero.

agpcooperagp.cooper 06/03/2017 at 10:340 Comments

Software

Today I looked at Mazonka's SUBLEQ wehpage (http://mazonka.com/subleq/) and his Higher SUBLEQ webpage (http://mazonka.com/subleq/hsq.html). Higher SUBLEQ is actually a C Compiler for SUBLEQ. His site also has an on-line C Compiler, Assembler and Interpreter (http://mazonka.com/subleq/online/hsqjs.cgi). The problem with Mazonka's work is it is a bit too advanced for me at the moment.

What I was really looking for was some code that I could have a look at. That I can manually step through so that I could get an idea how the language works. I came across Sandro Maffiodo "OI" webpage (http://www.assezeta.com/sandromaffiodo/oi/) which had what I wanted. His assembler syntax was very simple and yet does the job. Here is his "Hello World!":

(OI Assembler by Sandro Maffiodo
This example write hello world)
    H -2 .
    E -2 .
    L -2 .
    L -2 .
    O -2 .
    BLANK -2 .
    W -2 .
    O -2 .
    R -2 .
    L -2 .
    D -2 .
    BANG -2 -1
(ASCII characters)
    .H 72
    .E 69
    .L 76
    .O 79
    .BLANK 32
    .W 87
    .R 82
    .D 68 
    .BANG 33
So Let Have A Look

First comments start with a "(" and end with a ")".

Next the lines are for the form "A B C" (except for data) which are constants.

The SUBLEQ microcode is:

  1. subtract the value stored at address "A"
  2. from the value stored at address "B", and
  3. replace the result of the value stored at address "B".
  4. If the value stored at address "B" is less than or equal to zero then jump to address "C".

On the first code line, the "A" address is called the letter H and is a label for an address that the compiler has to work out. The actual value for H is stored at ".H" and no surprise, is the value 72 or ASCII "H".

The "-2" (or "B" address) is a special symbol for the address of the ASCII input/output port.

The "." is the jump address for the next instruction that the compiler has to work out.

Step Through

Move the value 72 or "H" (stored at at address H) to the ASCII output port (-2) then jump to the next instruction. Repeat for the remaining letter of "HELLO WORLD!".

The last letter "!" however, jumps to address "-1" which is understood to mean "halt".

Now was that so hard?

Assembler Code

Maffiodo's webpage has an assembler and an interpreter for download. He even codes a SUBLEQ interpreter in "SUBLEQ". Then he gets a compiled "C Code" SUBLEQ interpreter to run (interpret) a the "SUBLEQ" SUBLEQ interpreter that runs another "SUBLEQ" SUBLEQ interpreter that runs another "SUBLEQ" SUBLEQ interpreter that runs the "SUBLEQ" program "Hello".

To do this the "SUBLEQ" interpreter has to relocate the code each time in memory. Pretty smart! The SUBLEQ SUBLEQ interpreter only uses 348 memory locations!

Anyway I downloaded his assembler program and rebuilt it. My version is longer as Maffiodo uses many coding tricks to reduce his code size. I have never actually see code that uses the "," operator! My coding background is Pascal so I code with very tight typing. Maffiodo is very fluid in this regard. The program is actually quite simple:

  1. a tokenizer (that dumps comments)
  2. a label reference search
  3. a code emitter.

I want to add an inline macro section for macros like:

Anyway here is my current version of Maffiodo's assembler code:

// SUBLEQ  ASSEMBLER
// =================
//
// SYNTAX
// ------
//
// A space is required between tokens.
//
// SUBLEQ defined variables:
//   Z ( Zero, if used must be cleared after use )
//   T ( Temp, must be cleared before use )
//
// SUBLEQ defined constants (value should not be changed):
//   N ( Negative or -1 )
//   P ( Positive or +1 )
//
// Comments:
//     (      begin in-line comment
//     )      end in-line comment
//     ;      end of line comment
//
// Address Labels:
//   .        next address
//   ?        next address
//   label    label for address reference or variable
//   .label   variable (location) for label (compiler sets address)
//   label:   address reference (goto location) for label (compiler sets address)
//   @label   absolute (fixed) address for label (no code generated)
//
// Constants or Literals:
//   Digits   constant (example "1")
//   -?       constant (example "-1")
//   -?       literal  (example "-end")
//
// Examples of use:
//   ( HELLO WORLD! )
//   H OUTPUT ?
//   E OUTPUT ?
//   L OUTPUT ?
//   L OUTPUT ?
//   O OUTPUT ?
//   BLANK OUTPUT ?
//   W OUTPUT ?
//   O OUTPUT ?
//   R OUTPUT ?
//   L OUTPUT ?
//   D OUTPUT ?
//   BANG OUTPUT ?
//   Z Z HALT ; And end program
//   ( ASCII characters )
//   .H 72
//   .E 69
//   .L 76
//   .O 79
//   .BLANK 32
//   .W 87
//   .R 82
//   .D 68
//   .BANG 33
//   @OUTPUT -1 ( On my system )
//   @INPUT -2  ( On my system )
//   @HALT 0    ( SUBLEQ convention is -1 for halt but better to return to the monitor program )

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include <stdbool.h>

int main()
{
  int i,j,k;
  bool inLineComment=false;
  bool endLineComment=false;
  bool absAddrFlag=false;
  struct absAddr {
    char name[16];
    int addr;
  };
  struct absAddr absAddrs[256];
  struct absAddr *a;
  int nAbsAddr;
  struct token {
    char tok[64];
    int addr;
    bool found;
  };
  struct token tokens[4096];
  struct token *t;
  struct token *ts;
  int nTokens;
  char *tptr;
  char line[64];

  // Load input (code) into token array
  t=tokens;
  nTokens=0;
  a=absAddrs;
  nAbsAddr=0;
  while (fgets(line,sizeof(line),stdin)!=NULL) {
    tptr=strtok(line," \t\r\n");
    if (tptr!=NULL) strcpy(t->tok,tptr);
    while (tptr!=NULL) {
      if (t->tok[0]==';') {
        // End of line Comment
        endLineComment=true;
      } else if (absAddrFlag) {
        a->addr=atoi(t->tok);
        a++;
        nAbsAddr++;
        absAddrFlag=false;
      } else if ((t->tok[0]=='@')&&(t->tok[1]!='\0')) {
        // Absolute address reference
        strcpy(a->name,t->tok+1);
        absAddrFlag=true;
      } else if (t->tok[0]=='(') {
        inLineComment=true;
        if (t->tok[strlen(t->tok)-1]==')') inLineComment=false;
      } else if (inLineComment) {
        if (t->tok[strlen(t->tok)-1]==')') inLineComment=false;
      } else {
        t->addr=-1;
        t->found=false;
        t++;
        nTokens++;
      }
      tptr=strtok(NULL," \t\r\n");
      // Test if end of line
      if (tptr==NULL) {
        // End of line processing
      } else {
        strcpy(t->tok,tptr);
      }
      // Test if end of line comment
      if (endLineComment) {
        endLineComment=false;
        tptr=NULL;
      }
    }
  }

  // Resolve labels
  t=tokens;
  k=0;
  for (i=0;i<nTokens;i++) {
    if (((t->tok[0]=='.')||(t->tok[0]=='?'))&&(t->tok[1]=='\0')) {
      // Next address "." or "?"
      t->addr=k+1;
      t->found=true;
    } else if ((t->tok[0]=='.')||(t->tok[strlen(t->tok)-1]==':')) {
      // Label ".label" or "label:"
      ts=tokens;
      for (j=0;j<nTokens;j++) {
        if (j!=i) {
          if ((t->tok[0]=='.')&&(strcmp(t->tok+1,ts->tok)==0)) {
            // Use nearest reference
            if (ts->found) {
              if (abs(ts->addr-i)>abs(j-i)) {
                ts->addr=k;
              }
            } else {
              ts->addr=k;
              ts->found=true;
            }
          } else if ((t->tok[strlen(t->tok)-1]==':')&&(strncmp(t->tok,ts->tok,strlen(t->tok)-1)==0)) {
            // Use nearest reference
            if (ts->found) {
              if (abs(ts->addr-i)>abs(j-i)) {
                ts->addr=k;
              }
            } else {
              ts->addr=k;
              ts->found=true;
            };
          }
        }
        ts++;
      }
      k--;
    }
    t++;
    k++;
  }

  // Search for absolute address
  t=tokens;
  for (i=0;i<nTokens;i++) {
    if (t->found==false) {
      a=absAddrs;
      for (j=0;j<nAbsAddr;j++) {
        if (strcmp(t->tok,a->name)==0) {
          t->addr=a->addr;
          t->found=true;
        }
        a++;
      }
    }
    t++;
  }

  // Emit SUBLEQ code
  t=tokens;
  j=0;
  for (i=0;i<nTokens;i++) {
    // Check if literal or constant
    if (t->tok[0]=='-') {
      printf("%s ", t->tok);
      j++;
      if (j%3==0) printf("\n");
    } else if (isdigit(t->tok[0])) {
      printf("%s ", t->tok);
      j++;
      if (j%3==0) printf("\n");
    } else {
      // Check if reference label
      if (t->tok[strlen(t->tok)-1]==':') {
        // No action, next token
      } else if ((t->tok[0]=='.')&&(t->tok[1]!='\0')) {
        // No action, next token
        j=-1; // Seperate line for each variable
      } else if (t->found==false) {
        // If reference not resolved - Error!
        fprintf(stderr, "Error: symbol %s is undefined\n", t->tok);
        return -1;
      } else {
        // Okay, export address
        printf("%d ", t->addr);
        j++;
        if (j%3==0) printf("\n");
      }
    }
    t++;
  }

  // End
  if (j%3!=0) printf("\n");
  return 0;
}

I have added some alternate syntaxes (i.e. "?" (equal to ".") and ":" ("label:" instead of ".label") as per Mazonka's assembler. I also made labels local (i.e. nearest reference).

SUBLEQ Underflow BUG

Its a bit like the FDiv bug, not important until it is! Basically for an 8 bit word, -(-128) equals -128 which is a problem for code that test the sign bit of a word for bit operations. For normal arithmetic operations underflow is a "fact of life" and not important.

The following code fails:

  1. T T ? (i.e. Zero T)
  2. A T Jump (i.e. Jump on A>=0)

Works fine except for 0x80 (-128)!

This code works (although the logic is inverted):

  1. T T ? (Move (copy) A to T)
  2. A Z ?
  3. Z T ?
  4. Z Z ?
  5. Subleq -1 T Jump (i.e. Jump on A<0)

SUBLEQ Opcodes

Now it is said that SUBLEQ is a one instruction language but it actually has two opcodes. The second opcode is HALT, accessed if the jump address (i.e. "C") is negative. For real CPUs the HALT instructions in not necessary (it is necessary to be Turing Complete but that is a theoretical concept). In my case I have used the negatives address space to access additional memory but for larger word sizes that is not important. Instead the high order bits could be used to specify the ALU action. For a 16 bit word:

or for a 32 bit word:

The modifications to the decoder would require the "C" address to be read before writing the "B" results.

Macros and Specifying Code Address for the Assemble

I may look at macro but easier to to "M4" the macro language as a pre-processor. (Still I may write my own!)

As IO and self-modifying code need to be in specific memory locations I will need to add a way to specify the memory address for code.

Assembler

The Assembler is now finished. Here is a modified version of "HELLO WORLD!" showing the main features:

@OUTPUT -1 ; On my system
@INPUT -2 ; On my system
@HALT 0 ; Return to monitor
( HELLO WORLD! )
  H OUTPUT ?
  E OUTPUT ?
  L OUTPUT ?
  L OUTPUT ?
  O OUTPUT ?
  BLANK OUTPUT ?
  W OUTPUT ?
  O OUTPUT ?
  R OUTPUT ?
  L OUTPUT ?
  D OUTPUT ?
  BANG OUTPUT ?
  Z Z HALT ; And end program
( ASCII characters )
  .H 72
  .E 69
  .L 76
  .O 79
  .BLANK 32
  .W 87
  .R 82
  .D 68 
  .BANG 33
( Predined variables and addresses )
.Z 0
.T 0
.P 1
.N -1
.SP -17

I have added an equate type command with the "@label". This allows the setting of absolute or fixed addresses. This is necessary to set the system INPUT/OUTPUT and HALT addresses. The @label is also necessary for relocatable code. Although the ".label" and "label:" are the same internally, I nominate the ".label" for variables and the "label:" address references. Upon output variables are one per line rather than in triplets (for SUBLEQ instructions). Refer to the output below:

39 -1 3 
40 -1 6 
41 -1 9 
41 -1 12 
42 -1 15 
43 -1 18 
44 -1 21 
42 -1 24 
45 -1 27 
41 -1 30 
46 -1 33 
47 -1 36 
48 48 0 
72 
69 
76 
79 
32 
87 
82 
68 
33 
0 
0 
1 
-1 
-17

Interpreter

I wrote an interpreter that closely models my 16 bit SUBLEQ design:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdbool.h>

int main() {
    bool console;
    int mem[65536];
    int *i=mem;
    int IP=0;   // Instruction Pointer
    int AddrA=0;
    int AddrB=0;
    int AddrC=0;
    int DataA=-1;
    int DataB=-1;
    int Result=-1;

    // Test if console
    console=isatty(STDIN_FILENO);

    // Load Code
    printf("\nSUBLEQ Interpreter\n\n");
    if (console) {
        printf("Input mode (Enter '!' to quit):\n");
        // Exit on not a number
        // Note: <enter> will not exit
        while (scanf("%d",i++)==1);
        printf("Interpret mode:\n");
    } else {
        while (scanf("%d ",i++)!=EOF);
    }

    // Interpret Code
    do {
        // Fetch Addresses
        AddrA=mem[IP];
        IP++;
        AddrB=mem[IP];
        IP++;
        AddrC=mem[IP];
        IP++;

        // Fetch Data
        if ((AddrA<0)||(AddrA>=65528)) {
            // Fetch from Input (-2)
            if ((AddrA==-2)||(AddrA==65534)) {
                DataA=(int)getchar();
            } else {
                // Not mapped
                DataA=-1;
            }
        } else {
            // Fetch ROM/RAM
            DataA=mem[AddrA];
        }
        if ((AddrB<0)||(AddrA>=65528)) {
            // Fetch from Input (-2)
            if ((AddrB==-2)||(AddrA==65534)) {
                DataB=(int)getchar();
            } else {
                // Not mapped
                DataB=-1;
            }
        } else {
            // Fetch ROM/RAM
            DataB=mem[AddrB];
        }
        // Subtract
        Result=DataB-DataA;

        // Update IP
        if (Result<=0) {
            if (AddrC>=0) {
                IP=AddrC;
            } else {
                IP=65536-AddrC;
            }
        } else {
            IP++;
        }
        // Deposit
        if ((AddrB<0)||(AddrA>=65528)) {
            if ((AddrB==-1)||(AddrA==65535)) {
                // Deposit to Output (-1)
                // "Result" is inverted in hardware
                printf("%c",~((char)Result));
            } else {
                // Not mapped
            }
        } else {
            mem[AddrB]=Result;
        }
    } while (IP!=0);
    printf("\n\nInterpreter finished\n");
    return 0;
}

The code is "spelt out" to avoid coding errors. I also detect if the user is using the console or file redirection (the usual/expected mode).

An OPCode Level SubLEq Assembler

I came across a SUBLEQ assembler (https://github.com/farlepet/subleq_assembler) that accepts "higher level operations" but exports SubLEq microcode. Very neat idea, the OpCodes are:

The assembler appears to assume RAM resident code(?). He shows some quite high level examples with functions etc. but it will take some time for me to understand his work.


Regards AlanX

Discussions