Simple Compiler

This project is rather large and the work is not that linear. To make the logs readable I will from time to time rewrite them.

Project logs:

Convert SC.PAS to SimpleC.c(done).
Adjust the code for the CPU word width and memory model (done).
Build tool train(done):
- Subleq code generator (using the Simple Compiler OpCodes as input),
- Subleq Assembler, and
- Subleq Interpreter.
Modify the Simple Compiler to export OpCodes to suit OpCode2Subleq.
Link up and debug the tool chain (done):
- SimpleC -> OpCode2Subleq -> Subleq_Asm -> Subleq_Int

Mission Completed

I now have a working Subleq Compiler. Yes it is very primitive and very very slow but that can be made better.

Here is the test code ("1.SC"):

begin

  A=4
  B=1
  if A<B then
    write A*10<b
  else begin
    write 1<B*10
    write B*10, (a+b)*(-5)
  end
  C=-1
  write C
end

Here is the tool chain in action:

C:\AlanX\SimpleC\OpCode2Subleq>SimpleC  -o 1.opc 0<1.sc
Tokenised Code:
Line      1: BEGIN
Line      2: A = 4
Line      3: B = 1
Line      4: IF A < B THEN
Line      5: WRITE A * 10 < B
Line      6: ELSE BEGIN
Line      7: WRITE 1 < B * 10
Line      8: WRITE B * 10 , ( A + B ) * ( - 5 )
Line      9: END
Line     10: C = - 1
Line     11: WRITE C
Line     12: END
Done.

C:\AlanX\SimpleC\OpCode2Subleq>OpCode2Subleq -i 1.opc -o 1.sasm

C:\AlanX\SimpleC\OpCode2Subleq>Subleq_Asm -i 1.sasm -o 1.code -l 1.list

C:\AlanX\SimpleC\OpCode2Subleq>Subleq_Int -i 1.code -l 1.list -t 1.trace

SUBLEQ Interpreter

 1
 10 -25
 -1


Interpreter finished

C:\AlanX\SimpleC\OpCode2Subleq>pause
Press any key to continue . . .

Speed

The interpreter takes a full 4 seconds to execute! Hard to believe anything could be this slow.

Code Size

1.sc - 12 lines
1.opc - 74 lines including the header (95 words)
1.sasm - 3126 lines including comments (5530 words)

Improvements

Rather than fix/improve Simple Compiler it may be better to port a more complete compiler to work with OpCode2Subleq.

The tool train would suit a Transport Triggered Architecture (TTA) like my "Weird CPU" if the word size is increased from 8 bits to 12 bits or 16 bits.

Conclusion

Subleq is horribly inefficient.

PL/0

It is pretty clear that the root source of most of these simple compilers is Wirth's PL/0 as the language syntax (i.e. the expression part) is identical with the same names, omissions and unary minus fault. Unfortunately they did not carry through the rest of PL/0.

I have sourced Wirth's original PLZero Pascal source code which I will follow for upgrades to Simple Compiler.

Tokeniser

I have split the tokeniser out of Simple Compiler as it can be done and it reduces the code size of Simple Compiler.

String Tokens

I prefer working with string tokens (e.g. "IF" than working with integer tokens (e.g. _if) as I don't need a list of token definitions. So I will rework the code to suit.

PL/0 in C

Ported Wirth's PL/0 Pascal source code into C. Now I am a native Pascal programmer and Wirth was the Inventor of Pascal. So it is a bit sad that I ported his code to C.

PL/0 Extensions

The basic PL/0 is a bit primitive (no strings!). So I have added arrays and (Pascal) strings. It also recognise the majority of dialects I have seen. But no built in functions, or user functions or procedure parameter lists. I am not trying to clone a full version of Pascal, just a very basic compiler for home brew CPUs.

The compiler syntax error reporting needs a bit of work to be more helpful, and finally there are no run-time checks!

But overall I am very happy with this little compiler.

P-Code to Subleq

The PL/0 compiler exports "P-Code". This needs to be translated into Subleq. There are 26 Instructions to recognise. Few of the instructions are simple.

AlanX

Project Logs

Collapse

Code Generation

agp.cooper • 07/20/2017 at 11:06 • 0 comments

Code Generation

I have been testing some fairly big programs. Another grammar fault I found is that you cannot assign negative constants. Fixed.

Here is the grammar at the moment:

  PL/0 Grammar:
  program = block "." .
  block = [ "const" ident "=" ["-"] number {"," ident "=" number} ";"]        // Add negative constants
          [ "var" ident {"," ident} ";"]
          [ "var" ident "[" number "]" {"," ident "[" number "]"} ";"]        // Add arrays
          { "procedure" ident ";" block ";" } statement.
  statement = [ ident ":=" expression
                ident "[" expression "]" ":=" expression                      // Add arrays
                ident ":=" string                                             // Add string assignmnet to ident
                ident "[" expression "]" ":=" string                          // Add string assignmnet to arrays
              | "call" ident
              | "?" ident | "write" ident | "putc" ident                      // Add "write" (int) and "putc" (char)
              | "!" expression | "read" expression | "getc" expression        // Add "read" (int) and "getc" (char)
              | "begin" statement {";" statement } [";"] "end"                // Add optional ";" before "end"
              | "if" condition "then" statement [ "else" statment ]           // Add "else"
              | "while" condition "do" statement ].
  condition = "odd" expression |
              expression ("="|"#"||"!="||"<>"|"<"|"<="|">"|">=") expression . // Add "!+" and "<>"
  expression = [ "+"|"-"] term { ("+"|"-") term}.
  term = factor {("*"|"/"|"%"|"mod") factor}.                                 // Add "%" and "mod"
  factor = ident | ident "[" expression "]" | number | "(" expression ")".    // Add arrays
Add comments (to tokeniser):
  // ... EOL
  { ... }
  /* ... */
  (* ... *)

Still have not fixed the unary minus fault but its not really an issue, just put parentheses around it.

The above grammar may make you eyes roll but you need it to code a compiler.

The Plan

The plan for building the code generator is to rework the interpreter:

Change the stack to downward growth from upward growth (done).
Swap out the P-Code instructions for pseudo CPU OpCode (done).
Convert the interpreter into a OpCode lister (done).
Covert the opcodes to Subleq.

The reason for doing this to the interpreter is that changes can be incrementally checked.

The CPU I am modelling is as minimal as practical. The current pseudo OpCodes are stack focused. Here are the registers:

Registers	Comment
Ax	Primary Register
Bc	Secondary Register
Tx	Temporary Register (used for swap)
PC	Program Counter (?)
SP	Stack Pointer (downward growing)
BP	Base Pointer

The program counter (PC) is not actually a register but a subroutine return address. It is set by the compiler at compile time, that is the real PC register is not actually accessed during execution.

The stack and register pseudo opcodes:

Stack operations:	Register moves:
FromStack (Ax=stack[Ax]	mov Ax,Bx
ToStack (stack[Ax]=Bx)	mov Ax,SP
push Ax	mov SP,Ax
pop Ax	mov Ax,BP
pop Bx	mov BP,Ax
	mov PC,Ax
Move Immediate (literal):	mov Ax,PC
mov Bx,Imm	mov Tx,Ax
	mov Bx,Tx

I have favoured register moves over stack operations because stage operations are expensive. Here are the remaining pseudo opcodes:

Increment/Decrement:	Mathematical Operations:	Boolean Operations:
dec Ax	sub Ax,Bx	odd Ax
inc Ax	add Ax,Bx	eq Ax,Bx
	mul Ax,Bx	lt Ax,Bx
Jump:	div Ax,Bx	le Ax,Bx
jmp	mod Ax,Bx	ne Ax,Bx
jz		gt Ax,Bx
		ge Ax,Bx

Add four input/outputs instructions:

Input/Output:	Code:
read (integer)	read
get (char)	getc
write (integer)	write
put (char)	putc

Recoded the pseudo opcode interpreter into a code lister (done). Called it PCode2OpCode.

I have been reviewing the OpCode output and I have used indirect addressing with the Ax register which of course is not a valid instruction for the x86 instruction set. It is pseudo opcodes after all. I also changed the format to make it easier to import, for example I use "mov_Ax,Bx" rather than mov Ax, Bx" and I dropped the "L" prefix for address labels.

AlanX

PL/0 Completed?

agp.cooper • 07/19/2017 at 01:39 • 0 comments

PL/0 Completed?

Brave words! Can it really be completed? There are still things I would like to add but I need to move on. The current state of play is:

Recognises most PL/0 dialects.
Tolerant to redundant ";"
One dimensional arrays
Pascal strings

To do:

Would like to add an ASM instruction (pass through to the target CPU).

Rework the error messages to provide better hints.

Here is an example program:

{ Program Test String }
var Str[16],Prompt[16];

  procedure ReadStr;
  var i,test;
  { Read a Pascal string, uses Str }
  begin 
    i:=1;
    test:=1;
    while test=1 do begin
      getc Str[i];
      if Str[i]<32 then begin
        Str:=i-1; { Str == Str[0] }
        test:=0
      end else begin
        if i=16 then begin
          test:=0;
          Str:=i
        end else begin
          i:=i+1
        end
      end
    end
  end;

  procedure WriteStr;
  var i;
  { Write a string, uses Str }
  begin
    i:=1;
    while i<=Str do begin
      putc Str[i];
      i:=i+1
    end
  end;

  procedure WritePrompt;
  var i;
  { Write Prompt }
  begin
    i:=1;
    while i<=Prompt do begin
      putc Prompt[i];
      i:=i+1
    end
  end;

begin
  Prompt:="String Test";
  call WritePrompt;
  putc 10; { new line }

  Prompt:="Enter a string: ";
  call WritePrompt;
  call ReadStr;

  Prompt:="You wrote: ";
  call WritePrompt;
  call WriteStr;
  putc 10; { new line }
end.

You may notice I have used redundant ":" in this example.

And here is the run:

C:\AlanX\pl0>.\PLZero_Tokeniser\PLZero_Tokeniser     -i test18.pl0   -o test18.token -nl

C:\AlanX\pl0>.\PLZero_Compiler\PLZero_Compiler       -i test18.token -o test18.code  -t  test18.table

C:\AlanX\pl0>.\PLZero_Interpreter\PLZero_Interpreter -i test18.code  -o test18.run   -s  test18.stack
String Test
Enter a string: Hello World!
You wrote: Hello World!

C:\AlanX\pl0>pause
Press any key to continue . . .

I think that is pretty cool!

The next task is to write a P-Code to Subleq translator.

AlanX

PLZero Compiler

agp.cooper • 07/13/2017 at 16:12 • 0 comments

PLZero (PL/0) Compiler

The compiler is almost done. The Nikluas Wirth Pascal has been recoded in C. The structure completely rebuilt. I don't even think Mr Wirth would recognise his code! I removed the tokeniser and interpreter from the compiler and are now stand alone programs (part of the tool train).

Here are a couple of working programs.

Multiplication:

const m=7, n = 85; 
var print, x, y, z; { Assignment to print variable is printed }
 
procedure multiply; 
var a, b; 
begin 
	a := x; b := y; z := 0; 
	while b > 0  do 
	begin 
		if odd b then z := z + a; 
		a := 2 * a; b := b / 2; 
	end; 
end; 
 
begin 
	x := m;
	y := n;
	call multiply;
	print:=z;
end.

Prime number search (http://www.dustyoldcomputers.com/pdp-8/software/pl0/):

{ calculate prime numbers }
const 
  pcmax = 101;  { max on 8 is 2045 }
var
  print, { memory mapped print routine memory location }
  pc,    { prime candidate }
  d,     { divisor }
  t1,    { temp #1 }
  t2,    { temp #2 }
  isnot; { isnot a prime flag }
begin
  print := 1;
  print := 2;
  pc := 3;
  t1 := pc/2;
  while pc <= pcmax do
    begin
      { trial division to test for prime }
      d := 3;
      isnot := 0;
      { This keeps t1 >= to the sqrt(pc) }
      if (t1*t1) < pc then t1 := t1 + 1;
      while d <= t1 do
        begin
          t2 := pc / d;
          if (t2*d) = pc then { not prime }
            begin
              isnot := 1; { remember not prime }
              d := pc;    { force loop exit because we don't have break }
            end;
          d := d + 2;
        end;
      if isnot = 0 then print := pc;  { store at print will print it }
      pc := pc + 2;
    end;
end.

Wirth's test program:

CONST
  m =  7,
  n = 85;

VAR
  print, putc, x, y, z, q, r;

PROCEDURE multiply;
VAR a, b;

BEGIN
  a := x;
  b := y;
  z := 0;
  WHILE b > 0 DO BEGIN
    IF ODD b THEN z := z + a;
    a := 2 * a;
    b := b / 2
  END
END;

PROCEDURE divide;
VAR w;
BEGIN
  r := x;
  q := 0;
  w := y;
  WHILE w <= r DO w := 2 * w;
  WHILE w > y DO BEGIN
    q := 2 * q;
    w := w / 2;
    IF w <= r THEN BEGIN
      r := r - w;
      q := q + 1
    END
  END
END;

PROCEDURE gcd;
VAR f, g;
BEGIN
  f := x;
  g := y;
  WHILE f # g DO BEGIN
    IF f < g THEN g := g - f;
    IF g < f THEN f := f - g
  END;
  z := f
END;

BEGIN
  x := m;
  y := n;
  CALL multiply;
  print:=z;
  putc:=10;
  x := 25;
  y :=  3;
  CALL divide;
  print:=q;
  print:=r;
  putc:=10;
  x := 84;
  y := 36;
  CALL gcd;
  print:=z
END.

I have edited the tokeniser to recognise:

any case for token words (i.e. BEGIN, begin or Begin etc.) but variables are still case sensitive
"var" and "int" as an alternative
"<>", "!=", "#" as alternatives
"print" (lower case) is recognised at any level as the integer output.
"putc" (lower case) is recognised at any level as the char output.

I rather like the upper case for keywords.

Here is the result from Wirth's program:

Begin PL/0:
595

8
1

12
End PL/0.

I need a final check of the code, add a execution list option before moving on to the Subleq translator.

AlanX

Fixing Unary Minus

agp.cooper • 07/08/2017 at 00:48 • 1 comment

Language Structure

Programming languages can the presented/described in format called Backus-Naur Form (BNF). Here is a proper example from "Let's Build a Compiler!" Jack W. Crenshaw:

<b-expression> ::= <b-term> [<orop> <b-term>]*
<b-term> ::= <not-factor> [AND <not-factor>]*
<not-factor> ::= [NOT] <b-factor>
<b-factor> ::= <b-literal> | <b-variable> | <relation>
<relation> ::= | <expression> [<relop> <expression]
<expression> ::= <term> [<addop> <term>]*
<term> ::= <signed factor> [<mulop> factor]*
<signed factor>::= [<addop>] <factor>
<factor> ::= <integer> | <variable> | (<b-expression>)

Here is what I decoded from Simple Compiler in a "bnf" like form:

Factor         -> Identifier
               -> Integer
               -> ( BoolExpression )
Term           -> Factor
               -> Factor * Factor+
               -> Factor / Factor+
Expression     -> Term
               -> + Term
               -> - Term
               -> Term + Term+
               -> Term - Term+
BoolExpression -> Expressiom
               -> Expression = Expression+
               -> Expression <> Expression+
               -> Expression < Expression+
               -> Expression <= Expression+
               -> Expression > Expression+
               -> Expression >= Expression+

(Note, my decoded format is as "coded" and is upside down compared to the bnf format.)

Okay what does all this mean? Basically the minimum number of precedence levels (from "Let's Build a Compiler!" Jack W. Crenshaw) is 8:

Level	Syntax Element	Operator
0	factor	literal, variable
1	signed factor	unary minus
2	term	*, /
3	expression	+, -
4	b-factor	literal, variable, relop
5	not-factor	NOT
6	b-term	AND
7	b-expression	OR, XOR

The Simple compiler only has 4, so it was doomed from the start. Even without the bit operations it still needed 5 levels. What failed was the unary minus. This is why I had to put brackets around the "-5" the the code:

begin
  A=4;
  B=1;
  if A<B then
    write A*10<b;
  else begin
    write 1<B*10;
    write B*10, (a+b)*(-5);
  end
  C=-1;
  write C;
end

The original coder did try with "Expression -> - Term" as the C=-1 does work, but a true unary minus needs a higher precedence than "*" for "(a+b)*-5" to work.

So I need at added "SignedFactor" between "Factor" and "Term" based on Crenshaw's bnf, and remove the "Expression -> + Term" and "Expression -> - Term":

Factor         -> Identifier
               -> Integer
               -> ( BoolExpression )
SignedFactor   -> Factor
               -> + Factor
               -> - Factor           
Term           -> SignedFactor
               -> SignedFactor * SignedFactor+
               -> SignedFactor / SignedFactor+
Expression     -> Term
               -> Term + Term+
               -> Term - Term+
BoolExpression -> Expressiom
               -> Expression = Expression+
               -> Expression <> Expression+
               -> Expression < Expression+
               -> Expression <= Expression+
               -> Expression > Expression+
               -> Expression >= Expression+

It took a little while to get my head around the internals of the recursive procedure calls but eventually I got it to work:

void SignedFactor(void)
{
  char op;
  
  op=tTok;
  if ((op==_plus)||(op==_minus)) {
    GetNextToken();
  }
  Factor();
  if (op==_plus) {
    // Nothing to do
  } else if (op==_minus) {
    Emit(_movBxAx);
    Emit(_xorAxAx);
    Emit(_subBx);
  }
}

In the above code:

Th global "tTok" is saved to local "op" in case it is required for later use (other procedures may "eat it"). Basically parser procedures "eat" the token if it belongs to them.
Next, if the token is for this procedure (i.e. it is a "+" or a "-") then "eat" the token by getting the next token (GetNextToken()).
Okay, the procedure has to wait for the "Factor" that will be operated on to come back (i.e. pass through to Factor()).
Factor() has come back and the result will be in "Ax". If "op" is a "+" or a "-" then do the operation (emit the code).
Okay all done, return to the calling procedure (Term()).

Here is the current language structure for Simple Compiler:

Factor         -> Identifier
               -> Integer
               -> ( BoolExpression )
SignedFactor   -> Factor
               -> + Factor
               -> - Factor
Term           -> SignedFactor
               -> SignedFactor * SignedFactor+
               -> SignedFactor / SignedFactor+
               -> SignedFactor % SignedFactor+
Expression     -> Term
               -> Term + Term+
               -> Term - Term+
BoolExpression -> Expression
               -> Expression = Expression+
               -> Expression <> Expression+
               -> Expression < Expression+
 -> Expression <=...

Subleq Efficiency
agp.cooper • 07/06/2017 at 04:29 • 0 comments
A Not Very Happy Interpreter
The interpreter is not very happy. I have used the OpCode jump address without considering the Subleq actual address (i.e. the OpCode instruction size does not match the Subleq macro size). This is a bit tricky as I do not calculate Subleq instructions lengths and therefore cannot calculate the Subleq address directly (at least in OpCode2Subleq). A good fix is to precede each Subleq instruction with a label identifying the OpCode Address, eg. "_OPCxxxxx". This has no cost to the final integer code size.
The use of the OpCode instruction pointer label "_OPCxxxxx" works very well. Very easy to find the OPC code in Subleq now.
Fixed a coding error, where I coded "jz" as if it was "jeqz"? Fixed and now the interpreter executes and presents the correct answer.
Subleq Efficiency
Here is the final test of my tool train:
```
C:\AlanX\SimpleC\OpCode2Subleq>SimpleC  -o 1.opc 0<1.sc
Tokenised Code:
Line      1: BEGIN
Line      2: A = 4
Line      3: B = 1
Line      4: IF A < B THEN
Line      5: WRITE A * 10 < B
Line      6: ELSE BEGIN
Line      7: WRITE 1 < B * 10
Line      8: WRITE B * 10 , ( A + B ) * ( - 5 )
Line      9: END
Line     10: C = - 1
Line     11: WRITE C
Line     12: END
Done.

C:\AlanX\SimpleC\OpCode2Subleq>OpCode2Subleq -i 1.opc -o 1.sasm

C:\AlanX\SimpleC\OpCode2Subleq>Subleq_Asm -i 1.sasm -o 1.code -l 1.list

C:\AlanX\SimpleC\OpCode2Subleq>Subleq_Int -i 1.code -l 1.list -t 1.trace

SUBLEQ Interpreter

 1
 10 -25
 -1


Interpreter finished

C:\AlanX\SimpleC\OpCode2Subleq>pause
Press any key to continue . . .
```
Code Efficiency
12 lines of high level code became 95 words (73 OpCodes) which turned into
5530 words of Subleq code (1903 lines of uncommented code). That is about 26 lines of Subleq per OpCode. And the code takes a full 4 seconds to execute!
AlanX
Converting the Simple Compiler to a Subleq Compiler
agp.cooper • 06/24/2017 at 17:03 • 0 comments
Compiler Subleq Backend
In order to convert Simple Compiler to a Subleq compiler, I have to replace the Simple Compiler Assembler (OpCode) Lister and Assemble (OpCode) Interpreter with a Subleq OpCode exporter. I will use an external Subleq assembler and interpreter. The Subleq OpCode exporter is called the Subleq backend. The Subleq backend contains all the Subleq macros, but only composite macros (which takes an operand) are accessible. Here is the list of Subleq macros:
- jump
- clear
- not
- shl
- inc
- dec
- chs
- label
- return
- putc
- getc
- wrtInt
- rdInt
- sub
- add
- copy
- ncopy (negative copy)
- jlez
- jgez
- jeqz
- jgtz
- jltz
- jnez
- jmin
- store
- absolute
- xor
- and
- or
- nand
- nor
- new sub
- new add
And here is th list of compiler OpCode that are supported:
- system
- movAxVar
- movVarAx
- movAxImm
- movAxBx
- movBxAx
- pushAx
- pushBx
- pushFx
- popAx
- popBx
- popFx
- addBx
- subBx
- mulBx
- divBx
- jmp
- jz
- wrtAx
- wrtLn
- rdAx
- halt
- cmpAx
- orAxAx
- xorAxAx
- setEq
- setNE
- setLT
- setLE
- setGT
- setGE
The "system" command sets up Subleq with:
- HALT (0)
- OUTPUT (-1)
- INPUT (-2)
- Z (zero)
- T (temp)
- t (tmp)
- P (+1)
- N (-1)
- the CPU model:
  - Ax
  - Bx
  - Fx
  - SP
  - DP
- and few useful constants:
  - _SP (-32)
  - _DP (-16384)
  - _MIN (-32768)
  - _CR (13)
  - _LF (10)
  - _SPACE (32)
  - _MINUS (45)
  - _ZERO (48)
  - _NINE (57)
The Subleq system also has a point to pointer copy (PPC) routine (as it is assumed the monitor code will be stored in ROM and pointer to pointer copy requires self modifying code).
I will likely add many more constants to support string output later.
I have added the binary operation macros:
- XOR Ax,Bx
- AND Ax,Bx
- OR Ax,Bx
- NAND Ax,Bx
- NOR Ax,Bx
- NOT Ax
- SHL Ax
- SHR Ax (to be added)
These bit operations don't update the flag register.
Integer Minimum Value
The minimum integer value (MIN) for a 16 bit integer is -32768. Subleq has a major problem with MIN. It treats MIN as equal to 0 unless special precautions are taken. This is not usually a problem unless the code uses the sign bit (such as for bit operations etc.). In the end I rewrote all the test macros (i.e. jeqz, jnez, jlez, jltz, jgez and jgtz) to be MIN aware.
I also used MIN as the return value for multiplication overflow and attempt to divide by zero. There is a "jmin" macro if required.
Working Subleq Backend
I have got a working Subleq backend test-bed (TestMacro2Subleq), for example the following Subleq macros:
```
  compositeMacro("system",0);
  compositeMacro("rdAx",0);
  compositeMacro("pushAx",0);
  compositeMacro("pushAx",0);
  compositeMacro("rdAx",0);
  compositeMacro("movBxAx",0);
  compositeMacro("popAx",0);
  compositeMacro("pushBx",0);
  // iMul
  compositeMacro("mulBx",0);
  compositeMacro("wrtInt",0);
  compositeMacro("wrtLn",0);
  // iDiv
  compositeMacro("popBx",0);
  compositeMacro("popAx",0);
  compositeMacro("divBx",0);
  compositeMacro("wrtInt",0);
  compositeMacro("wrtLn",0);
  compositeMacro("movAxBx",0);
  compositeMacro("wrtInt",0);
  compositeMacro("wrtLn",0);
```
Produces after assembly and interpretation:
```
C:\AlanX\TestSubleq>TestMacro2Subleq  1>Test.sasm
C:\AlanX\TestSubleq>subleq_asm -i Test.sasm -o Test.code -l Test.list
C:\AlanX\TestSubleq>Subleq_Int -i Test.code -l Test.list -t Test.trace

SUBLEQ Interpreter

6   <- Entered integer
2   <- Entered integer
12  -> Integer multiplication (=6*2)
3   -> Integer division (=6/2)
0   -> Integer remainder (=6%2)


Interpreter finished

C:\AlanX\TestSubleq>pause
Press any key to continue . . .
```
The Subleq code produced is too long to list here.
AlanX
Translated the Pascal Code to C Code
agp.cooper • 06/22/2017 at 16:18 • 0 comments
C Code
The code has been translated. Quite a bit of clean up:
- General regrouping of code.
- Fixed memory reference error (mixed up low/high byte order).
- Simplified the symbol table.
- Added code to export the code being tokenise to "stderr"
- Reworked the error reporting routine.
- Fixed up the EOF problem.
- Reordered the constants and pushed them out to a header file.
- Pushed out the assembler and interpreter code to include files.
- Changed the CPU model.
- Added some options to the command line.
The current version of the tool tranin has been uploaded.
Understanding the Code
The compiler code in most of the minimalist program codes I reviewed are very similar. Even to the point where missing instructions (i.e.">") and constant names are the same (i.e. SETLSS rather than SETLT and SETLEA instead of SETLE).
A good site for compiler construction (but not complete) is http://zserge.com/blog/cucu-part1.html. I will likely refer back to this site for when I add functions to my code.
A good pdf on compiler construction (by Jack W. Crenshaw) is http://compilers.iecc.com/crenshaw/ (I have uploaded is book compiler.pdf).
But the place to go is "Recursive-Descent Parsing" Chapter 6.6 of the AWK book pp 147-152. (I have uploaded a pdf of the book and the programs from Chapter 6).

The NewCPU Model

The CPU model for nearly all the compiler I have reviewed is:
- Ax: Accumulator register
- Bx: Secondary register
- Fx: Flag register
- SP: Stack Pointer
- DP: Data Pointer (Simple Compiler)
- 16 bit data and address word width
Usually all data is on the stack. The Simple Compiler uses a separate Data Pointer (DP) (i.e. modelling a Pascal style "heap memory"). I will keep this method.
The the assembler code therefore revolves around Ax and Bx:
- MOV Ax, Bx
- MOV Bx, Ax
- PUSH Ax
- PUSH Bx (added)
- PUSH Fx (added)
- POP Ax
- POP Bx
- POP Fx (added)
- MOV AX, IMM
- MOV AX, [ADDR] (uses DP)
- MOV [ADDR], AX (uses DP)
Two basic jumps:
- JMP [ADDR]
- JZ [ADDR]
The remainder maps the language symbols (i.e. '+', '-', '*', '/', '=',' <>','<','<=','>','>=') :
- ADD Ax, Bx
- SUB Ax, Bx
- IMUL Ax, Bx (software)
- IDIV Ax, Bx (software)
- SETEQ
- SETNE
- SETLT (added)
- SETLE (added)
- SETGT
- SETGE
The set commands set (for true) or reset (for false) the AX depending on the Flag register. True is defined as Ax=1 and false is defined as Ax=0.
The following commands set the flag register (Fx):
- OR Ax, Ax (Sets the Flags based on the Ax)
- XOR Ax, Ax (Clears the Ax and sets flags)
- CMP Ax, Bx (Sets flags based on Ax-Bx)
Finally there is a HALT and a couple of I/O commands:
- HALT (same as reset)
- wrtAx (converts an integer to characters before printing)
- wrtLn (write a new line)
- rdAx (added, reeads characters and converts to integer)
Language Construct
Languages can me modelled in Backus–Naur Form (BNF). Before extending the language I need to map the existing language. I use the syntax from the AWK handbook for the language construct:
```
// Factor         -> Identifier
//                -> Integer
//                -> ( BoolExpression )
// Term           -> Factor
//                -> Factor * Factor+
//                -> Factor / Factor+
// Expression     -> Term
//                -> + Term
//                -> - Term
//                -> Term + Term+
//                -> Term - Term+
// BoolExpression -> Expressiom
//                -> Expression = Expression
//                -> Expression <> Expression
//                -> Expression < Expression
//                -> Expression <= Expression
//                -> Expression > Expression
//                -> Expression >= Expression
// Statement      -> Begin
//                -> While
//                -> If
//                -> Write
//                -> Read
//                -> Assignment
// Begin          -> Statement    \\ This is where a statement separator (";") should added!
//                -> End
// While          -> BoolExpression Statement
// If             -> BoolExpression then Statement
//                -> BoolExpression then Statement "else" Statement
// Write          -> BoolExpression , BoolExpression+
// Read           -> Input
// Assignment     -> Identifier = BoolExpression
```
Note: the "+" at the end of a line means it can be repeated.
Its okay but I think the program should begin with a "begin" and end with an "end". The Statement procedure needs to be modified for this. There probably needs to be a semi-colon (";") as a statement separator as well. I think it might get messy without a statement delimiter later.
Bit-wise...
Read more »

View all 7 project logs

PLZero.zip A C port of Niklaus Wirth's PLZero.pas code. It is a complete rewrite and restructure. Support for arrays and strings. Support for most dialects. Includes pCode2OpCode. application/x-zip-compressed - 717.85 kB - 07/23/2017 at 02:17		Download
plzero.pas Niklaus Wirth PL/0 source code pas - 15.50 kB - 07/10/2017 at 02:25		Download
OpCode2Subleq.zip The full tool train and test data x-zip-compressed - 76.57 kB - 07/06/2017 at 09:35		Download
runSimpleC.zip The Simple Compiler and test data x-zip-compressed - 14.33 kB - 07/06/2017 at 09:34		Download
bnf.txt The SC language discription plain - 1.24 kB - 06/24/2017 at 15:31		Download

Simple Compiler

Become a Hackaday.io member

Just one more thing

Description

Details

Mission Completed

Speed

Code Size

Improvements

Conclusion

PL/0

Tokeniser

String Tokens

PL/0 in C

PL/0 Extensions

P-Code to Subleq

Files

Project Logs Collapse

Code Generation

The Plan

PL/0 Completed?

Next

PLZero (PL/0) Compiler

Language Structure

A Not Very Happy Interpreter

Subleq Efficiency

Code Efficiency

Compiler Subleq Backend

Integer Minimum Value

Working Subleq Backend

AlanX

C Code

Understanding the Code

The NewCPU Model

Language Construct

Enjoy this project?

Discussions

Become a Hackaday.io Member

Similar Projects

Does this project spark your interest?

Report project as inappropriate

Send message

Remove Member

Project Logs

Collapse