Close

Code Generation

A project log for Simple Compiler

A very simple compiler for minimalist home brew CPUs

agpcooperagp.cooper 07/20/2017 at 11:060 Comments

Code Generation

I have been testing some fairly big programs. Another grammar fault I found is that you cannot assign negative constants. Fixed.

Here is the grammar at the moment:

  PL/0 Grammar:
  program = block "." .
  block = [ "const" ident "=" ["-"] number {"," ident "=" number} ";"]        // Add negative constants
          [ "var" ident {"," ident} ";"]
          [ "var" ident "[" number "]" {"," ident "[" number "]"} ";"]        // Add arrays
          { "procedure" ident ";" block ";" } statement.
  statement = [ ident ":=" expression
                ident "[" expression "]" ":=" expression                      // Add arrays
                ident ":=" string                                             // Add string assignmnet to ident
                ident "[" expression "]" ":=" string                          // Add string assignmnet to arrays
              | "call" ident
              | "?" ident | "write" ident | "putc" ident                      // Add "write" (int) and "putc" (char)
              | "!" expression | "read" expression | "getc" expression        // Add "read" (int) and "getc" (char)
              | "begin" statement {";" statement } [";"] "end"                // Add optional ";" before "end"
              | "if" condition "then" statement [ "else" statment ]           // Add "else"
              | "while" condition "do" statement ].
  condition = "odd" expression |
              expression ("="|"#"||"!="||"<>"|"<"|"<="|">"|">=") expression . // Add "!+" and "<>"
  expression = [ "+"|"-"] term { ("+"|"-") term}.
  term = factor {("*"|"/"|"%"|"mod") factor}.                                 // Add "%" and "mod"
  factor = ident | ident "[" expression "]" | number | "(" expression ")".    // Add arrays
Add comments (to tokeniser):
  // ... EOL
  { ... }
  /* ... */
  (* ... *)

Still have not fixed the unary minus fault but its not really an issue, just put parentheses around it.

The above grammar may make you eyes roll but you need it to code a compiler.

The Plan

The plan for building the code generator is to rework the interpreter:

  1. Change the stack to downward growth from upward growth (done).
  2. Swap out the P-Code instructions for pseudo CPU OpCode (done).
  3. Convert the interpreter into a OpCode lister (done).
  4. Covert the opcodes to Subleq.

The reason for doing this to the interpreter is that changes can be incrementally checked.

The CPU I am modelling is as minimal as practical. The current pseudo OpCodes are stack focused. Here are the registers:

Registers Comment
Ax Primary Register
Bc Secondary Register
Tx Temporary Register (used for swap)
PC Program Counter (?)
SP Stack Pointer (downward growing)
BP Base Pointer

The program counter (PC) is not actually a register but a subroutine return address. It is set by the compiler at compile time, that is the real PC register is not actually accessed during execution.

The stack and register pseudo opcodes:

Stack operations: Register moves:
FromStack (Ax=stack[Ax]mov Ax,Bx
ToStack (stack[Ax]=Bx)mov Ax,SP
push Ax mov SP,Ax
pop Ax mov Ax,BP
pop Bx mov BP,Ax
  mov PC,Ax
Move Immediate (literal): mov Ax,PC
mov Bx,Imm mov Tx,Ax
  mov Bx,Tx

I have favoured register moves over stack operations because stage operations are expensive. Here are the remaining pseudo opcodes:

Increment/Decrement: Mathematical Operations: Boolean Operations:
dec Ax sub Ax,Bx odd Ax
inc Ax add Ax,Bx eq Ax,Bx
  mul Ax,Bx lt Ax,Bx
Jump: div Ax,Bx le Ax,Bx
jmp mod Ax,Bx ne Ax,Bx
jz   gt Ax,Bx
    ge Ax,Bx

Add four input/outputs instructions:

Input/Output:Code:
read (integer)read
get (char)getc
write (integer)write
put (char)putc

Recoded the pseudo opcode interpreter into a code lister (done). Called it PCode2OpCode.

I have been reviewing the OpCode output and I have used indirect addressing with the Ax register which of course is not a valid instruction for the x86 instruction set. It is pseudo opcodes after all. I also changed the format to make it easier to import, for example I use "mov_Ax,Bx" rather than mov Ax, Bx" and I dropped the "L" prefix for address labels.

AlanX

Discussions