3.3) A2Z Compiler

A project log for A2Z Computer

A computer invented, designed and built from scratch. Custom CPU on FPGA.

f4hdk, 11/13/2016 at 18:16

Once again, the compiler is 100% homemade.

It is a cross compiler that runs on a PC: it takes A2Z basic source code as input and generates ASM code as output. Making a native compiler (one that could run on A2Z itself) would have been too difficult, because my computer lacks too many things (tables larger than 64 kB, recursive functions, string functions, etc.).

The compiler is, for me, the most difficult part of this project. I am more used to coding low-level control-command software on small microcontrollers than to coding applications that manipulate strings, lists, etc.

I needed a lot of thinking and organization before I began coding the compiler.

But I really enjoyed coding this compiler, because it is a new subject for me.

Sequencing of the compiler

1) Storage of lines:

The source code is split into a table of lines. Each line holds one instruction.
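As an illustration only (this is not the actual compiler source), the line table can be sketched in Python. The function name `split_source` is hypothetical, and keeping empty lines in the table is an assumption made here so that table indices still match source line numbers.

```python
def split_source(source):
    """Split the source text into a table of lines, one instruction per entry.

    Empty lines are kept (assumption) so that index + 1 stays equal to the
    line number in the source file, which is convenient for error messages.
    """
    return [line.strip() for line in source.splitlines()]


table = split_source("a = 1\n\n  b = a + 2\n")
```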

2) Configuration:

The compiler then looks for the “configure” directive. This directive sets the maximum size and the memory position allocated to executable code and to variables.

3) Functions, first pass:

The compiler then scans the code and looks for function declarations (subfunctions, ASM functions, and macros). For each one, it finds the first line of the function and its last line, “endfunct”. Each line of the code is tagged with the function it belongs to. The first function must be “main”, and it is executed first.
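The tagging pass can be sketched as follows in Python. This is a sketch under stated assumptions, not the real implementation: only “endfunct” and “main” come from the text above, while the opening keyword `function` and the name `tag_functions` are hypothetical.

```python
def tag_functions(lines):
    """Tag every line with the index of the function it belongs to.

    Returns (membership, functions):
      membership[i] is the function index of line i (0 = outside any function),
      functions maps each function name to its index (1, 2, 3, ...).
    """
    membership = []
    functions = {}
    current = 0
    for number, line in enumerate(lines, start=1):
        words = line.split()
        keyword = words[0] if words else ""
        if keyword == "function":  # hypothetical opening keyword
            if current != 0 or len(words) < 2:
                raise SyntaxError(f"line {number}: bad function declaration")
            if not functions and words[1] != "main":
                raise SyntaxError(f"line {number}: the first function must be main")
            if words[1] in functions:
                raise SyntaxError(f"line {number}: function declared twice")
            current = len(functions) + 1
            functions[words[1]] = current
        membership.append(current)
        if keyword == "endfunct":
            if current == 0:
                raise SyntaxError(f"line {number}: endfunct outside a function")
            current = 0  # following lines are outside any function again
    if current != 0:
        raise SyntaxError("missing endfunct at end of source")
    return membership, functions
```

The first and last lines of a function are tagged as part of that function, so a later pass can find the declaration line from the tag alone.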

4) #define management:

Each #define is replaced with its real value.
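A minimal Python sketch of this substitution, assuming a C-like `#define NAME value` syntax (the exact A2Z syntax is not given here, and `expand_defines` is a hypothetical name). Only whole identifiers are replaced, so a define named `SIZE` does not touch `SIZES`.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def expand_defines(lines):
    """Replace every defined name with its value in the lines that follow."""
    defines = {}

    def substitute(match):
        # Unknown identifiers are left unchanged.
        return defines.get(match.group(0), match.group(0))

    output = []
    for line in lines:
        parts = line.split(None, 2)
        if parts and parts[0] == "#define":
            if len(parts) != 3:
                raise SyntaxError(f"bad #define: {line!r}")
            defines[parts[1]] = parts[2]
            output.append("")  # keep the line count stable
        else:
            output.append(IDENTIFIER.sub(substitute, line))
    return output
```

Leaving an empty line in place of each directive is a design choice of this sketch: it keeps the line numbers of the remaining instructions unchanged.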

5) Declarations of variables:

The source code is scanned once again, and the compiler looks for variable declarations. For each declaration, the compiler stores the variable's name, the function it belongs to, its type, and its 2 sizes (for tables). A variable declared outside of any function is considered “global” (function = 0). The compiler checks that there is no double declaration (two variables with the same name inside a single function), and that the RAM allocated for variables is sufficient.
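The variable table and the double-declaration check can be sketched like this in Python. The record fields follow the list above; the names `Variable` and `declare`, the type names, and the choice to let a local variable reuse the name of a global one are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Variable:
    name: str
    function: int   # index of the owning function, 0 = global
    var_type: str   # type names used below are hypothetical
    size1: int = 1  # first table dimension (1 for a scalar)
    size2: int = 1  # second table dimension


def declare(table, variable):
    """Add a variable to the table, refusing a second declaration
    of the same name inside the same function."""
    for existing in table:
        if (existing.name, existing.function) == (variable.name, variable.function):
            raise ValueError(
                f"variable '{variable.name}' declared twice "
                f"in function {variable.function}"
            )
    table.append(variable)


table = []
declare(table, Variable("counter", 0, "int"))         # global
declare(table, Variable("counter", 1, "int"))         # other function: accepted
declare(table, Variable("buffer", 1, "byte", 8, 4))   # 8 x 4 table
```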

6) Allocation of variables:

The variables are then allocated in RAM in a fixed (static) manner. The code that sets the initial values of variables is also generated at this step.
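Static allocation amounts to giving each variable a fixed address, one after the other, inside the RAM area reserved for variables. A minimal Python sketch, assuming sizes are already known in bytes; the name `allocate`, the example addresses and the example sizes are all hypothetical.

```python
def allocate(variables, ram_start, ram_size):
    """Assign a fixed address to each variable, in declaration order.

    variables: list of (identifier, size_in_bytes) pairs.
    Returns a dict mapping each identifier to its address.
    """
    addresses = {}
    next_free = ram_start
    for identifier, size in variables:
        if next_free + size > ram_start + ram_size:
            raise ValueError(f"not enough RAM for variable {identifier!r}")
        addresses[identifier] = next_free
        next_free += size  # the next variable starts right after this one
    return addresses


layout = allocate([("a", 2), ("buf", 16), ("b", 1)], ram_start=0x1000, ram_size=32)
```

Because nothing is allocated at run time, every address is known at compile time and can be written directly into the generated code.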

7) Functions, second pass:

For each function declaration (the first line of a function), the compiler looks for the inputs and outputs of the function. A function can have several inputs and several outputs. The variables used for inputs and outputs must belong to the function.

8) Instruction decoding:

This is the biggest and most complex part of the compiler, and the only part that generates executable code. I use the concept of an “expression”: a value that can be computed from several kinds of things: variables, constants, registers, cache, or the result of a mathematical or arithmetic expression. The expression analysis is of course recursive inside the compiler, which makes it possible to code complex formulas in a single line. The result of an expression can be used for 3 things: to assign a value to a variable/cache/register, as a parameter for an input of a function, or as an input of a condition (if/while/for).
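To illustrate what recursive expression analysis means, here is a small recursive-descent sketch in Python. It is not the A2Z compiler's code: the grammar (only `+ - * /` and parentheses), the name `compile_expression` and the output instructions (`PUSH`, `ADD`, `SUB`, `MUL`, `DIV` for an imaginary stack machine) are all assumptions chosen to keep the example short.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[-+*/()])")


def tokenize(text):
    """Cut an expression into numbers, names, operators and parentheses."""
    text = text.rstrip()
    tokens, position = [], 0
    while position < len(text):
        match = TOKEN.match(text, position)
        if not match:
            raise SyntaxError(f"unexpected character at position {position}")
        tokens.append(match.group(1))
        position = match.end()
    return tokens


def compile_expression(text):
    """Translate an expression into instructions for an imaginary stack machine."""
    tokens = tokenize(text)
    code = []
    index = 0

    def peek():
        return tokens[index] if index < len(tokens) else None

    def advance():
        nonlocal index
        index += 1
        return tokens[index - 1]

    def factor():
        if peek() is None:
            raise SyntaxError("unexpected end of expression")
        token = advance()
        if token == "(":
            expression()  # recursion: a full expression inside parentheses
            if peek() != ")":
                raise SyntaxError("missing closing parenthesis")
            advance()
        elif token[0].isalnum() or token[0] == "_":
            code.append(f"PUSH {token}")  # a variable or a constant
        else:
            raise SyntaxError(f"unexpected token {token!r}")

    def term():
        factor()
        while peek() in ("*", "/"):
            operator = advance()
            factor()
            code.append("MUL" if operator == "*" else "DIV")

    def expression():
        term()
        while peek() in ("+", "-"):
            operator = advance()
            term()
            code.append("ADD" if operator == "+" else "SUB")

    expression()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return code
```

For example, `compile_expression("a + 2 * (b - 1)")` returns `PUSH a`, `PUSH 2`, `PUSH b`, `PUSH 1`, `SUB`, `MUL`, `ADD`: the parenthesised part is handled by a nested call to `expression`, which is what allows a complex formula to sit on a single line.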
The compiler scans the source code from beginning to end, looks for instructions, and generates executable code on the fly. It also counts the code size and determines the address of each instruction inside the executable code. This is necessary for the branching that is managed later.
There are 4 types of instructions:

At this step, branching instructions do not yet contain their destination address, because it may be unknown: the destination can lie after the instructions already parsed.

9) Branching management:

Then all the branches are processed: goto/for/next/if/while/function call/etc. Each one must be given the real address in the executable code to which it refers. For this, the compiler uses the position of each basic instruction determined in the previous step.
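This two-step approach (emit branches with a placeholder, then fill in the addresses once every position is known) can be sketched in Python as follows. The instruction names, the `("line", n)` placeholder format and the function name `resolve_branches` are hypothetical; only the principle comes from the description above.

```python
def resolve_branches(code, line_address):
    """Replace every placeholder ("line", n) with the real address of line n.

    code: list of (opcode, operand) pairs produced by the decoding step.
    line_address: dict mapping a source line number to the address of the
                  first instruction generated for that line.
    """
    resolved = []
    for opcode, operand in code:
        if isinstance(operand, tuple) and operand[0] == "line":
            target = operand[1]
            if target not in line_address:
                raise ValueError(f"branch to unknown line {target}")
            operand = line_address[target]
        resolved.append((opcode, operand))
    return resolved


# A forward jump: when line 2 is decoded, the address of line 3 is not known yet.
code = [("LOAD", 5), ("JUMP", ("line", 3)), ("STORE", 7)]
line_address = {1: 0, 2: 2, 3: 6}
final = resolve_branches(code, line_address)
```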

10) ASM generation:

Finally, the compiler generates the ASM file, which is a concatenation of 2 things: the executable code and the initial values of variables.