Once again, the compiler is 100% homemade.
It is a cross compiler: it runs on a PC, takes A2Z basic source as input, and generates ASM code as output. Making a native compiler (one that could run on A2Z itself) would have been too difficult, because my computer lacks too many things (tables larger than 64kB, recursive functions, string functions, etc.).
The compiler is, for me, the most difficult part of this project. I am more used to coding low-level control-command software on small microcontrollers than to coding applications that manipulate strings, lists, etc.
I needed a lot of thinking and organization before beginning to code the compiler.
But I really enjoyed coding this compiler, because it is a new subject for me.
Sequencing of the compiler
1) Storage of lines :
The source code is split into a table of lines. Each line is a new instruction.
2) Configuration :
The compiler then looks for the “configure” directive. This directive sets the maximum size and the memory position allocated to the executable code and to the variables.
3) Functions, first pass:
The compiler then browses the code and looks for function declarations (subfunctions, ASM functions, and macros). It looks for the first line of each function and for its last line, “endfunct”. Each line of the code is tagged with the function it belongs to. The first function must be “main”, and it is executed first.
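The tagging pass described above can be sketched in a few lines of Python. This is only an illustration, not the code of the real compiler: the opening keyword "function" and the sample source are assumptions made for the example, and only “endfunct” and “main” come from the description above.

```python
def tag_functions(lines, opener="function", closer="endfunct"):
    """Tag every source line with the number of the function that owns it.

    Returns (tags, names). tags[i] is 0 for a line outside any function,
    otherwise the 1-based number of the owning function.
    The 'opener' keyword is hypothetical; only 'endfunct' is documented.
    """
    names = []      # names[k - 1] is the name of function number k
    tags = []
    current = 0     # 0 means "outside any function"
    for number, line in enumerate(lines, start=1):
        words = line.split()
        if words and words[0] == opener:
            if current != 0:
                raise SyntaxError(f"line {number}: function inside a function")
            if len(words) < 2:
                raise SyntaxError(f"line {number}: function without a name")
            names.append(words[1])
            current = len(names)
            tags.append(current)
        elif words and words[0] == closer:
            if current == 0:
                raise SyntaxError(f"line {number}: '{closer}' outside a function")
            tags.append(current)
            current = 0
        else:
            tags.append(current)
    if current != 0:
        raise SyntaxError(f"missing '{closer}' for function '{names[-1]}'")
    if not names or names[0] != "main":
        raise SyntaxError("the first function must be 'main'")
    return tags, names


# Hypothetical source text, only here to exercise the pass.
source = [
    "int g",
    "function main",
    "g = 1",
    "endfunct",
    "function blink",
    "endfunct",
]
tags, names = tag_functions(source)
```

The global line gets tag 0, which matches the “function = 0” convention used for global variables.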
4) #define management :
Each #define is replaced with its real value.
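As an illustration, here is a minimal Python sketch of such a substitution pass. The directive form "#define NAME value" and the rule that only whole identifiers are replaced are assumptions for the example; the real compiler may work differently.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def apply_defines(lines):
    """Drop the '#define NAME value' lines and substitute NAME in later lines.

    The directive syntax is an assumption. Only whole identifiers are
    replaced, so LED_PORT does not touch LED_PORT2.
    """
    defines = {}
    output = []

    def substitute(match):
        word = match.group(0)
        return defines.get(word, word)

    for line in lines:
        words = line.split(None, 2)
        if len(words) == 3 and words[0] == "#define":
            defines[words[1]] = words[2].strip()
            continue
        output.append(IDENTIFIER.sub(substitute, line))
    return output


# Hypothetical source text.
result = apply_defines([
    "#define LED_PORT 12",
    "out = LED_PORT + LED_PORT2",
])
```

In this sketch a #define only affects the lines that follow it, and the directive lines themselves are removed from the output.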
5) Declarations of variables:
The source code is browsed once again, and the compiler looks for variable declarations. For each variable declaration, the compiler stores its name, its function membership, its type, and its 2 sizes (for tables). If a variable is declared outside of a function, it is considered “global” (function = 0). We check that there is no double declaration (two variables with the same name inside a single function), and that the RAM allocated for variables is sufficient.
6) Allocation of variables :
We then allocate the variables in RAM, in a fixed (static) manner. The code that sets the initial values of the variables is also generated at this step.
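The variable table and its static allocation can be pictured with the following Python sketch. The type names, their sizes in bytes, the RAM start address, and the RAM size are all hypothetical values chosen for the example; only the stored fields (name, function, type, 2 sizes) and the two checks come from the description above.

```python
from dataclasses import dataclass

# Hypothetical type sizes in bytes; the real A2Z types may differ.
TYPE_SIZE = {"byte": 1, "word": 2}


@dataclass
class Variable:
    name: str
    function: int       # 0 = global, otherwise the owning function number
    type: str
    size1: int = 1      # first table dimension (1 for a scalar)
    size2: int = 1      # second table dimension (1 for a scalar)
    address: int = -1   # filled in by allocate()


def allocate(variables, ram_start, ram_size):
    """Give every variable a fixed address; return the number of bytes used."""
    seen = set()
    next_free = ram_start
    for var in variables:
        key = (var.function, var.name)
        if key in seen:
            raise ValueError(f"double declaration of '{var.name}'")
        seen.add(key)
        var.address = next_free
        next_free += TYPE_SIZE[var.type] * var.size1 * var.size2
        if next_free > ram_start + ram_size:
            raise ValueError(f"not enough RAM for '{var.name}'")
    return next_free - ram_start


# Hypothetical declarations: one global, one table and one local in function 1.
variables = [
    Variable("counter", 0, "word"),
    Variable("grid", 1, "byte", 4, 8),
    Variable("counter", 1, "byte"),
]
used = allocate(variables, ram_start=0x1000, ram_size=64)
```

The two variables named "counter" do not clash because they belong to different functions.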
7) Functions second pass :
For each function declaration (at the first line of a function), we look for the inputs and outputs of the function. A function can have several inputs and several outputs. The variables used for inputs and outputs must belong to the function.
8) Instruction decoding:
This is the biggest and most complex part of the compiler, and the only part that generates executable code. I use the concept of “expression”: a value that can be computed from several things, which can be variables, constants, registers, cache, or the result of a mathematical or arithmetic expression. The expression analysis is of course recursive inside the compiler, which makes it possible to code complex formulas in a single line. The result of an expression can be used for 3 things: assigning the value to a variable/cache/register, using it as a parameter for an input of a function, or using it as an input of a condition (if/while/for).
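To show what recursive expression analysis means in practice, here is a small recursive-descent sketch in Python. It is not the A2Z compiler: the supported operators (+, -, *, parentheses) and the stack-style pseudo-instructions (PUSH_VAR, PUSH_CONST, ADD, SUB, MUL) are invented for the illustration and are not the A2Z instruction set.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[-+*()])")


def tokenize(text):
    """Split an expression into numbers, identifiers, operators, parentheses."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character at column {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def compile_expression(text):
    """Return a list of invented stack-machine pseudo-instructions."""
    tokens = tokenize(text)
    code = []
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def factor():
        token = peek()
        if token is None:
            raise SyntaxError("unexpected end of expression")
        advance()
        if token == "(":
            addition()              # recursion: a full expression in brackets
            if peek() != ")":
                raise SyntaxError("missing ')'")
            advance()
        elif token[0].isdigit():
            code.append(f"PUSH_CONST {token}")
        elif token[0].isalpha() or token[0] == "_":
            code.append(f"PUSH_VAR {token}")
        else:
            raise SyntaxError(f"unexpected token {token!r}")

    def term():
        factor()
        while peek() == "*":
            advance()
            factor()
            code.append("MUL")

    def addition():
        term()
        while peek() in ("+", "-"):
            operator = advance()
            term()
            code.append("ADD" if operator == "+" else "SUB")

    addition()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return code


listing = compile_expression("a + 2 * (b - 1)")
```

Each grammar level calls the next one, and a parenthesis calls back into the top level. That recursion is what lets a complex formula fit on a single line.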
The compiler browses the source code from beginning to end, looks for instructions, and generates executable code on the fly. We also count the code size and determine the address of each instruction inside the executable code. This is necessary for the branching, which is managed later.
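The reason the addresses must be known before the branching is resolved can be shown with a short Python sketch. The mnemonics (LOAD, JZ, DEC, JMP, HALT), the instruction sizes, the labels, and the base address are all invented for the example; only the idea of counting sizes first and fixing branches afterwards comes from the description above.

```python
# Invented branch mnemonics, not the real A2Z instruction set.
BRANCHES = {"JMP", "JZ"}


def layout(program, base):
    """First step: count sizes and record the address of each instruction.

    program is a list of (label or None, opcode, operand or None, size).
    Returns (addresses, labels, total_size).
    """
    addresses = []
    labels = {}
    address = base
    for label, opcode, operand, size in program:
        if label is not None:
            labels[label] = address
        addresses.append(address)
        address += size
    return addresses, labels, address - base


def patch_branches(program, labels):
    """Second step: replace each branch target name by its address."""
    patched = []
    for label, opcode, operand, size in program:
        if opcode in BRANCHES:
            if operand not in labels:
                raise ValueError(f"unknown branch target '{operand}'")
            operand = labels[operand]
        patched.append((opcode, operand))
    return patched


# Hypothetical generated code for a small countdown loop.
program = [
    ("loop", "LOAD", "counter", 3),
    (None, "JZ", "done", 3),      # forward branch: target not yet emitted
    (None, "DEC", "counter", 3),
    (None, "JMP", "loop", 3),
    ("done", "HALT", None, 1),
]
addresses, labels, total_size = layout(program, base=0x0100)
patched = patch_branches(program, labels)
```

The forward branch to "done" cannot be encoded while the code is being generated, because its target address is only known once everything before it has been sized.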
There are 4 types of instructions:
- 8.1 : assembly code
- 8.2 : assign. It is the simplest instruction: it assigns the value of an expression to a variable, a cache, or a register.
- 8.3 : branching (conditional or unconditional) : if/else/for/while/goto
- 8.4 : function calls are managed in 3 steps:
- 8.4.1 : first, we copy the content of the function's inputs from the calling function to the called function.
- 8.4.2 : then we call the function. We also manage the execution stack at this step. The stack is 100% software; there is no hardware stack.
- 8.4.3 : last, we copy the outputs of the...