AltairX is an in-order VLIW CPU.
It has three internal memories:
- 64 KiB L1 data scratchpad memory.
- 128 KiB L1 instruction scratchpad memory.
- 32 KiB L1 data cache, 4-way.
The processor has no branch prediction; instead it relies on a delay slot (1 cycle for fetch), plus 1 decode cycle and the jump itself (delay).
The number of instructions in a bundle is encoded with a "Pairing" bit: when it is 1, another instruction follows and is executed in parallel; 0 marks the end of the bundle.
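As a rough illustration of the Pairing bit, here is a minimal sketch of how a VM or decoder could split an instruction stream into bundles. The position of the bit (bit 0 of each instruction word) and the name `split_bundles` are assumptions made for this example, not the actual AltairX encoding.

```python
# Hypothetical position of the Pairing bit; the real encoding may differ.
PAIRING_BIT = 0x1


def split_bundles(words):
    """Group instruction words into bundles using the Pairing bit.

    A word with the Pairing bit set is followed by another instruction
    of the same bundle; a word with the bit cleared ends the bundle.
    """
    bundles = []
    current = []
    for word in words:
        current.append(word)
        if not (word & PAIRING_BIT):
            # Pairing bit is 0: this word closes the current bundle.
            bundles.append(current)
            current = []
    if current:
        raise ValueError("instruction stream ends inside a bundle")
    return bundles


# Three bundles of 2, 1 and 3 instructions (arbitrary example words).
stream = [0x11, 0x20, 0x30, 0x41, 0x51, 0x60]
bundles = split_bundles(stream)
```

With this assumed encoding, the bundle size is never stored explicitly: the decoder simply keeps fetching until it sees a word whose Pairing bit is 0.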
The goal of this processor is to minimize latency, and in particular to solve the problem of RAM latency. For this, the compiler will have to do two things:
- resolve pipeline conflicts
- preload the data in advance with a DMA
This is a technique used on consoles like the PlayStation 2 and 3: we set up a double buffer, so that we execute and read our data in buffer 1 while we preload our data into buffer 2.
Then we execute from buffer 2 while we preload buffer 1, and so on.
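The double-buffer scheme described above can be sketched as follows. This is only a sequential model: `dma_load` and `compute` are hypothetical callbacks, and on real hardware the DMA transfer would run asynchronously while the CPU works on the other buffer.

```python
def process_stream(chunks, dma_load, compute):
    """Process chunks with two buffers that alternate roles.

    dma_load(chunk) stands in for a DMA transfer into a scratchpad buffer;
    compute(buffer) stands in for the code that runs on that buffer.
    Both are hypothetical placeholders for this sketch.
    """
    results = []
    if not chunks:
        return results

    buffers = [None, None]
    buffers[0] = dma_load(chunks[0])  # prime the first buffer

    for i in range(len(chunks)):
        active = i % 2       # buffer we execute on
        idle = 1 - active    # buffer being preloaded
        if i + 1 < len(chunks):
            # On hardware this transfer overlaps with compute() below.
            buffers[idle] = dma_load(chunks[i + 1])
        results.append(compute(buffers[active]))
    return results


# Example: "loading" copies the chunk, "computing" sums it.
totals = process_stream([[1, 2], [3, 4], [5]], list, sum)
```

The point of the pattern is that the latency of loading chunk i + 1 is hidden behind the computation on chunk i, provided the computation takes at least as long as the transfer.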
To resolve pipeline conflicts, there is an accumulator internal to the ALU and to the VFPU, which is register 61. To avoid multiple writes to registers caused by unsynchronized pipelines, there are two special registers, P and Q (Product and Quotient), which are registers 62 and 63; they receive the results of mul / div / sqrt and similar operations.
It also has an "uncached accelerated" mode that speeds up reads only (a cache miss takes half the time).
Floating-point numbers in AltairX will not be 100% compatible with the IEEE 754 standard:
- Denormalized (subnormal) numbers are not handled (they are treated as zero).
- Infinities are not handled (they are replaced by the max value).
- NaN values are not handled (they are replaced by the max value).
- Rounding is always towards 0.
- Exceptions are not handled.
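To make these rules concrete, here is a small sketch that maps an IEEE 754 result onto the behaviour listed above. It assumes single precision (32-bit floats) and assumes that infinities and NaN keep their sign when replaced by the max value; neither point is stated above, and `to_altairx_float` is a hypothetical helper name.

```python
import math
import struct

# Largest finite and smallest normal float32 values, built from their bit patterns.
FLT_MAX = struct.unpack("<f", struct.pack("<I", 0x7F7FFFFF))[0]
FLT_MIN_NORMAL = struct.unpack("<f", struct.pack("<I", 0x00800000))[0]


def to_altairx_float(x):
    """Model the listed float rules on a Python float (assumes float32)."""
    # Infinities, NaN and overflow are replaced by the max value
    # (keeping the sign is an assumption of this sketch).
    if math.isnan(x) or math.isinf(x) or abs(x) >= FLT_MAX:
        return math.copysign(FLT_MAX, x)
    # Zero and denormalized magnitudes become zero.
    if abs(x) < FLT_MIN_NORMAL:
        return 0.0
    # struct rounds to nearest; step back one ULP if that moved away from 0.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    nearest = struct.unpack("<f", struct.pack("<I", bits))[0]
    if abs(nearest) > abs(x):
        bits -= 1  # magnitude is in the low 31 bits, so this moves toward 0
    return struct.unpack("<f", struct.pack("<I", bits))[0]


truncated = to_altairx_float(0.1)       # rounded toward 0, so below 0.1
clamped = to_altairx_float(math.inf)    # replaced by FLT_MAX
flushed = to_altairx_float(1e-40)       # denormal in float32, becomes 0.0
```

A model like this can be handy in the VM to check that software behaves the same with and without full IEEE 754 semantics.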
There is already a VM and an assembler (I use VASM).
I still have to finish the LLVM back end to have a C/C++ compiler.
I also have to improve my VM.
Of course, all of this will have to be tested on an FPGA.