An initial sketch of the interpreter code has been completed. This was the first real program written in the native machine code. The process exposed some limitations, and an optimization, in the current design. The schematic has been updated to reflect the following changes:
- The accumulator is always loaded after an ALU function.
- The program counter replaces the accumulator in the set of 8 register targets.
- The dual 4-bit buffer is eliminated from the ECU (-1 chip).
- Additional logic added to support banked RAM (+1 chip).
The hope was to produce something that runs at close to the native speed of the emulated CPU, but this will not be possible. Interpreters are not very efficient, and the final implementation will probably operate at around 1/4 of the emulated CPU speed. However, the native machine code can be used to add efficient system calls for accessing and controlling the peripherals (audio/video/serial).
The interpreter uses the zero page to store the virtual CPU registers. These include things like a virtual program counter and stack pointer. Many of these virtual registers are 16 bits wide and need to be loaded, incremented or decremented, then saved back to the zero page. An additional conditional check is required to determine whether the most significant byte needs to change when the least significant byte is updated.
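The load, update and save sequence for a 16-bit virtual register can be sketched in C as below. This is only an illustration of the carry/borrow check, not the native machine code: the `zero_page` array, the register offsets and the low-byte-first ordering are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical zero-page layout: offsets and byte order are illustrative only. */
enum { VPC_LO = 0x00, VPC_HI = 0x01, VSP_LO = 0x02, VSP_HI = 0x03 };

uint8_t zero_page[256];

/* Read a 16-bit virtual register stored as two bytes, low byte first. */
uint16_t vreg_get16(uint8_t lo_addr)
{
    assert(lo_addr < 255); /* the high byte lives at lo_addr + 1 */
    return (uint16_t)(zero_page[lo_addr] | (zero_page[lo_addr + 1] << 8));
}

/* Increment: the high byte only changes when the low byte wraps to zero. */
void vreg_inc16(uint8_t lo_addr)
{
    assert(lo_addr < 255);
    uint8_t lo = (uint8_t)(zero_page[lo_addr] + 1); /* load and increment */
    zero_page[lo_addr] = lo;                        /* save low byte */
    if (lo == 0x00) {                               /* carry into high byte */
        zero_page[lo_addr + 1]++;
    }
}

/* Decrement: the high byte only changes when the low byte was zero. */
void vreg_dec16(uint8_t lo_addr)
{
    assert(lo_addr < 255);
    uint8_t lo = zero_page[lo_addr];                /* load */
    if (lo == 0x00) {                               /* borrow from high byte */
        zero_page[lo_addr + 1]--;
    }
    zero_page[lo_addr] = (uint8_t)(lo - 1);         /* save low byte */
}
```

In the common case only the low byte is touched, so the conditional check adds cost to every update while the high-byte write itself is rare.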
Once the program counter is updated the instruction it points to can be read. This will then drive a switch statement to select the code that implements the instruction. There are various ways to make this switch. The most efficient is to use the opcode as an offset to the native program counter. This was the rationale behind the first change above. A custom ALU function can be added to define this offset, but even then, there isn't enough space in a single page to implement all the instruction emulation code.
The current design will use three jumps to select the instruction code. The first will jump within the page to one of several fork points. Each fork then jumps to a new page, which in turn branches within that page to the specific code that implements the instruction. There is one additional page jump at the end to return to the start of the interpreter loop. The first jump could select up to 64 pages, each of which could contain code for 16 instructions. This would provide room to support 1024 opcodes.
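The capacity of this scheme can be checked with a small C sketch. The text does not say how an opcode is split between the fork jump and the in-page branch, so the mapping below (fork = index / 16, slot = index % 16) is an assumption chosen for illustration, and the names `dispatch_t` and `decode_dispatch` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { FORKS = 64, SLOTS_PER_PAGE = 16 };

/* Where a given dispatch index lands: which fork (page) and which slot in it. */
typedef struct {
    uint8_t fork; /* selected by the first, in-page jump */
    uint8_t slot; /* selected by the branch inside the target page */
} dispatch_t;

/* Assumed mapping: upper bits pick the fork, lower four bits pick the slot. */
dispatch_t decode_dispatch(uint16_t index)
{
    assert(index < FORKS * SLOTS_PER_PAGE); /* 64 * 16 = 1024 entries */
    dispatch_t d = {
        (uint8_t)(index / SLOTS_PER_PAGE),
        (uint8_t)(index % SLOTS_PER_PAGE)
    };
    return d;
}
```

Under this assumed mapping an 8-bit opcode only ever reaches forks 0 to 15, so a single-byte instruction set would use a quarter of the available fork points.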
The total overhead for just this instruction decode is around 30 process cycles. The actual instruction implementation would probably require a similar number of cycles to complete. A total of around 60 cycles per instruction translates to around 0.125 MIPS. This is a little over 1/4 of the throughput of an original 1 MHz 68xx processor, which could perform around 0.425 MIPS.
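The arithmetic behind these figures can be reproduced with a one-line helper. The process-cycle rate is not stated here. The 7.5 MHz used in the usage note is only the value implied by the quoted numbers (0.125 MIPS at 60 cycles per instruction) and should be treated as an assumption.

```c
#include <assert.h>

/* Millions of emulated instructions per second for a given cycle rate. */
double emulated_mips(double cycles_per_second, double cycles_per_instruction)
{
    assert(cycles_per_instruction > 0.0);
    return cycles_per_second / cycles_per_instruction / 1e6;
}
```

For example, `emulated_mips(7.5e6, 60.0)` gives 0.125, and dividing that by the 0.425 MIPS quoted for the 68xx gives a ratio of roughly 0.29.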
The last change listed above is aimed at supporting FUZIX, which is designed for 8-bit CPUs but requires more than 64k of RAM. More memory requires banked RAM to switch between different address spaces. This can be achieved on the YATAC by using the extended register to define additional address bits for the RAM. Two bits are used to support four address spaces, with the GPU automatically switching to the highest bank to access the display RAM.
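The address formation can be modelled in C as a rough sketch. It assumes a 16-bit base address (so each bank is 64k) and that the two bank bits are the low two bits of the extended register. Neither detail is confirmed above, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum {
    BANK_BITS  = 2,
    BANK_COUNT = 1 << BANK_BITS, /* four address spaces */
    GPU_BANK   = BANK_COUNT - 1  /* GPU always uses the highest bank */
};

/* Assumption: the bank number sits in the low two bits of the extended register. */
uint8_t bank_from_extended(uint8_t extended_reg)
{
    return (uint8_t)(extended_reg & (BANK_COUNT - 1));
}

/* Combine a bank number with a 16-bit address into an 18-bit RAM address. */
uint32_t banked_address(uint8_t bank, uint16_t addr)
{
    assert(bank < BANK_COUNT);
    return ((uint32_t)bank << 16) | addr;
}

/* RAM address seen by the GPU when it fetches display data. */
uint32_t gpu_address(uint16_t addr)
{
    return banked_address(GPU_BANK, addr);
}
```

With these assumptions the four banks cover 256k in total, and a program running in bank 3 would share its address space with the display RAM.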