
Virtual CPU 2

A project log for YATAC78 - The WWW TTL Computer

Retro computer built from 1978-era TTL logic chips. Internet capable with built in web browser and server

Alastair Hewitt • 05/28/2019 at 15:52 • 4 Comments

An initial sketch of the interpreter code has been completed. This was the first time a real program was created using the native machine code. This process exposed some limitations and an optimization in the current design. The schematic has been updated to reflect the following changes:

  1. The accumulator is always loaded after an ALU function.
  2. The program counter replaces the accumulator in the set of 8 register targets.
  3. The dual 4-bit buffer is eliminated from the ECU (-1 chip).
  4. Additional logic added to support banked RAM (+1 chip).

The hope was to produce something that runs at close to the native speed of the emulated CPU. This will not be possible though. Interpreters are not very efficient and the final implementation will probably operate at around 1/4 of the emulated CPU speed. However, the native machine code can be used to add efficient system calls for accessing and controlling the peripherals (audio/video/serial).

The interpreter uses the zero page to store the virtual CPU registers. These include things like a virtual program counter and stack pointer. Many of these virtual registers are 16 bits and need to be loaded, incremented or decremented, then saved back to the zero page. Additional conditional checking is required to determine if the most significant byte needs to change when the least significant is updated.
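
As a rough illustration of the update described above, the following Python sketch models the zero page as a byte array and increments a 16-bit virtual register held in two bytes. The low-byte-first layout, the `VPC_LO` address and the `inc16` name are assumptions made for the example; the real interpreter does this in native machine code.

```python
# Hypothetical model: the zero page as 256 bytes, virtual PC stored low byte first.
VPC_LO = 0x00  # assumed zero-page address of the low byte (not from the design)


def inc16(zero_page, addr):
    """Increment the 16-bit value at addr (low byte) and addr + 1 (high byte)."""
    lo = (zero_page[addr] + 1) & 0xFF
    zero_page[addr] = lo
    if lo == 0:
        # The low byte wrapped from 0xFF to 0x00, so carry into the high byte.
        zero_page[addr + 1] = (zero_page[addr + 1] + 1) & 0xFF


zero_page = bytearray(256)
zero_page[VPC_LO] = 0xFF
zero_page[VPC_LO + 1] = 0x12
inc16(zero_page, VPC_LO)
print(hex(zero_page[VPC_LO + 1] << 8 | zero_page[VPC_LO]))  # 0x1300
```

The conditional on the wrapped low byte is the "additional conditional checking" mentioned above: most increments touch only one byte, and the high byte is written only on a carry.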

Once the program counter is updated the instruction it points to can be read. This will then drive a switch statement to select the code that implements the instruction. There are various ways to make this switch. The most efficient is to use the opcode as an offset to the native program counter. This was the rationale behind the first change above. A custom ALU function can be added to define this offset, but even then, there isn't enough space in a single page to implement all the instruction emulation code.

The current design will use three jumps to select the instruction code. The first will jump within the page to one of several fork points. Each fork then jumps to a new page that branches within that page to specific code that implements the instruction. There is one additional page jump at the end to return to the start of the interpreter loop. The first jump could define up to 64 pages, each of which could contain code for 16 instructions. This would provide room to support 1024 opcodes.
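
The capacity figure can be checked with a small Python sketch. It only models how an opcode number might be split into a page and a slot within that page; the particular split (high bits select the page, low four bits select the slot) and the names used are assumptions, not the actual encoding.

```python
SLOTS_PER_PAGE = 16  # from the text: code for 16 instructions per page
MAX_PAGES = 64       # from the text: the first jump could define up to 64 pages


def decode(opcode):
    """Split an opcode into (page, slot). The bit split is an assumption."""
    if not 0 <= opcode < MAX_PAGES * SLOTS_PER_PAGE:
        raise ValueError("opcode out of range")
    page = opcode // SLOTS_PER_PAGE  # jumps 1 and 2: pick the fork, then enter its page
    slot = opcode % SLOTS_PER_PAGE   # jump 3: branch to the code within that page
    return page, slot


print(MAX_PAGES * SLOTS_PER_PAGE)  # 1024
print(decode(0xA9))                # (10, 9)
```

With an 8-bit opcode only 16 of the 64 possible pages would be needed under this split, which leaves room for additional instruction sets.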

The total overhead for just this instruction decode is around 30 process cycles. The actual instruction implementation would probably require a similar number of cycles to complete. A total of around 60 cycles per instruction translates to around 0.125 MIPS. This is about 1/4 the speed of an original 1 MHz 68xx processor, which could perform around 0.425 MIPS.
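
The arithmetic behind these estimates can be laid out in a few lines of Python. The cycle counts and MIPS figures are taken from the paragraph above; the process-cycle rate is not stated in this log and is only derived here from those numbers.

```python
decode_cycles = 30    # from the text: instruction decode overhead
execute_cycles = 30   # from the text: "a similar" count for the implementation
cycles_per_instr = decode_cycles + execute_cycles

virtual_mips = 0.125    # from the text: estimated virtual CPU throughput
reference_mips = 0.425  # from the text: original 1 MHz 68xx processor

# Derived, not stated in the log: the process-cycle rate these figures imply.
implied_cycle_rate_mhz = virtual_mips * cycles_per_instr
ratio = virtual_mips / reference_mips

print(cycles_per_instr)        # 60
print(implied_cycle_rate_mhz)  # 7.5
print(round(ratio, 2))         # 0.29
```

The ratio comes out a little above one quarter, which is consistent with the rough estimate given above.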

The last change listed above is aimed at supporting FUZIX. FUZIX is designed for 8-bit CPUs, but requires more than 64k of RAM. More memory requires banked RAM to switch between different address spaces. This can be achieved on the YATAC by using the extended register to define additional address bits for the RAM. Two bits are used to support four address spaces, with the GPU automatically switching to the highest bank to access the display RAM.
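
A minimal Python sketch of the banking scheme follows. It assumes each bank is a full 64k and that the two bank bits sit directly above the 16-bit address; the function and constant names are hypothetical.

```python
BANK_BITS = 2                    # from the text: two bits, four address spaces
ADDR_BITS = 16                   # assumed: each bank spans a full 64k
GPU_BANK = (1 << BANK_BITS) - 1  # from the text: the GPU uses the highest bank


def physical_address(bank, addr):
    """Combine a bank number and a 16-bit address into one RAM address."""
    if not 0 <= bank < (1 << BANK_BITS):
        raise ValueError("bank out of range")
    if not 0 <= addr < (1 << ADDR_BITS):
        raise ValueError("address out of range")
    return (bank << ADDR_BITS) | addr


print(hex(physical_address(0, 0x1234)))        # 0x1234
print(hex(physical_address(GPU_BANK, 0x0000))) # 0x30000
print((1 << (BANK_BITS + ADDR_BITS)) // 1024)  # 256 (total RAM in k)
```

Under these assumptions the four address spaces add up to 256k, with the top 64k holding the display RAM.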

Discussions

roelh wrote 06/04/2019 at 07:47 point

Hi Alastair,

too bad that your virtual CPU has such a low speed, especially now your hardware is so fast (30 MHz).

If I were in your shoes, I would modify the hardware to get more speed for your virtual cpu. Since no pcb has been built yet, this can still be done.

(The Kobold CPU as published was the 6th version of the design, and there were a great number of designs before that which never saw the hack-a-day-light). 

Suppose you had the virtual PC in an incrementable hardware register. A single memory access could fetch your 8-bit instruction, and put it in the upper byte of the PC while clearing the lower byte and incrementing the virtual PC. This is only 1 cycle where you previously needed 30! It would give you 256 emulated instructions with max. 256 cycles each. For the data actions of the CPU, having a hardware accumulator would probably help, but more advice is difficult since I do not know your current virtual design.

Another approach would be to put a RAM 'in parallel' to the current ROM, and fetch your instructions there, forgetting the virtual cpu. The hardware cpu might need some more instructions in that case.


Alastair Hewitt wrote 06/04/2019 at 12:19 point

Hi Roelh,

The virtual CPU is only part of the story. It is quite slow but will have access to a lot of accelerated system calls, so it's not as bad as it sounds. In a way the virtual CPU is just executing the "business logic" of the machine.

Emulating an existing CPU is very important. I wouldn't want to retarget assemblers and compilers on top of everything else. Attempting to improve the performance by adding more chips sounds like a slippery slope to implementing the entire emulated CPU. This can be done, but takes about 200 chips (https://c74project.com).

The current CPU uses 26 TTL chips, so I'm going to pay for it somewhere :) Another constraint of this minimalistic design is to only use a single ROM and RAM chip. This requires multiplexing everything on the same memories, but it would be easier/faster if I broke things down to dedicated memories. The multiplexing divides down the relatively fast underlying hardware clock, which is ultimately constrained by the 55 ns access time.

The "GPU" should be where things will shine though. Supporting a native text mode means you only need to update one byte to change an entire character on the screen (128 pixels).


Alastair Hewitt wrote 05/30/2019 at 02:57 point

I'll definitely have room for more than one vCPU, but even one is going to be a lot of work! The Z80 is a lot more complex which is why I'm leaning towards the 68XX/6502 side. The deeper register set of the Z80 doesn't add any additional cost to the interpreter though. Things seem to change every day as I progress, so it's anyone's guess where I end up.


Marcel van Kervinck wrote 05/29/2019 at 16:59 point

In the Gigatron's vCPU I wish I had discovered the "jump twice" trick a bit earlier. Still it does have a couple of vCPU instructions that escape to another page. I'm thinking about patching 1 instruction living in the primary page to allow the addition of 2 or 3 new ones. But that's about the maximum leeway we have, other than adding another vCPU. The idea of targeting an existing ISA has crossed my mind more than once. There is great value in connecting with existing standards early on. And you can have multiple interpreters in one system. Running 6502 and Z80 code at the same time?
