Instructions are, as previously discussed, stored in a packed format. The instruction fetcher unit unpacks them and stores them in a FIFO for the execution unit to use. A fetch operation may produce up to 2 instructions per fetch cycle, but it may also take multiple fetch cycles to produce a single instruction. A fetch cycle takes two master clock cycles, because the instruction memory is slower than the master clock. There are 2 instruction fetch units, which operate on different instruction memory units and fill separate FIFOs, so up to 2 instructions in total can be fetched per master clock cycle. Hopefully, this will even out to 1 instruction per cycle over time.
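The peak fetch rate described above can be checked with a little arithmetic. This is only a sketch of the best case; the constant names are ours and are not part of the design.

```python
from fractions import Fraction

# Figures taken from the text; the names are illustrative only.
MASTER_CLOCKS_PER_FETCH_CYCLE = 2      # instruction memory is slower than the master clock
MAX_INSTRUCTIONS_PER_FETCH_CYCLE = 2   # best case for one fetch unit
FETCH_UNITS = 2                        # each with its own memory and FIFO

# Best-case instructions per master clock cycle for a single fetch unit.
per_unit_peak = Fraction(MAX_INSTRUCTIONS_PER_FETCH_CYCLE,
                         MASTER_CLOCKS_PER_FETCH_CYCLE)

# Best-case instructions per master clock cycle across both fetch units.
total_peak = per_unit_peak * FETCH_UNITS

print(per_unit_peak, total_peak)
```

This is an upper bound only: any instruction that needs more than one fetch cycle pulls the sustained rate below it.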
To keep things simple, the instruction fetcher is not directly connected to the register file; otherwise, we would need some kind of arbitration over which unit gets to use the register file in which circumstances, which could easily get messy. We therefore need a way to get the program counter into and out of the register memory indirectly via the execution unit, which is the only unit with a direct connection. The program counter realistically only needs to be stored in the register memory when the thread is not actually executing; while it is executing, it can be cached in the decoder unit. The decoder needs to be able to inform the execution unit of the address of any instruction where it may stop, and the execution unit must be able to pass a new address when necessary. We therefore arrange for a bus between the two units to allow for this.
We add two instructions that are not visible in the ISA but which are interpreted by the execution unit; the instruction decoder can use these to cause program counter transfers:
- RESTOREPC causes the execution unit to load the current PC from register memory and execute a jump to it; note that this is the same behaviour as a typical JMP instruction, except that a different operand mode is required
- SUSPEND causes the execution unit to capture the current PC from the fetch unit and store it in register memory; it also signals to the instruction fetcher that it is ready to start handling instructions for a new thread
There is also no provision for removing instructions from the FIFO if they turn out to be unnecessary. We therefore arrange for the fetcher to stop fetching if it finds an instruction that could cause a jump or suspension of a thread. It resumes only after the execution unit tells it what happened.
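The stall-and-resume behaviour can be sketched as a small software model. This is not the hardware design: the `Fetcher` class, its method names, and the `MAY_REDIRECT` set are assumptions made for illustration, and the set simply lists the instructions this document describes as able to jump or suspend.

```python
from collections import deque

# Assumed set of instructions that may cause a jump or a suspension.
MAY_REDIRECT = {"JMP", "SUSPEND", "PULL", "PUT"}


class Fetcher:
    """Hypothetical model of one instruction fetcher and its FIFO."""

    def __init__(self, program, pc=0):
        self.program = program   # list of mnemonics, indexed by address
        self.pc = pc             # program counter cached in the fetcher
        self.fifo = deque()      # (address, mnemonic) pairs for the execution unit
        self.stalled = False

    def step(self):
        """Fetch one instruction unless stalled or past the end of the program."""
        if self.stalled or self.pc >= len(self.program):
            return
        op = self.program[self.pc]
        self.fifo.append((self.pc, op))
        self.pc += 1
        if op in MAY_REDIRECT:
            # Nothing can be removed from the FIFO later, so stop here.
            self.stalled = True

    def resume(self, new_pc=None):
        """Called once the execution unit reports what happened."""
        if new_pc is not None:
            self.pc = new_pc
        self.stalled = False


program = ["MOV8", "LDB", "JMP", "STB", "EXT"]
fetcher = Fetcher(program)
for _ in range(5):
    fetcher.step()
queued = [op for _, op in fetcher.fifo]   # fetching stopped after the JMP

fetcher.resume(new_pc=4)                  # execution unit reports the jump target
fetcher.step()
```

In this model the STB at address 3 never enters the FIFO, which is the point of stalling: nothing has to be discarded after the jump is resolved.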
The PULL instruction requires special handling:
- The first time it is executed in any specific task invocation, it will immediately return the operand that was placed into the task FIFO.
- On subsequent executions, it will either return an additional operand (if the next entry in the task FIFO is also for the same task) or suspend the task.
The instruction fetcher therefore supplies a flag to the decoder unit that allows it to substitute a SUSPEND instruction for a PULL instruction in the latter case. YIELD and PUT instructions may also suspend the thread, but the decision to do so is deferred until execution, so they are not replaced with SUSPEND instructions. YIELD is a special case of PUT, so it does not need its own instruction in the execution unit.
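The PULL substitution rule can be written out as a short sketch. The function names and the two boolean inputs are assumptions made for illustration; the text only says that the fetcher supplies a flag, not how it is derived.

```python
def pull_suspend_flag(first_pull_in_invocation, next_entry_same_task):
    """Hypothetical flag from the fetcher: True when a PULL must suspend."""
    # The first PULL of a task invocation always returns its operand.
    # A later PULL returns an operand only if the next task FIFO entry
    # belongs to the same task; otherwise the task is suspended.
    return not first_pull_in_invocation and not next_entry_same_task


def decode(mnemonic, suspend_flag):
    """Hypothetical decoder step: replace PULL with SUSPEND when flagged."""
    if mnemonic == "PULL" and suspend_flag:
        return "SUSPEND"
    # PUT (and YIELD, which is a special case of PUT) is passed through;
    # its decision to suspend is deferred until execution.
    return mnemonic


first = decode("PULL", pull_suspend_flag(True, False))
more_work = decode("PULL", pull_suspend_flag(False, True))
no_work = decode("PULL", pull_suspend_flag(False, False))
put = decode("PUT", pull_suspend_flag(False, False))
print(first, more_work, no_work, put)
```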
The execution unit is therefore required to support the following instructions:
|Opcode (hex)||Mnemonic||Operation|
|00||SUSPEND||Store current PC in register memory|
|01||PULL||Retrieve task operand and store in destination register|
|02||JMP||Pass new PC to instruction fetcher|
|03||PUT||Send a value to a given destination|
|10||XCHG||Fetch 16 bits of register memory, byte swap, and save back to original source|
|11||MOV8||8 bit move from immediate operand (includes SCSB instruction)|
|14||LDB||Load byte from memory (includes XLAT)|
|15||LDBI||Load byte from memory and postincrement address|
|16||STB||Store byte in memory|
|17||STBI||Store byte in memory and postincrement address|
|19||DLDI||DMA load and postincrement address|
|1B||EXT||Shift and extract bits|
|1C||IFREG||Conditionally execute next instruction based on tests against registers|
|1D||IFSTAT||Conditionally execute next instruction based on channel status|
|1E||SXA||Shift and add|
|1F||START||Set up new channel and begin execution|
FIXME - optimize numeric allocations to minimize required logic
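For reference in a simulator or assembler, the opcode table can be transcribed as an enumeration. The `Opcode` class is our own naming. The values are copied from the table as it stands and would change if the numeric allocations are optimized.

```python
from enum import IntEnum


class Opcode(IntEnum):
    """Execution unit opcodes, transcribed from the table (hex values)."""
    SUSPEND = 0x00
    PULL = 0x01
    JMP = 0x02
    PUT = 0x03
    XCHG = 0x10
    MOV8 = 0x11
    LDB = 0x14
    LDBI = 0x15
    STB = 0x16
    STBI = 0x17
    DLDI = 0x19
    EXT = 0x1B
    IFREG = 0x1C
    IFSTAT = 0x1D
    SXA = 0x1E
    START = 0x1F


# Every allocated opcode should fit in 5 bits.
all_fit = all(0 <= op <= 0x1F for op in Opcode)
print(len(Opcode), all_fit)
```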
Operands are encoded using a mode followed by several bits of data. 16-bit register accesses (used for simultaneous access to the A and B registers) and 12-bit register accesses (for pointer registers) use the same encoding. ALU operations are encoded in 4 bits, and the same field may alternatively be used as a shift direction indicator/counter, or as a condition code for IFxxx instructions. Not all fields are used by all instructions. Registers in fields labelled as "source" are preloaded before the execution pipeline stage. Only one field per instruction is automatically preloaded; additional loads must be requested in the instruction microcode, which makes the instruction take more than the standard 1 execute cycle.
|Mode||Operand fields|
|0||Single source/target register (8 bit); ALU op; immediate 8|
|1||Single source/target register (16 bit); ALU op; immediate 8|
|2||Target register (16 bit); Source register (16 bit); ALU op|
|3||8 bit register; 16 bit source register; immediate 6|
|4||8 bit source register; 16 bit register; ALU op|
|5||Single source/target register (8 bit); immediate 8|
|6||Single source/target register (16 bit); immediate 8|
|7||PC flag; shift; immediate 8|
The opcode therefore requires 5 bits, the operand mode tag requires 3 bits, and the operands themselves require at most 16 bits (mode 0). The instruction FIFO is therefore 24 bits wide.
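The 24-bit FIFO entry can be illustrated with a pack/unpack sketch. The field widths come from the text, but the field order (opcode in the top bits, then the mode, then the operands) is an assumption, as are the function names.

```python
OPCODE_BITS = 5
MODE_BITS = 3
OPERAND_BITS = 16
WORD_BITS = OPCODE_BITS + MODE_BITS + OPERAND_BITS  # 24


def pack(opcode, mode, operands):
    """Pack the three fields into one FIFO word (assumed field order)."""
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("opcode does not fit in 5 bits")
    if not 0 <= mode < (1 << MODE_BITS):
        raise ValueError("mode does not fit in 3 bits")
    if not 0 <= operands < (1 << OPERAND_BITS):
        raise ValueError("operands do not fit in 16 bits")
    return (opcode << (MODE_BITS + OPERAND_BITS)) | (mode << OPERAND_BITS) | operands


def unpack(word):
    """Split a FIFO word back into (opcode, mode, operands)."""
    operands = word & ((1 << OPERAND_BITS) - 1)
    mode = (word >> OPERAND_BITS) & ((1 << MODE_BITS) - 1)
    opcode = (word >> (MODE_BITS + OPERAND_BITS)) & ((1 << OPCODE_BITS) - 1)
    return opcode, mode, operands


word = pack(0x1F, 7, 0xABCD)   # START opcode, mode 7, arbitrary operand bits
fields = unpack(word)
print(hex(word), fields)
```

Any other field order would serve equally well, provided the decoder and the execution unit agree on it.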