
Pipeline (2)

A project log for 100MHz TTL 6502

Experimental project to break the 100MHz “sound barrier” on a TTL CPU

Drass • 11/12/2020 at 03:25 • 0 Comments

I wanted to touch on a final aspect of this pipeline design and the specific problems it helps to overcome. I struggled a bit with this explanation, so apologies in advance for any confusion. I’ll be happy to try to clarify so please just ask. Here it goes ...

Unlike the atomic instructions we find in a traditional RISC pipeline, even basic operations in this design are spread over two microinstructions. For example, a typical RISC style add operation, like this:

add r1, r2, r3

 takes two microinstructions to specify in this pipeline, corresponding to the 6502 FetchOperand and FetchOpcode bus cycles, like this:

ALUin(A, DB, C); PC += 1                # FetchOperand
A := ALUop(ADD); IR := DB; PC += 1      # FetchOpcode

 The first microinstruction loads the inputs of the ALU and the second performs the ALU operation itself. To be clear, the RISC form of the instruction would execute the same sequence of steps with respect to the ALU as it works down the pipeline. But it remains an atomic unit that describes only a single operation.
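To make the two-cycle split concrete, here is a minimal Python sketch (my own model, not the actual hardware) of the two microinstructions above: the first latches the ALU inputs, the second performs the operation on whatever was latched.

```python
class ALU:
    """Toy model of the pipelined ALU: inputs are latched one cycle
    before the operation is performed on them."""

    def __init__(self):
        self.a = self.b = self.cin = 0

    def alu_in(self, a, b, cin):
        # Cycle 1 (FetchOperand): ALUin(A, DB, C) -- latch the inputs
        self.a, self.b, self.cin = a, b, cin

    def alu_op_add(self):
        # Cycle 2 (FetchOpcode): ALUop(ADD) -- operate on latched inputs
        total = self.a + self.b + self.cin
        return total & 0xFF, total >> 8   # (8-bit result, carry out)

# Example values (arbitrary): A register, data bus byte, carry flag
alu = ALU()
A, DB, C = 0x40, 0x05, 0
alu.alu_in(A, DB, C)            # FetchOperand: ALUin(A, DB, C)
A, cout = alu.alu_op_add()      # FetchOpcode:  A := ALUop(ADD)
print(hex(A))                   # 0x45
```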


By contrast, a single microinstruction in this design can specify the ALUop for one operation and the ALU inputs for the next. For example, during indexing, we might see the following microinstruction:

ADL := ALUop(ADD); ALUin(ADH, 0, Cout)

 This microinstruction completes the sum for the low byte of the address (ADL) and also sets up the ALU inputs to adjust the high byte (ADH). Because of this dual function, only one such microinstruction is required to manage the activity across both the DECODE and EXECUTE stages.

D72D8FF6-B463-4F19-8A4F-42F9AD9BCFA1.jpeg

As a bonus, that one microinstruction can also specify whether any ALU outputs need to be recirculated (as is the case with Cout in the microinstruction above).
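The indexing sequence can be sketched as follows (again a toy Python model under my own assumptions, with a hypothetical base address and index): note how the dual-function middle step both completes the low-byte sum and recirculates Cout into the inputs for the high-byte adjustment.

```python
def add8(a, b, cin=0):
    """8-bit add returning (result, carry out)."""
    total = a + b + cin
    return total & 0xFF, total >> 8

base, y = 0x12F0, 0x15   # hypothetical base address and Y index

# Microinstruction 1: ALUin(base_lo, Y, 0) -- latch low-byte inputs
a, b, cin = base & 0xFF, y, 0

# Microinstruction 2: ADL := ALUop(ADD); ALUin(ADH, 0, Cout)
adl, cout = add8(a, b, cin)                  # complete the low-byte sum
a, b, cin = (base >> 8) & 0xFF, 0, cout      # recirculate Cout for the high byte

# Microinstruction 3: ADH := ALUop(ADD)
adh, _ = add8(a, b, cin)

print(hex((adh << 8) | adl))                 # 0x1305
```

Here the page crossing (0xF0 + 0x15 overflows the low byte) is handled entirely by the recirculated carry, with no separate microinstruction needed to set up the high-byte add.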

Most importantly, though, the arrangement allows the pipeline to avoid control stalls. To see how, consider the effect of a FetchOpcode operation on the pipeline. A FetchOpcode causes the pipeline to begin fetching microinstructions for a new opcode from a new location. In that sense, we can think of the opcode as an address and of FetchOpcode microinstructions as unconditional branches to opcode "subroutines". From the point of view of the microinstruction stream, a FetchOpcode is in fact an unconditional branch.


0AF13418-5175-4786-9C77-AC7FA5652D4B.jpeg
And just like all pipelined branches, FetchOpcode invalidates any instructions that have already been pre-fetched into the pipeline at the time it executes. This is effectively a "branch delay slot". With a traditional pipeline, there are two such invalid instructions, one in the FETCH stage and another in the DECODE stage. In this case we can use the opcode itself in place of the microinstruction in the FETCH stage. But the microinstruction in the DECODE stage has to be discarded. Left as is, the pipeline would stall.


7F61A5E2-7825-4C8E-A82A-DAAF19009B02.jpeg
This is where the "split" microinstructions come in handy. Since there is only one microinstruction active for both DECODE and EXECUTE, we can take the opcode and just keep going. No control stall is triggered, and no extra cycles are added to the processing as a result.
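A back-of-the-envelope cycle count illustrates what is at stake (a toy model of my own, with assumed numbers, not measurements from this design): a conventional pipeline would pay a one-cycle bubble on every FetchOpcode "branch" for the discarded DECODE-stage slot, while the split-microinstruction scheme pays none.

```python
def total_cycles(n_opcodes, uinsts_per_opcode, bubble_per_branch):
    # Each opcode ends in a FetchOpcode, i.e. one "branch" per opcode,
    # so each opcode costs its microinstructions plus any branch bubble.
    return n_opcodes * (uinsts_per_opcode + bubble_per_branch)

# Assumed workload: 1000 opcodes of two microinstructions each
conventional = total_cycles(1000, 2, 1)   # one discarded DECODE slot per opcode
split        = total_cycles(1000, 2, 0)   # no control stall
print(conventional, split)                # 3000 2000
```

On these assumed numbers, avoiding the bubble is the difference between three cycles and two per short instruction, which is exactly why the stall had to be eliminated.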

Alright, with that final gremlin banished, here now is a high-level block diagram for this CPU.


BDE32586-357C-494B-BDF1-F5C63F430764.png
