I previously walked through an LDI instruction; this time we will look at how the stages of an Ember ALU instruction operate, from a high level at least. This implementation, while far from finished, currently runs at 100MHz. However, the timing analysis in Vivado shows that some of the paths are getting quite close to the 10ns clock period. The Z and N flag operations are the last to complete, since they rely on the result of the ALU op, so I'll have to take a look at those to see if I can rearrange the logic to get better inference.
For now, let's look at the SUB instruction in the simulation timeline view.
As with any instruction, the first thing that happens in the pc_fetch stage is that curAddress (and thus address_out) is updated with the value of nextAddress, which was set in the retire stage of the previous instruction. In this case, the address to fetch is 0x00000048.
In the decode stage, we see that a sub.b instruction has been fetched by looking at the op.opCode and op.width values. Because of the b modifier, the operation width is an unsigned (zero-extended) 8-bit byte. This means that both the input values and the output of this instruction will be masked to 8 bits and then zero-extended. In addition, the processor ALU flags will be set based on the value of the 8-bit result.
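To make the masking step concrete, here is a minimal C sketch of how a value could be narrowed to the operation width. The op_width_t enum and the mask_to_width helper are names made up for illustration, and the 16-bit and 32-bit widths are assumptions; only the byte width is discussed in this log.

```c
#include <stdint.h>

/* Hypothetical width encoding: the real op.width values are not given here.
   Only the 8-bit case comes from this log; the others are assumptions. */
typedef enum { WIDTH_BYTE = 8, WIDTH_HALF = 16, WIDTH_WORD = 32 } op_width_t;

/* Mask a 32-bit value down to the operation width. Clearing the upper bits
   is the same thing as zero-extending the narrow value back to 32 bits. */
static uint32_t mask_to_width(uint32_t value, op_width_t width)
{
    if (width == WIDTH_WORD)
        return value;                       /* avoid an undefined 32-bit shift */
    return value & ((UINT32_C(1) << width) - 1u);
}
```

With this helper, mask_to_width(0x12345678, WIDTH_BYTE) keeps only the low byte, 0x78, with the upper 24 bits cleared.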
If we examine the rest of the decoded instruction, we see that op.regSrcA is register r2, which currently has the value of 0x00000001. Also, op.immFlag is set, so the immediate value 0x01 contained in op.immVal is used for the second operand. This describes the following instruction in mnemonic format:
sub.b r2, r2, #1
The equivalent C would be:
uint32_t r2 = 1; r2 = (uint32_t)(uint8_t)((uint8_t)r2 - (uint8_t)0x01);
In the execute stage of an ALU instruction, the CPU latches the ALU outputs into the result registers. These include aluResult, which now has the value 0x00000000, as well as the ALU flags: overflow, negative, and carry are all unset, while zero is set since the output value is 0.
You might also note that after the execute stage, the address_out bus is released (represented by ZZZZZZZZ, or high impedance), since the value on data_in is no longer needed. In a real system, this would disable the memory read line on the CPU to allow other devices on the system bus to access memory if needed.
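As a rough software model of what the execute stage produces for sub.b, the sketch below computes the 8-bit result and the four flags. The alu_out_t struct and alu_sub_b function are hypothetical names, not taken from the Ember source. The carry flag is modeled as a borrow (set when the first operand is smaller than the second), which is an assumption that happens to agree with carry being unset for 1 - 1 in the timeline.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the values latched in the execute stage. */
typedef struct {
    uint32_t result;                        /* zero-extended 8-bit result */
    bool zero, negative, carry, overflow;
} alu_out_t;

/* Model of an 8-bit subtract: a - b, with flags taken from the 8-bit result. */
static alu_out_t alu_sub_b(uint32_t a, uint32_t b)
{
    uint32_t a8 = a & 0xFFu;                /* mask both inputs to a byte */
    uint32_t b8 = b & 0xFFu;
    uint32_t diff = (a8 - b8) & 0xFFu;      /* wraps, then masks to 8 bits */

    alu_out_t out;
    out.result   = diff;
    out.zero     = (diff == 0);
    out.negative = (diff & 0x80u) != 0;     /* bit 7 is the byte's sign bit */
    out.carry    = (a8 < b8);               /* assumed borrow convention */
    /* Signed overflow: operands differ in sign and the result's sign
       differs from the first operand's sign. */
    out.overflow = ((a8 ^ b8) & (a8 ^ diff) & 0x80u) != 0;
    return out;
}
```

Calling alu_sub_b(1, 1) gives a result of 0 with only the zero flag set, matching the simulation. Note that zero and negative can only be computed once diff is known, which mirrors why those two flags sit at the end of the timing path.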
Finally, the retire stage is where the values of the flags and aluResult are written to the CPU registers, and we also update the nextAddress again to point to the next instruction.
Also notice that a bunch of values in the timeline change at this point, like the flags and operands, but we don't care since these only matter at the time they are latched into registers. Since they are always wired to the data_in register no matter what value is there, they become basically undefined when the address is not being driven by the CPU directly.
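The four stages described above can be pictured as a small state machine. This is only a behavioral sketch in C, not the actual HDL: the cpu_t struct, the step_sub_b function, the 16-entry register file, and the 4-byte instruction size are all assumptions made for illustration, and flags are left out to keep it short.

```c
#include <stdint.h>

typedef enum { STAGE_PC_FETCH, STAGE_DECODE, STAGE_EXECUTE, STAGE_RETIRE } stage_t;

typedef struct {
    stage_t  stage;
    uint32_t cur_address;   /* models curAddress / address_out */
    uint32_t next_address;  /* models nextAddress */
    uint32_t alu_result;    /* models aluResult, latched in execute */
    uint32_t regs[16];      /* register count is an assumption */
} cpu_t;

/* Advance the model by one stage for "sub.b rd, ra, #imm". */
static void step_sub_b(cpu_t *cpu, unsigned rd, unsigned ra, uint32_t imm)
{
    switch (cpu->stage) {
    case STAGE_PC_FETCH:    /* drive the address chosen by the last retire */
        cpu->cur_address = cpu->next_address;
        cpu->stage = STAGE_DECODE;
        break;
    case STAGE_DECODE:      /* fields arrive as arguments in this sketch */
        cpu->stage = STAGE_EXECUTE;
        break;
    case STAGE_EXECUTE:     /* latch the masked 8-bit ALU output */
        cpu->alu_result = ((cpu->regs[ra] & 0xFFu) - (imm & 0xFFu)) & 0xFFu;
        cpu->stage = STAGE_RETIRE;
        break;
    case STAGE_RETIRE:      /* write back and pick the next address */
        cpu->regs[rd] = cpu->alu_result;
        cpu->next_address = cpu->cur_address + 4u;  /* assumed 4-byte instruction */
        cpu->stage = STAGE_PC_FETCH;
        break;
    }
}
```

Starting with next_address at 0x48 and r2 holding 1, four calls to step_sub_b(&cpu, 2, 2, 1) walk through fetch, decode, execute, and retire, leaving r2 at 0 and the model ready to fetch again.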
That covers the ALU instruction from the timeline view. I'm working hard on an ISA document, which I will post soon; it should make much of this clearer and open it up for discussion.