
Contemplation of multi-word data transfers inside ECM-16/TTL computer

A project log for ECM-16/TTL homebrew computer

16-bit computer made from TTL logic chips

Pavel, 12/23/2021 at 19:16, 2 Comments

This is an expansion over addressing logic structure described previously.

I think there should be a way to transfer blocks of data between registers, and between registers and memory, in a single instruction. This should increase execution speed, especially in data-intensive calculations, tight loops over arrays, and frequent procedure calls, where a lot of data has to be moved from registers to the stack and back. It will also somewhat decrease program size.

The speedup comes from faster memory access. A memory access operation takes several cycles to execute, mainly because of the steps leading up to address calculation, but each additional word in a block lengthens the operation by only one cycle. For example, if a one-word transfer takes 4 cycles, a two-word transfer takes only 5 cycles (instead of 8 cycles for two one-word transfers). The only penalty is a 1-cycle prefix instruction that sets the flag for the number of words in the transfer.

Overall, multi-word transfers save execution cycles. There are several types of memory access instructions, depending on the address source, and in the default mode (1-word access) they take 3, 4 or 5 cycles. Even in the least favourable case, when the access is relatively fast (only 3 cycles) and only 2 consecutive words are needed, a multi-word transfer still makes sense because it saves 1 cycle (5 cycles for the prefixed instruction, prefix included, vs 6 cycles for two consecutive 3-cycle instructions). In all other cases the savings grow rapidly with block length and with instructions that intrinsically take longer to execute.
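The cycle arithmetic can be checked with a small model. This is only a sketch built from the numbers in the text: the function names are made up for illustration, and it assumes every word after the first costs exactly one cycle and the prefix costs one cycle.

```python
def separate_cycles(base_cycles, words):
    """Cycles for `words` independent one-word accesses."""
    return base_cycles * words


def prefixed_cycles(base_cycles, words, prefix_cycles=1):
    """Cycles for one prefixed block access (assumed cost model).

    The first word costs the full base access time, every further word
    costs one extra cycle, and the prefix instruction costs one cycle.
    A one-word access needs no prefix.
    """
    if words == 1:
        return base_cycles
    return prefix_cycles + base_cycles + (words - 1)


if __name__ == "__main__":
    for base in (3, 4, 5):
        for words in (2, 4, 8, 16):
            saved = separate_cycles(base, words) - prefixed_cycles(base, words)
            print(f"base={base} words={words:2d} saved={saved}")
```

With these assumptions the worst case (3-cycle access, 2 words) saves a single cycle, and the saving grows with both block length and base access time.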

As for program size, the prefix adds 1 word. For memory access instructions that are 1 word long, a single prefixed double-word access therefore takes the same amount of code as two single-word accesses. For instructions that are two words long, and for block sizes of 4 words or more, the code size savings become apparent.
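The same comparison for code size, as a rough sketch. The function names are hypothetical, and the model assumes a 1-word prefix and that a block access otherwise replaces one instruction per word.

```python
def separate_code_words(instr_words, block_words):
    """Code size of one access instruction per transferred word."""
    return instr_words * block_words


def prefixed_code_words(instr_words, block_words, prefix_words=1):
    """Code size of a single prefixed block access (assumed model)."""
    if block_words == 1:
        return instr_words  # no prefix needed for a single word
    return prefix_words + instr_words


if __name__ == "__main__":
    for instr_words in (1, 2):
        for block_words in (2, 4, 8, 16):
            a = separate_code_words(instr_words, block_words)
            b = prefixed_code_words(instr_words, block_words)
            print(f"instr={instr_words}w block={block_words:2d}: {a} vs {b} words")
```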

Technicalities

How to implement such transfers?

In the Memory Address Unit, the output of the MAB register is routed back to the Address Adder through a 2-to-1 mux (S8 on the scheme above). By default (mux control: 0), the Memory Pointer is fed to the "Base" input of the adder. During a multi-word transfer, after the first word has been transferred, the mux control becomes 1, so the MAB becomes the "Base" input while the "offset" is set to "+2".
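A behavioural sketch of that address sequence follows. It models only the mux and adder as described; the function name is made up, and address wraparound is ignored because the address width is not stated here.

```python
def block_addresses(mem_pointer, offset, words, step=2):
    """Return the sequence of addresses placed in MAB during a block transfer.

    First word:  mux control 0, Base = Memory Pointer, Offset = instruction offset.
    Later words: mux control 1, Base = previous MAB,   Offset = +2 (`step`).
    """
    mab = mem_pointer + offset          # mux control = 0
    addresses = [mab]
    for _ in range(words - 1):
        mab = mab + step                # mux control = 1, MAB fed back as Base
        addresses.append(mab)
    return addresses


if __name__ == "__main__":
    print([hex(a) for a in block_addresses(0x1000, 4, 4)])
```

The point of feeding MAB back is that the address calculation steps run once, and every following word needs just one more pass through the adder.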

The Control Unit should have a presettable counter that is set directly by the prefix instruction. It also needs additional circuitry to override or modify the register address in the instruction, so that each word in the block is transferred to or from a different register. For simplicity, some restrictions could be imposed: the lower bits of the register address would have to be zero (or be ignored), so instructions for double-word transfers could only use even register addresses, four-word transfers could only address registers 0 and 4, and so on. Otherwise a small additional adder would be needed, and it would introduce some extra delay.
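The "ignore the lower bits" variant of the register override could look roughly like this. It is a sketch, not the actual control logic: the function name is invented, and it assumes a 3-bit register address within one 8-register file, with the low bits replaced by the word counter (a bitwise OR, no adder).

```python
def block_registers(reg_addr, words, reg_bits=3):
    """Registers touched by a block transfer when low address bits are ignored.

    `words` must be a power of two. The low log2(words) bits of the register
    address are forced to zero and then replaced by the word counter.
    """
    if words < 1 or words & (words - 1):
        raise ValueError("block size must be a power of two")
    file_mask = (1 << reg_bits) - 1
    base = reg_addr & ~(words - 1) & file_mask   # clear the low bits
    return [base | count for count in range(words)]


if __name__ == "__main__":
    print(block_registers(5, 2))   # odd address is aligned down to the even one
    print(block_registers(6, 4))   # upper half of the file
```

Because the base is aligned, OR-ing in the counter can never carry into the upper bits, which is what makes the extra adder unnecessary.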

The prefix would be a one-off, meaning that any flags it sets are cleared after the prefixed instruction executes. It has to be inserted before each transfer instruction that should be multi-word, and no special clearing instruction is needed afterwards if a single-word transfer follows. These flags are therefore never saved and will not appear in the Status register.
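The one-off behaviour can be sketched as a tiny state machine. The class and method names are hypothetical; the only things taken from the design are that the prefix presets a word count (2, 4, 8 or 16) and that the count falls back to a single word once one transfer instruction has used it.

```python
class PrefixCounter:
    """Illustrative model of the presettable transfer counter."""

    ALLOWED = (2, 4, 8, 16)

    def __init__(self):
        self.words = 1              # default: single-word transfer

    def prefix(self, words):
        """Executed by the prefix instruction: preset the block size."""
        if words not in self.ALLOWED:
            raise ValueError(f"unsupported block size: {words}")
        self.words = words

    def transfer(self):
        """Executed by a transfer instruction: use the count, then clear it."""
        words, self.words = self.words, 1
        return words


if __name__ == "__main__":
    counter = PrefixCounter()
    counter.prefix(4)
    print(counter.transfer())   # the prefixed transfer moves a block
    print(counter.transfer())   # the next one is single-word again
```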

Flags:

"2" -- transfer of two words to/from a pair of registers
"4" -- transfer of four words to/from a half of Register File or half of MemPointer File
"8" -- transfer of eight words to/from whole Register File or MemPointer File
"16" -- transfer of 16 words to/from all registers at once

An instruction that loads all the registers at once has the effect of an almost total context switch: all memory pointers and all GPRs receive new data, so the PC, SP, FP and BP get new values, execution can continue from another part of memory, and the stack-related pointers point to another stack. It can be used as a return from a call.

Regarding single-byte transfers: these can be done with some modification of the memory system, and will introduce some delay, but not much. They would work like single-word transfers but take one cycle longer. For a byte transfer, the 8 least significant bits will be used. I am still in doubt whether this is really needed. It would consume 1 bit of the address offset, halving the maximum reach: ±32767 bytes instead of ±32767 words. How significant is this? It may turn out not to matter at all, as most offsets in actual use are much smaller.
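To put a number on the lost reach, here is a short calculation. It assumes a 16-bit word of 2 bytes and the ±32767 offset range quoted above; the names are made up for illustration.

```python
WORD_BYTES = 2        # assumption: one 16-bit word is 2 bytes
MAX_OFFSET = 32767    # largest offset magnitude, as quoted in the text


def reach_in_words(byte_granular):
    """How many whole words away from the base the offset can point."""
    if byte_granular:
        return MAX_OFFSET // WORD_BYTES   # offset counts bytes
    return MAX_OFFSET                      # offset counts words


if __name__ == "__main__":
    print("word offsets:", reach_in_words(False), "words")
    print("byte offsets:", reach_in_words(True), "words")
```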

Discussions

zpekic wrote 12/23/2021 at 22:48

Few questions (sorry if missed some design detail that may invalidate them): 

- if all regs have same capabilities, why do they need to move content between them internally? It can be useful, but seems like it complicates otherwise elegant design

- why prefix (complicating fetch / decode sequence, another read added) and instead just implement instructions similar to LDIR, CPIR, INIR, OTIR etc. (as in Z80) - these allow optimized memory access (1 read or write per cycle) with less complexity. 


Pavel wrote 12/24/2021 at 08:05

The registers are not totally the same. The ones in GPR group are directly connected to Main ALU, and all the ALU ops can be performed on them. The other group (Memory pointers, in Addressing Unit) cannot be used with that ALU, as they serve a different purpose. The MOV ops do not use special circuitry aside from the little part of Control Unit (for now still WIP).

Prefix is needed because I ran out of bits in the main instruction: there are quite a few different memory access modes that I devised, and all the bits in the instruction were used up. Adding the capability to transfer several words in one instruction was an afterthought. I didn't want to give up any of these access modes, so I decided to make this an add-on. That is, if I need only one word loaded or stored, no prefix is needed. But when there is a need to transfer several adjacent words, the prefix makes it possible to do it faster.

And, lastly, the design of this CPU is quite different from the Z80, as far as I can tell, so a direct comparison of their instructions may not be straightforward.
