Tagged registers

A project log for F-CPU

The Freedom CPU project has a log here too now :-)

Yann Guidon / YGDESYann Guidon / YGDES 06/16/2021 at 23:260 Comments


That log might make you roll your eyeballs so much that you"ll see yourself cringing but it might also "solve" an old problem, in the sense that "solving" in engineerspeak means "moving the problem somewhere else". Hopefully it doesn't come back later to haunt you even more from its new residence.

Tagged registers were not present as such in FC0 but it contained "shadow" values, such as the zero flag. The LSB and MSB values are directly available in the register set (or a shadow copy somewhere else), so they could be used for conditional instructions. The Zero flag can be recomputed any time the register is written, either after a computation or context switch. This shadow mechanism is simple, cheap, non intrusive... Another register attribute bit would have contained the carry flag, but it was not considered "good" and FC0 preferred 2R2W instructions instead. It was still "fair game" though.

Now I'm thinking of something potentially much weirder than that, and I can already hear some of your sarcasm, of the kind "Why don't you go full iAPX32 while you're at it ?". I'll simply answer : "Because that's not the purpose", despite some temptations that are quickly swatted by the teachings of Computer Development History.

So let's consider our "Tagged Registers". To be honest, FC1 already has some "shadow flags" for some registers to help reduce checking bounds all the time, for example, when a pointer is stored, like when copying a valid address register to a common register : this would also propagate its TLB status as well, to prevent excessive TLB lookups and save some energy and time. This helps speed up jumps or passing parameters through function calls, for example. This again, this is "non intrusive" and hidden from the ISA, a mere convenience. If the shadow flag is invalid, you lost a few cycles and that's all.

But the ISA faces a growth crisis. The 2R1W instruction format already saves a little room by dropping 2 bits for one source register address, but that's still 6+4+6=16 bits and the opcode can only have 16 bits, of which 3 are eaten by the "size field". The basic decoding table for FC0 was:

000  scalar byte
001  scalar half-word
010  scalar word
011  scalar double word
100  SIMD byte
101  SIMD half-word
110  SIMD word
111  SIMD double word

Soon it appeared that this approach was still too limiting because you always need to handle something that won't fit these orthogonal yet arbitrary types. For example, some people will desire 128-bit integers, or 256 bits. DSP people love fixed point formats. Float formats abound and some "IA" (neural networks) algos want small numbers, while some crypto nerds wish for 512 bits. Whenever we define a size, it will be too large for some and too small for others.

Furthermore, most of the time, you will use only 2 or 3 types, which would reduce the use(fulness) of one bit. Wasting one opcode bit is baaaaaad.... One solution that was considered was to use a lookup table, which meant that you needed opcodes to read and set it, one by one... And since you can't count on people to agree, many of your function calls and returns would have to save that damned "Size LUT". What sounded simple and cheap seemed to implant its madness all over the code "just to be sure". And you would have to count on the LUT format to evolve !

Now, the "solution" I propose here might not be better, but it looks like a more serious alternative. You will still need dedicated opcodes, you will still need some hidden state that must be saved and restored upon all traps, but you will save a few opcode bits and the save-restore madness is gone ! So what is this so-called solution ? Add tags to each register to store the individual data type. This type could even be NULL and trigger a trap when this register is read as an operand.

The tag can be in machine-dependent format and is used by computational opcodes, that will obey the common-sense rules, such as: if you add two operands with a different type tag, trap, else store the result as the same type. Some opcodes could even change the type or inquire it.

The type will contain:

They are set by :

They are read by :

The good :

The bad :

The overall :

It shouldn't be such a crazy undertaking for mundane code, because in C for example, the source code explicitly declares all types in advance and it would be redundant (hence space-wasting) to repeat the size all the time in the opcodes. The compiler already tracks the register widths and handles explicit type conversions. With hardware size tags, things will be a bit more funky but this will also help save other opcodes for SIMD or vectors operations, because you don't need to create a separate opcode that adds a scalar to each element of a vector (for example).

Of course, overall, the architecture moves further away from its squarely RISC roots but FC1 is also meant to be a good general-purpose number cruncher that could handle a wide variety of workload types (that are not well handled already by the #YASEP and the #YGREC8). Keeping the type separate from the opcode leaves many doors open without the risk of saturating the opcode table. You could add LNS ("Logarithmic Number System"), complex or quaternions, matrices or whatever, and the same old ADD opcode would still work. The work would be directed to the appropriate execution unit.

Another interesting effect, by saving opcodes (both from the 3 bits saved and the deduplication for each type) is that not only it has more opcodes to enjoy but also more bits for the register addresses ! If 3 bits are saved then I can bump the register set's size to 4×32 ! This would use "only" 7+5+7=19 bits :-) And there would still be 13 bits available for 2R1W operations.





PS: This thought process was rekindled by ...