-
Program Position Independence
02/12/2018 at 21:46 • 1 comment
Position independence was one of the design constraints of the MC6809.
It is the quality of a program that lets it execute properly when placed anywhere within the memory address map : the program is independent of its position within the memory map.
(which means : avoid kludges !)
It's not enough to have a PC-relative addressing mode (particularly for a Harvard memory layout).
- to be discussed...
-
Visible states and atomicity
02/10/2018 at 17:22 • 0 comments
Designing an Instruction Set Architecture is fine. Making one that won't shoot you in the foot later is something else !
A nice and efficient architecture must not only run fast and be easy to program but it must also be easy to fully save and restore, for example for debug purposes, interrupt handling, virtualisation and context switching (with a preemptive kernel).
If you want to see an example of what NOT to do, look at the PIC16F architecture. Interrupt Service Routines are tricky and most importantly, the hardware stack is a P.I.T.A. (I managed to make #PIC16F/OS but it is limited to a fully cooperative system)
Some architectures provide two or more banks of registers to reduce latency, though often it just postpones the backup process. F-CPU implemented the SRB (Smooth Register Backup) to address this issue.
But what really kills is state you can't save and restore. Status registers are often tricky because of all their side effects. Don't forget the coprocessors either. And some "smart ideas" and "tricks" end up being kludges, such as the MSP430's hardware multiplier : a peripheral where not all registers can be written, so you can't easily save and restore its state.
This is a strong argument FOR a unified register set and orthogonal ISA, as well as minimising the number of status flags and registers.
-
Register #0 = 0
02/10/2018 at 01:48 • 0 comments
There is an old tradition in the RISC canon : the very first register reads as zero.
Either it's a convention and it's set by software, or the register set is hardwired to 0. In both cases the register is meant to simplify some computations and save some opcodes.
This was known to the CDC6600 developers back in the 60s/70s : you would start a program by setting B0 to 0 (see Wikipedia). I also had private conversations a long time ago with a CDC user about this... I'm not sure about the POWER architecture, but this convention persisted in the MIPS architecture, got adopted by the Alpha AXP architecture (though there it's R31) and is still vigorous in RISC-V.
Famously, many other architectures didn't follow this convention, which is typical when the register set is small (16 or fewer registers), as on ARM.
My opinion about this : I used to follow the convention (in F-CPU FC0 in y2k) but now I don't.
- One reason is that the registers are often a scarce resource. 1/32 of your hardware is very little until you need it.
- Another reason is that SRAM blocks in FPGA don't play well with special cases : they are uniform, so adding a "Read As Zero" entry adds gates and increases the latency (ever so slightly, but still).
- Yet another reason is that the convention was motivated by a reduced opcode space (one of the aspects of the RISC canon). But don't confuse opcode space size with opcode complexity : with highly orthogonal instructions, size is less of a problem. It can even save a bit of decoding logic.
- I have others that I might mention later.
My approach today is :
- Don't hardwire a register to 0. If you need 0, you'll set a register to this value. I estimate it's "pretty infrequent".
- There are already many other "special purpose registers" in my designs (the A and D registers, PC...) and they consume quite a few register addresses already...
- Design instructions that explicitly perform their intended function. For example : NOP might be an assembly macro that encodes "OR R0 R0 R0" but in more advanced architectures, this would waste two register set reads, one boolean operation and a register set write... Have a true NOP opcode that explicitly conveys the intent.
- If you want to have a "write ignore" register, just pick one where it's available. Even an address register might work (as long as you don't read the corresponding D register after that).
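To make the trade-off concrete, here is a toy Python sketch (my own illustration, not any actual implementation) contrasting a MIPS/RISC-V-style register file where R0 reads as zero and ignores writes, against a uniform one. The per-port compare in `read()` stands in for the extra mux/gates mentioned above.

```python
class RegFileRAZ:
    """Register file with R0 hardwired to zero (MIPS/RISC-V style)."""
    def __init__(self, n=32):
        self.r = [0] * n

    def read(self, i):
        # The special case : an extra compare/select on every read port.
        return 0 if i == 0 else self.r[i]

    def write(self, i, v):
        if i != 0:          # writes to R0 are silently dropped
            self.r[i] = v

class RegFilePlain:
    """Uniform register file : no special case, all entries usable."""
    def __init__(self, n=32):
        self.r = [0] * n

    def read(self, i):
        return self.r[i]

    def write(self, i, v):
        self.r[i] = v
```

In the uniform version, software that needs a zero simply sets one register to 0 by convention, keeping all 32 entries available when the convention isn't needed.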
TODO : list all the use cases of the RAZ/WI (Read As Zero/Write Ignored) gimmick.
-
Reserved opcode values
02/07/2018 at 17:10 • 6 comments
Instructions are often "sprinkled" across the opcode map in a seemingly random fashion.
I have found a convenient convention/guidance :
- NOP is encoded with all-0s => easier to spot in a debug hex dump.
- INV (invalid) is encoded with all-1s => this traps the core if it executes unprogrammed (E)EPROM, which reads as all-1s.
It's totally arbitrary, and INV is also sometimes found as all-0s to catch execution of uninitialised RAM.
NOP is also often a special case for some instruction combination. But it's good to give the opcodes special values to ease decoding and extensions. NOP and INV can be pools of unused opcodes for later revisions.
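A minimal decoder sketch of this convention (hypothetical 16-bit opcodes, not a real ISA) shows how cheap both special cases are to check :

```python
NOP = 0x0000   # all-0s : easy to spot in a hex dump
INV = 0xFFFF   # all-1s : matches erased/unprogrammed (E)EPROM

class InvalidOpcode(Exception):
    """Raised when the core runs into unprogrammed memory."""
    pass

def decode(word):
    if word == NOP:
        return "nop"
    if word == INV:
        raise InvalidOpcode("executed unprogrammed memory")
    # Normal instruction decode would go here; placeholder for the sketch.
    return "insn_%04x" % word
```

Both tests are just wide AND/OR gates in hardware, and the two pools (all values near all-0s and all-1s) can be reserved for later extensions.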
-
Instruction width
02/04/2018 at 01:22 • 3 comments
What should be the size of an instruction ?
Well, it depends... of course !
Usually, this is a constraint from the very beginning. But the beginning is the moment when nothing is quite sure, so the chances are high of getting it wrong. Yet it's one of the most determining features of a core.
In my experiments, I have found that the ideal average instruction width is around 24 bits. Which is not handy at all. This is what I chose for the #YGREC-РЭС15-bis because Harvard makes it convenient anyway (the ADSP21xx line also uses 24 bits for a 16-bit datapath). But this is an exception...
#YGREC8 has 16 bits per instruction. It's very tight (there is room for only 2 register addresses) but it still works for an 8-bit datapath.
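A back-of-the-envelope bit budget shows why 16 bits is so tight. The field sizes below are my own hypothetical example (not the actual YGREC8 encoding) : with 8 registers, each register address costs 3 bits, and the opcode and condition fields eat most of what's left.

```python
REG_BITS = 3      # 8 registers -> 3 bits per register address (assumed)
OPCODE_BITS = 5   # assumed opcode field width
COND_BITS = 3     # assumed predicate/condition field width

# Two register addresses per instruction, as in the text above.
used = OPCODE_BITS + 2 * REG_BITS + COND_BITS
imm_bits = 16 - used   # what remains for an immediate field
```

With these assumptions, 14 of the 16 bits are spoken for, leaving only 2 bits of immediate; a third register address (3 more bits) simply doesn't fit, which is why two-address encodings dominate at this width.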
"Compact" instruction sets (like ARM Thumb) also favour 16 bits, but this increases the instruction count, so the code size does not decrease by a factor of 2 compared to a plain 32-bit processor.
#YASEP Yet Another Small Embedded Processor uses a variable size instruction, either 16 or 32 bits (with one prefix bit). It's a good compromise but might be tricky if the architecture is extended beyond its intended target.
RISC-V manages to use 30 out of 32 bits. It's nice but the proposed extensions are crazy weird...
Most RISC processors use 32 bits.
Anyway it's pretty clear that variable-sized instruction sets must be avoided (the case of the YASEP is a bit special).
-
Harvard vs Von Neumann
02/04/2018 at 01:04 • 1 comment
That one is going to be one of the most controversial, because it goes against most established practice.
Recently I have decided to side with Harvard. There are two reasons : speed and safety.
As a side-benefit, it allows the designer to use weird instruction widths.
(to be continued, the subject is quite hot)
-
Status registers...
02/04/2018 at 00:58 • 0 comments
Status registers...
Just like the "GOTO considered harmful", status registers have been considered "impure" by the RISC church.
I would say : do what you can to avoid them. They can create a lot of problems. But there are times when you can't avoid them and trying to be too smart will backfire.
To be listed :
- the countless ways status bits (like zero flag, carry flag etc.) will harm your architecture (the RISC credo)
- how RISC manages to avoid them
- how status bits are still present in many RISC architectures... under another name...
- why they can't be avoided in microcontrollers
- how to not implement them
- what to not put in a status register
- ...
As a rule of thumb : status flags start to hurt when the core gets pipelined or superscalar. They can keep performance from increasing and force enhanced cores to jump through OoO hoops. So if your core isn't meant to be as powerful as an application processor, yeah, go for it, but be very careful.
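One concrete illustration of "how RISC manages to avoid them" : instead of a global carry flag, the carry out of an addition can be recomputed explicitly with an unsigned compare (the MIPS/RISC-V `sltu`-after-`add` idiom). Here is a sketch in Python, modelling 32-bit wraparound :

```python
MASK32 = 0xFFFFFFFF

def add_with_carry_out(a, b):
    """Flag-less addition : carry is an ordinary register value.

    If the 32-bit sum wrapped around, it is smaller than either
    operand, so a single unsigned compare recovers the carry bit.
    """
    s = (a + b) & MASK32
    carry = 1 if s < a else 0
    return s, carry
```

The carry now lives in a general-purpose register, so it is saved and restored with the rest of the register set, with no hidden state to leak across interrupts or context switches.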
POWER managed to solve this, but I'm not fluent in that architecture.
-
CALL is just a MOV with swap
02/02/2018 at 15:54 • 2 comments
Speaking of the PC register, there is one typical feature found in the YASEP and the YGREC. It can't be directly applied to other, more sophisticated architectures, but it's very handy for small ones.
The ALU result bus is swapped with the NPC when executing a CALL instruction. This effectively saves the NPC to the register set and the result goes to PC.
This is a nicely RISCy method because the return address is saved to the register set, not to a "link register" or a stack. This is very flexible, so you can have coroutines and all kinds of neat features at almost no cost, as long as you manage your stacks manually and correctly.
On the YASEP and YGREC, the return address (NPC) is often saved to a register that maps to memory. All you have to do then is increment/decrement the pointer to create an effective stack.
As a result, the CALL opcode is "just another version of MOV with the swap bit set".
Now, what happens with a CALL to PC ? The NPC is lost because PC is overwritten with the result bus. This is easily detected and can serve as a special type of JUMP (in YGREC, it's the "overlay" instruction).
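The mechanism can be sketched in a few lines of Python (a toy model of the idea, not the actual YASEP/YGREC datapath) : CALL writes NPC to the destination register while the target value goes to PC, and RET is then just a plain MOV back to PC.

```python
class Core:
    """Toy core : CALL swaps the result bus with NPC."""
    def __init__(self):
        self.reg = [0] * 8
        self.pc = 0

    def step_call(self, dst, target):
        npc = self.pc + 1      # address of the following instruction
        self.reg[dst] = npc    # NPC goes to the register set...
        self.pc = target       # ...and the result goes to PC

    def step_ret(self, src):
        self.pc = self.reg[src]   # return = plain MOV to PC
```

Because the return address lands in an ordinary register, the caller decides where it goes : spill it to a software stack, keep it live for a coroutine, or discard it.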
Among the other notable strategies for dealing with CALL, let's mention the system implemented by TREX, inspired by some FORTH processors. It is pretty radical : every write to PC saves the previous PC to a dedicated/implicit LINK register, so there is no distinction between CALL and JUMP, for example. But this wastes one register on a fixed function, which might not suit architectures with a small register set...
-
PC should belong to the register set
02/02/2018 at 05:42 • 1 comment
The registers are a precious resource, but using the opcode coding space for PC-specific operations reduces the orthogonality and flexibility of the whole architecture.
However, including the PC as a read/write register has many benefits. AMD introduced the RIP-relative addressing mode with AMD64, though RIP itself is not directly visible. You might still be able to do a RIP-relative LEA with a zero displacement to get RIP into a register.
Classic operations with the PC can emulate many special opcodes and free some space in the opcode map. Make sure though that you correctly handle the value : is PC pre- or post-incremented ?
In my design, there is often a cohabitation of PC and NPC :
- PC is the address of the current instruction. It's good to have it for when you have a trap.
- NPC is "Next PC" or "New PC", pointing to the next instruction (not the destination of a jump, yet, it's usually the output of an incrementer). It's important to have it to fetch the next instruction, sure, but also to save it during function calls or interrupts.
Now, if the PC is a user-visible and writable register, you can have some fun :
- any writes to PC will jump. You can perform conditional jumps if the opcode is predicated. For free.
- you can save PC easily along with all the register set, in a clean bundle.
- you can directly implement indirect jumps (or calls) by loading memory into PC. No funky addressing modes to take care of.
This approach uses one register in an already tight register set, but the gains in opcode space and flexibility are worth it, IMHO.
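The "conditional jumps for free" point can be sketched as follows (my own toy model with hypothetical register numbering, not an actual design) : once PC is an ordinary writable register, a predicated MOV to it is a conditional branch, with no dedicated branch opcode.

```python
class TinyCore:
    """Toy core where PC is mapped into the register set."""
    PC = 7   # assumed : PC is register 7 in this sketch

    def __init__(self):
        self.reg = [0] * 8

    def mov(self, dst, value, pred=True):
        # Predicated write : skipped entirely when the predicate is false.
        # When dst == PC, this MOV *is* a (conditional) jump.
        if pred:
            self.reg[dst] = value
```

The same single mechanism covers unconditional jumps (`pred=True`), conditional jumps (predicate from a prior comparison), and indirect jumps (`value` loaded from memory), which is exactly the opcode-space saving argued above.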
-
Partial register writes are tricky.
02/02/2018 at 05:22 • 2 comments
As noted by @alan_r_cam in the comments of Use registers., partial registers are a seductive feature. The idea was born in the 70s, when microprocessors were 8 bits wide internally and built 16-bit registers out of pairs of physical byte-wide storage cells. 16-bit operations would then take two cycles instead of one.
The MC68000 was 16 bits wide internally and emulated a 32-bit architecture, so handling 16-bit quantities was not a problem.
But this practice vanished in the 80s. Speed became really critical, and whole registers are now stored and handled in a single cycle. The last processors that still implement this feature are the x86 line, which uses "µops" to decompose partial accesses into discrete shifts and masks. This causes all sorts of complicated issues, for example in the detection of pipeline hazards and all kinds of dependencies.
Avoid partial register writes. They complicate the pipeline and add latency, because they insert a shifter in the critical datapath, which slows the whole CPU down. Use explicit shifts instead. The YASEP has a dedicated unit for alignment and sign extension : the IE (Insert/Extract) unit.
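The explicit alternative looks like this : an insert/extract pair built from shifts and masks, in the spirit of the YASEP's IE unit (the field positions and function names here are just an illustration, not the YASEP's actual operations).

```python
def extract(word, pos, width):
    """Read a `width`-bit field starting at bit `pos` of `word`."""
    return (word >> pos) & ((1 << width) - 1)

def insert(word, pos, width, field):
    """Write a `width`-bit field without touching the rest of the word."""
    mask = ((1 << width) - 1) << pos
    return (word & ~mask) | ((field << pos) & mask)
```

Because both operations read and write whole registers, the pipeline sees ordinary full-width dependencies : no hidden merge of old and new register halves, and no special hazard detection for partial writes.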