Let the hardware do what it does best.

A project log for PDP - Processor Design Principles

Distilling my experience and wisdom about the architecture, organisation and design choices of my CPUs

Yann Guidon / YGDES, 12/20/2023 at 01:35

The title says it all but a deeper explanation will shine more light on it.

This is another point where I diverge from the Patterson & Hennessy canon, which emphasises the study of real-life code to see which opcodes are the most used. This was a major argument in the 80s when the RISC vs CISC debate raged, and we should not forget that at that time, most computers (microprocessor-based or not) were microprogrammed (let's exclude Cray's designs of course).

Reducing the instruction set complexity also reduces the weight of the microprogram, which had almost vanished by Y2K (if you forget the x86 of course). So today, when you create your own ISA, you look at the figures in the "Quantitative Approach" and should ask yourself:

P&H's RISC works (and their predecessors) have a very good point against microcode, which has been cemented during the last 40 years. But this argument has sometimes been taken to the extreme, creating weird feedback loops.

Let's take C: it was designed on a PDP-7 then a PDP-11 and inherited some features of these platforms, in particular the absence of a rotation operator. And the only boolean operators are AND, OR, XOR and NOT. So what did the RISC people do? They found that none of their benchmarks (written in "portable" C) would include rotation or combined boolean operations. So the SPARC and MIPS didn't get a rotate instruction.

My point is: if you already have a barrel shifter, you can do rotation with a little additional work. The counter-argument is "it will slow down the whole pipeline for an operation that is barely used and can be emulated with 3 instructions" (if not 4). Now, when the code path reaches the point where rotation is important, these 3 opcodes slow the tight loop down and increase the time and energy spent moving data around (in and out of the register set etc.) as well as the register pressure (to name a few).
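To make the "3 instructions (if not 4)" concrete, here is a minimal sketch of what the emulation looks like in portable C. The function name `rotl32_emulated` and the 32-bit width are assumptions chosen for illustration, not something taken from a particular ISA or compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate x left by n bits using only what C offers: two shifts and an OR.
   With a constant count that is 3 operations; with a variable count,
   computing the complementary shift amount adds a fourth. */
uint32_t rotl32_emulated(uint32_t x, unsigned n)
{
    assert(n < 32);                        /* precondition assumed by this sketch */
    uint32_t hi = x << n;                  /* bits that move up */
    uint32_t lo = x >> ((32u - n) & 31u);  /* bits that wrap around; the mask
                                              avoids an undefined shift by 32
                                              when n == 0 */
    return hi | lo;
}
```

Every step needs its own temporary, which is where the extra register pressure comes from. Some compilers can recognise this pattern and emit a native rotate when the target has one, but that relies on inference rather than on the language.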

OK, maybe that was not the perfect example, so let's look at the boolean operations: there is a whole world beyond AND, OR and XOR, particularly if they are integrated in the ALU, where the adder needs one operand to be inverted to perform SUB. This inverter (already covered in other logs) can be reused to perform ANDN, ORN and XORN. A bit more fiddling gets you NAND and NOR, for almost no real penalty in clock frequency. And yet these are not implemented.
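The shared inverter can be modelled in a few lines of C. This is only a behavioural sketch: the `alu_op` names and the `invert_b` control flag are hypothetical and do not describe the actual datapath of any of my cores.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical operation names, for this sketch only. */
enum alu_op { ALU_ADD, ALU_AND, ALU_OR, ALU_XOR };

/* 'invert_b' is the control line that SUB needs anyway.
   Routing it to the boolean unit gives ANDN, ORN and XORN for free. */
uint32_t alu(enum alu_op op, int invert_b, uint32_t a, uint32_t b)
{
    uint32_t bb = invert_b ? ~b : b;   /* the one shared inverter */

    switch (op) {
    case ALU_ADD: return a + bb + (invert_b ? 1u : 0u); /* a + ~b + 1 == a - b */
    case ALU_AND: return a & bb;       /* ANDN when invert_b is set */
    case ALU_OR:  return a | bb;       /* ORN  when invert_b is set */
    case ALU_XOR: return a ^ bb;       /* XORN when invert_b is set */
    }
    assert(!"unknown ALU operation");
    return 0;
}
```

For instance, `alu(ALU_ADD, 1, 10, 3)` computes 10 - 3, and `alu(ALU_AND, 1, 0xFF, 0x0F)` computes ANDN through the very same inverter. NAND and NOR would need a bit more than this (inverting the other operand or the result), which the sketch does not model.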

Intel brought ANDN with the Pentium MMX and the full ROP3 set in the latest Core generations, so there is some merit to it, right?

But the common languages (with C at their roots) do not provide these operators, which must be inferred by the compiler, leading to the underuse of these opcodes.

These are only 2 examples but more exist. Thus:

When an operation provides more options for marginal overhead, expose them at the opcode level. Languages and compilers may ignore them but when you need them, you'll be happy. This is why I provide ANDN with the YGREC8 and the whole ROP2 in the YASEP. The cost of these "extra features" is negligible, and you will remember the times you missed them far better than the times they saved your a$$.
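For readers who have not met ROP2: it is the set of all 16 boolean functions of two inputs, which can be selected with a 4-bit truth table. The sketch below shows the idea in C. The function name `rop2` and the bit ordering of `table` are assumptions made for this illustration and are not the YASEP's actual opcode encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Apply one of the 16 two-input boolean functions to every bit position.
   Assumed encoding: bit (2*a_bit + b_bit) of 'table' is the result for
   that combination of input bits. */
uint32_t rop2(unsigned table, uint32_t a, uint32_t b)
{
    assert(table < 16);
    uint32_t r = 0;
    if (table & 1u) r |= ~a & ~b;   /* a = 0, b = 0 */
    if (table & 2u) r |= ~a &  b;   /* a = 0, b = 1 */
    if (table & 4u) r |=  a & ~b;   /* a = 1, b = 0 */
    if (table & 8u) r |=  a &  b;   /* a = 1, b = 1 */
    return r;
}
```

With this ordering, `0x8` is AND, `0xE` is OR, `0x6` is XOR, `0x4` is ANDN (a AND NOT b), `0x7` is NAND and `0x1` is NOR. In hardware the four terms collapse into a small multiplexer per bit, which is why the whole set costs little more than the usual three operations.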

Because remember: this is a different case from the one against microcode. And don't let others tell you what you don't need.