Instruction grouping

A project log for YGREC32

Because F-CPU, YASEP and YGREC8 are not enough, and I want to get started with the FC1 but it is still too ambitious.

Yann Guidon / YGDES, 10/03/2025 at 18:20

Linus complains again at https://lkml.org/lkml/2025/10/1/1140

 - expose your pipeline details in the ISA

  Delayed branch slots or explicit instruction
  grouping is a great way to show that you eat
  crayons for breakfast before you start designing
  your hardware platform

Delayed branches were abandoned more than thirty years ago; even RISC-V has rejected them.

Now, I'm not certain what "explicit instruction grouping" refers to. VLIW/EPIC/Itanium? Or MMX, or even the decoding front-end of the P6+ family?

My experience with MMX and P6+ has taught me a few sour lessons and I applied them here. I think I have found a good compromise between ISA exposure, performance, evolution/compatibility and orthogonality.

Explicit instruction grouping is not the real problem. It is required to keep the HW lean, fast and manageable. Not every platform needs or wants a huge reordering buffer that wastes energy and space.

The first key to YGREC32 and its family is symmetry: this is what makes it scalable, and the same principles apply to 1-, 2- or 4-way superscalar implementations. More parallelism does not make sense since average ILP usually plateaus at 2 or 3. YGREC32 is the ILP sweet spot and YGREC64 provides more bandwidth for heavier computation loads.

The second key is that the grouping is implicit. The same program runs well without the overhead of packing/framing fields, not only because the symmetrical architecture can remap registers to different globs, but also because the instruction itself (the destination register number) directs the decoder towards the corresponding glob.
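To make the idea concrete, here is a toy model in Python of how a decoder could derive groups from the destination register alone. Everything in it is an assumption made for illustration and not the actual YGREC32 definition: the register count (`NUM_REGS = 16`), the register-to-glob mapping (`glob_of`, equal contiguous slices) and the greedy policy in `group_instructions`. It also ignores data dependencies, which a real decoder would have to check.

```python
NUM_REGS = 16  # assumed register count, for illustration only


def glob_of(dest_reg, num_globs):
    """Hypothetical mapping: split the register file into equal contiguous slices."""
    return dest_reg // (NUM_REGS // num_globs)


def group_instructions(dest_regs, num_globs):
    """Greedily pack consecutive instructions into issue groups.

    Each instruction is represented only by its destination register.
    A group is closed as soon as the next instruction targets a glob
    that is already used in the current group.
    Data dependencies are deliberately ignored in this sketch.
    """
    groups = []
    current = []
    used = set()
    for reg in dest_regs:
        g = glob_of(reg, num_globs)
        if g in used:
            # Glob conflict: start a new group with this instruction.
            groups.append(current)
            current, used = [], set()
        current.append(reg)
        used.add(g)
    if current:
        groups.append(current)
    return groups


# The same instruction stream, decoded by a 1-way and a 2-way machine.
program = [0, 8, 1, 9, 2, 3, 12]
print(group_instructions(program, 1))  # one instruction per group
print(group_instructions(program, 2))  # pairs form where the globs differ
```

The stream itself carries no packing or framing field: the 1-way model issues the instructions one by one, and the 2-way model pairs them wherever two neighbours write to different globs.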

So an implicit grouping of symmetrical instructions is my answer to Linus, preserving scalability, performance and efficiency.