Linus complains again at https://lkml.org/lkml/2025/10/1/1140
- expose your pipeline details in the ISA
  Delayed branch slots or explicit instruction grouping is a great way to show that you eat crayons for breakfast before you start designing your hardware platform
Delayed branches were abandoned more than thirty years ago; even RISC-V has rejected them.
Now, I'm not certain what "explicit instruction grouping" refers to. VLIW/EPIC/Itanium? Or MMX, or even the decoding front-end of the P6+ family?
My experience with MMX and P6+ has taught me a few sour lessons and I applied them here. I think I have found a good compromise between ISA exposure, performance, evolution/compatibility and orthogonality.
Explicit instruction grouping is not the real problem. It is required to keep the HW lean, fast and manageable. Not every platform needs or wants a huge reordering buffer that wastes energy and space.
The first key to YGREC32 and its family is symmetry: this is what makes it scalable, and the same principles apply to 1-, 2- or 4-way superscalar implementations. More parallelism does not make sense, since average ILP usually plateaus at 2 or 3. YGREC32 is the ILP sweet spot and YGREC64 provides more bandwidth for heavier computation loads.
The second key is that the grouping is implicit. The same program runs well without the overhead of packing/framing fields, not only because the symmetrical architecture can remap registers to different globs, but also because the instruction itself (the destination register number) directs the decoder towards the corresponding glob.
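To illustrate the idea of the destination register steering each instruction, here is a minimal sketch in Python. It is not the actual YGREC decoder: the register count per glob (16), the function names and the greedy grouping rule are all assumptions made for the example, and data dependencies between instructions are ignored. It only shows that the same instruction stream, with no packing or framing fields, can be split into issue groups by machines with different numbers of globs.

```python
from typing import List

REGS_PER_GLOB = 16  # assumption for this sketch, not a documented YGREC value


def glob_of(dest_reg: int, regs_per_glob: int = REGS_PER_GLOB) -> int:
    """Hypothetical mapping: the upper bits of the destination register pick the glob."""
    return dest_reg // regs_per_glob


def group_implicitly(dest_regs: List[int], n_globs: int,
                     regs_per_glob: int = REGS_PER_GLOB) -> List[List[int]]:
    """Greedily split an in-order stream of destination registers into issue groups.

    A group is closed as soon as the next instruction targets a glob that is
    already used in the current group. The modulo models a narrower machine
    remapping the architectural globs onto the globs it really has.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    busy = set()
    for reg in dest_regs:
        g = glob_of(reg, regs_per_glob) % n_globs
        if g in busy:
            # Glob conflict: the current group ends here, no framing bit needed.
            groups.append(current)
            current, busy = [], set()
        current.append(reg)
        busy.add(g)
    if current:
        groups.append(current)
    return groups


program = [1, 17, 2, 18, 3, 4]  # destination registers only, in program order
two_way = group_implicitly(program, n_globs=2)
one_way = group_implicitly(program, n_globs=1)
```

In this toy model the 2-way machine pairs the instructions that target different globs, while the 1-way machine runs the unchanged program one instruction at a time.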
So an implicit grouping of symmetrical instructions is my answer to Linus, preserving scalability, performance and efficiency.
Yann Guidon / YGDES