One thing I've been thinking about: since the YGREC8 is a sort of subset of the YASEP ISA, wouldn't it be nice and easy to emulate the YGREC8 on the YASEP with a pipeline stage that performs binary translation of the YGREC8 instructions?
UPDATE 20181014:
Of course it would be more than a great feature.
In fact, it would be good to redesign the YASEP from the ground up around that translation stage.
There are two ways to do it:
- on-the-fly binary translation by hardware. This increases the latency by one cycle for predecoding, but the overhead could be kept small enough to be practical. Ideally the emulation should not slow the core down, and SMT is possible.
- block-wide instruction translation in software. Most instructions would be translated from 16 to 32 bits wide, and some hardware assistance is required to compensate for this difference (and others).
In either case, some hardware is required to achieve this. Ideally, the less software is required, the better!
The YGREC8 instructions generally map directly to "long" YASEP instructions. A "Y8" mode bit must be set to enable the translation features, but most first-order details are pretty easy to translate:
- Registers: the Y8 has 8, which are a simple subset of the YASEP's 16 registers. The register number must be extended by one bit, probably by simple sign extension. The Y8 order is D1 A1 D2 A2 R1 R2 R3 PC while the YASEP has a reverse order: PC R1 R2 R3 R4 R5 D1 A1 D2 A2 D3 A3 D4 A4 D5 A5. It's not a big deal to change this: it's just a matter of renaming registers, and the YASEP has been changed a couple of times already. I'm thinking about the following mapping, where bit 2 is copied to bit 3:
code  Y8  YASEP
0000  A1  A1
0001  D1  D1
0010  A2  A2
0011  D2  D2
0100  -   A3
0101  -   D3
0110  -   A4
0111  -   D4
1000  -   A5
1001  -   D5
1010  -   R5
1011  -   R4
1100  R3  R3
1101  R2  R2
1110  R1  R1
1111  PC  PC
So there is almost no hardware cost here, and registers can be fetched almost immediately (register read speed is the most critical factor for performance in this kind of core).
- Opcodes: this is where it gets more complex. Some more advanced binary trickery is required... but it's possible. The YASEP has a rather flexible instruction map, with 8 well-defined groups, to which the 4 groups of the Y8 can be mapped.
- Instructions: the Imm8 form maps almost directly (after some bit rerouting) to the "Long Immediate" form of the YASEP. The other forms map to the "Extended" instruction form.
- Conditions: the codes are very similar too and can be easily mapped.
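To make the register point concrete, here is a minimal Python sketch of the renaming shown in the register table above. It only models the "copy bit 2 to bit 3" extension; the function name and the two name lists are made up for the illustration, and the 3-bit Y8 codes are an assumption obtained by dropping bit 3 from the 4-bit codes in the table (this is the proposed renumbering, not the current Y8 order).

```python
# Hypothetical model of the proposed register renaming.
# Assumption: the 3-bit Y8 code is the 4-bit table code with bit 3 dropped.
Y8_NAMES = ["A1", "D1", "A2", "D2", "R3", "R2", "R1", "PC"]
YASEP_NAMES = ["A1", "D1", "A2", "D2", "A3", "D3", "A4", "D4",
               "A5", "D5", "R5", "R4", "R3", "R2", "R1", "PC"]


def y8_to_yasep_reg(code3: int) -> int:
    """Extend a 3-bit Y8 register number to 4 bits by copying bit 2 into bit 3."""
    if not 0 <= code3 <= 7:
        raise ValueError("Y8 register numbers are 3 bits wide")
    # (code3 & 0b100) isolates bit 2; shifting left by one puts it in bit 3.
    return code3 | ((code3 & 0b100) << 1)


if __name__ == "__main__":
    for code in range(8):
        mapped = y8_to_yasep_reg(code)
        print(f"{code:03b} {Y8_NAMES[code]} -> {mapped:04b} {YASEP_NAMES[mapped]}")
```

In hardware this is just one wire duplicated, which is why the register fields can feed the register set without any extra logic delay.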
There are quite a few differences too:
- PC granularity: the Y8 uses one address per instruction, while it's not so clear with the YASEP. It would be best to use 16-bit granularity for both, to save a shifter in the YASEP's datapath.
- IO registers: we could define a reserved range, or some other mechanism, to make the first 256 IO registers compatible with both.
- The instructions LDCL and LDCH require specific hardware and the CALL/INV/OVL system needs some adaptations...
- The ALU outputs the carry at a different place (8th bit, 16th bit or even 32nd).
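The carry issue in the last point can be illustrated with a small Python model. This is only a sketch of a generic adder, not the actual ALU of either core; the function name and its parameters are hypothetical. It shows that the same operands give a different carry flag depending on whether the datapath is treated as 8, 16 or 32 bits wide, so a Y8 mode has to tap the carry at bit 8 instead of the native position.

```python
from typing import Tuple


def add_with_carry(a: int, b: int, width: int, carry_in: int = 0) -> Tuple[int, int]:
    """Add two operands on a datapath of `width` bits.

    Returns (result, carry_out), where carry_out is the bit just above
    the most significant result bit (bit number `width` of the raw sum).
    """
    mask = (1 << width) - 1
    total = (a & mask) + (b & mask) + (carry_in & 1)
    return total & mask, (total >> width) & 1


if __name__ == "__main__":
    # Same operands, different carry depending on the emulated width.
    for width in (8, 16, 32):
        result, carry = add_with_carry(0xFF, 0x01, width)
        print(f"width={width:2d} result={result:#x} carry={carry}")
```

With 0xFF + 0x01, only the 8-bit case produces a carry; a wider datapath running Y8 code would silently miss it unless the flag is taken from the 8th bit.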
Of course, the emulated Y8 can't access more than a native implementation can... But this emulation project is a good way to reboot the YASEP design again :-)