I have uploaded YGREC8_VHDL.20190325.tgz, which should contain the definitive version and definition of the ALU. I still have to rewrite the ALU8 with individual tiles, but all the rest is good.
I had to find a new trick to implement the signed comparison (CMPS), because the version inherited from the YASEP XORs the operands, which lengthens the critical datapath for the MSB and breaks the layout/structure. I noticed that the only output bit that differs between CMPS and CMPU is the Carry Out, and only when the operands have opposite signs. My new solution takes the XOR of the sign bits (negated) at the output of the ROP2 section and ANDs it with a signal from the opcode decoder; the result is then XORed with the carry output bit.
This new idea is more satisfying because there are fewer gates and the carry bit has a different timing than the rest of the datapath, so it could absorb the additional XOR delay. The ROP2 part is now completely streamlined and homogeneous.
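
The carry fix-up described above can be checked with a small behavioral model. This is only a sketch in Python, not the VHDL from the archive: the function names (`cmp_carry`, `to_signed`) are made up for illustration, and it assumes that the comparison is computed as `a + NOT(b) + 1`, that the carry out is 1 when `a >= b`, and that the ROP2 output for the MSB is `a XOR NOT(b)` during a subtraction. The real operand order and carry polarity may differ.

```python
def to_signed(x):
    """Interpret an 8-bit value as two's complement."""
    return x - 256 if x & 0x80 else x

def cmp_carry(a, b, signed_cmp):
    """Carry out of an 8-bit compare, for CMPU (signed_cmp=False)
    or CMPS (signed_cmp=True). Hypothetical model, see assumptions above."""
    a &= 0xFF
    b &= 0xFF
    nb = ~b & 0xFF                 # assumed: subtraction is a + ~b + 1
    total = a + nb + 1
    carry = (total >> 8) & 1       # 1 when a >= b (unsigned)
    # Assumed ROP2 output for the MSB: 1 when the sign bits are equal
    rop2_msb = ((a ^ nb) >> 7) & 1
    signs_differ = rop2_msb ^ 1    # negated: 1 when the sign bits differ
    # ANDed with the decoded opcode: only CMPS may flip the carry
    flip = signs_differ & (1 if signed_cmp else 0)
    return carry ^ flip

# Exhaustive check over all pairs of 8-bit operands
for a in range(256):
    for b in range(256):
        assert cmp_carry(a, b, False) == int(a >= b)
        assert cmp_carry(a, b, True) == int(to_signed(a) >= to_signed(b))
```

The loop confirms that, under these assumptions, flipping the unsigned carry whenever the sign bits differ is enough to get the signed result, without touching the operands before the adder.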