My first cut at an ISA focused on getting the functions right and leaving room to add more options later. Now that I've got most of the functionality I want, I can go back and look at ways to reduce the complexity, with a goal of improving performance.
I made some fairly large changes to the ISA, which I've documented on the second page of the ISA worksheet. The idea was to reduce the number of data paths in the CPU, embed some elements (e.g., the FPU operations) into the opcodes instead of the microcode, and add states to the control state machine to allow as much reuse as possible.
The steps that I've taken so far seem to have worked. The FPGA compiler now reports a max core speed of about 75 MHz, where before it was closer to 50 MHz. I'm also using about 5k fewer LEs in the FPGA for the same work. Since I've added clock cycles in some of the opcode paths and I haven't actually raised the core clock yet, I don't know what the true speed improvement is, if any. But the design is a lot easier to understand (even for me), so I'll count it as a win even if it turns out to be a wash on the performance front.
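To put the clock-versus-cycles trade-off in numbers, here's a rough back-of-the-envelope sketch. The 50 MHz and 75 MHz figures are the compiler estimates above. The cycles-per-instruction (CPI) values are made up purely for illustration, since I haven't measured the real ones, and the function names are just ones I picked for this example.

```python
def net_speedup(f_old_mhz, f_new_mhz, cpi_old, cpi_new):
    """Old execution time divided by new execution time for the same
    instruction stream. Values above 1.0 mean the new design is faster."""
    time_old = cpi_old / f_old_mhz  # microseconds per instruction
    time_new = cpi_new / f_new_mhz
    return time_old / time_new


def break_even_cpi_growth(f_old_mhz, f_new_mhz):
    """How much the average CPI can grow before the faster clock
    is completely cancelled out."""
    return f_new_mhz / f_old_mhz


if __name__ == "__main__":
    # Hypothetical CPI values, NOT measurements from the real core.
    cpi_old, cpi_new = 4.0, 5.0
    print("net speedup:", net_speedup(50.0, 75.0, cpi_old, cpi_new))
    print("break-even CPI growth:", break_even_cpi_growth(50.0, 75.0))
```

In other words, if the core really does run at 75 MHz, the average cycle count per instruction can grow by up to half again before the change becomes a net loss. Until I actually raise the clock, though, any added cycles are pure cost.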
Once I get the cache controller and some regression testing done, I'm going to look at pipelining and upping the clock speed, at least for the processor core.