During the holiday break, I made significant progress on the pipeline logic. At this point, I have everything working with the exception of subroutines and... exceptions. Subroutines (push the old PC onto the memory stack, update the PC) shouldn't cause too much trouble. I'm not sure whether exception handling will hold any surprises, but I expect it to look a lot like the existing branch code.
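To make the subroutine mechanism concrete, here is a rough software model of it in Python. This is only a sketch, not the HDL from the repo: the `ToyCPU` class, the 4-byte word size, the downward-growing stack, and the choice to save PC + 4 as the return address are all assumptions made for illustration.

```python
# Toy model of subroutine call/return. All names and widths are assumptions.
WORD = 4  # assumed: 32-bit words, byte-addressed


class ToyCPU:
    def __init__(self, stack_top=0x1000):
        self.pc = 0
        self.sp = stack_top  # assumed: stack grows downward
        self.mem = {}        # sparse memory, keyed by byte address

    def call(self, target):
        """Push the return address onto the memory stack, then jump."""
        return_addr = self.pc + WORD  # assumed: resume at the next instruction
        self.sp -= WORD
        self.mem[self.sp] = return_addr
        self.pc = target

    def ret(self):
        """Pop the saved address back into the PC."""
        self.pc = self.mem[self.sp]
        self.sp += WORD


cpu = ToyCPU()
cpu.pc = 0x100
cpu.call(0x200)
after_call = (cpu.pc, cpu.sp, cpu.mem[0xFFC])
cpu.ret()
after_ret = (cpu.pc, cpu.sp)
```

In a pipeline, the PC update half of this behaves like a taken branch, and the push is one extra memory write alongside it.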
Before I can really put this on the FPGA (and then test things like DOOM!), I'll also need to deal with a couple of other pesky issues. The first is to resolve the bus interactions. Right now I have a dual-read memory and separate bus interfaces for instructions and data. I may adapt my memory cache interface to SDRAM to act as an L1 cache within the CPU. The other challenge is multi-cycle memory access. Right now I assume I can access memory (instruction and data) in a single cycle, and obviously that's not always going to be true when pulling data from SDRAM. I have some ideas on how I can address this with the stall logic I've built, so we'll see.
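The general idea of stalling on slow memory can be sketched as a small cycle-by-cycle model in Python. This is not the stall logic in the repo, just an illustration under assumptions: `SlowMemory`, `run_fetch`, and the fixed per-access latency are all made up for the example, and real SDRAM latency varies.

```python
# Toy model: a fetch stage that holds the PC while memory is busy.
# Class and function names are hypothetical; latency is assumed fixed.
class SlowMemory:
    def __init__(self, data, latency):
        self.data = data
        self.latency = latency  # cycles per access
        self.addr = None        # address of the request in flight
        self.remaining = 0

    def tick(self, addr):
        """Advance one cycle; return (ready, value)."""
        if self.addr != addr:   # a new request starts the countdown
            self.addr = addr
            self.remaining = self.latency
        self.remaining -= 1
        if self.remaining <= 0:
            self.addr = None    # request finished
            return True, self.data[addr]
        return False, None


def run_fetch(program, latency):
    mem = SlowMemory(program, latency)
    pc, cycles, fetched = 0, 0, []
    while pc < len(program):
        cycles += 1
        ready, instr = mem.tick(pc)
        if ready:
            fetched.append(instr)
            pc += 1             # advance only when memory delivered
        # otherwise stall: hold the PC and issue nothing this cycle
    return fetched, cycles


program = ["ld", "add", "st"]
fast = run_fetch(program, latency=1)  # the current single-cycle assumption
slow = run_fetch(program, latency=3)  # a slower, SDRAM-like access
```

With a latency of 1 the model fetches one instruction per cycle, matching the single-cycle assumption. With a latency of 3 the same three instructions take nine cycles, and the PC simply holds in between.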
The code is all available on GitHub as mattstock/fpgalib.