So far in these projects, I've been able to build iteratively and not run into too many nasty bugs. There are many layers of abstraction though (libraries, compiler, assembler, machine, CPU), and so when a bug does crop up, it can be really challenging to find.
Most recently, I found that I had misunderstood some subtleties of transferring data between registers. The fix was simple - an opcode that zero-fills the upper bits when you copy an object smaller than the register size. The symptom, though, was that printf() sometimes printed the wrong character when printing a number. Eventually, I was able to isolate this to 33 % 10 resulting in 9 (not 3), which meant I didn't have to debug libc. After narrowing the issue down further to a very small test case, I was able to see why the CPU was generating the incorrect value. That probably took me 4 days to debug.
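To illustrate the general failure mode, here is a minimal Python model of copying a byte into a wider register with and without zero-filling the upper bits. It is not the actual CPU: the 32-bit register width, the stale register value, and the function names are all assumptions made up for this sketch.

```python
REG_BITS = 32  # assumed register width, for illustration only
REG_MASK = (1 << REG_BITS) - 1


def copy_byte_keep_upper(dest_reg, value):
    """Buggy copy: writes only the low 8 bits, leaving stale upper bits in place."""
    return ((dest_reg & ~0xFF) | (value & 0xFF)) & REG_MASK


def copy_byte_zero_fill(dest_reg, value):
    """Fixed copy: writes the low 8 bits and zero-fills everything above them."""
    return value & 0xFF


stale = 0xABCD1200  # hypothetical leftover contents of the destination register

buggy = copy_byte_keep_upper(stale, 33)
fixed = copy_byte_zero_fill(stale, 33)

# Same low byte in both cases, but a full-width modulo sees different numbers.
print(hex(buggy), buggy % 10)
print(hex(fixed), fixed % 10)
```

The low byte is identical in both cases, so anything that only looks at 8 bits appears fine. Any later operation that uses the full register width, like the modulo here, gets a different answer in the buggy case. This doesn't try to reproduce the exact 9 from the real hardware, only the kind of mistake involved.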
As I plan on making some radical changes that could break things, I need to consider how best to avoid introducing more of these kinds of issues, and if it happens, how to quickly determine the issue.
The best idea I've got right now is to leverage the space I have within the FPGA and build more stuff. Since I plan to start reducing the size of the combinatorial paths within the CPU, which could affect timing, I'll create a second CPU. The new CPU will be the one I modify, and the first one will be my canary. I can feed them both the same data in parallel. The output from the canary will not be connected to the rest of the system, but will instead feed into a testing module. That module will also get taps from the second processor, and if the outputs diverge it can raise a signal that I can catch with the debug tools.
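The canary idea can be sketched in software to show what the testing module has to do. This is a rough Python model, not Verilog and not the real design: ToyCpu, its fault_at parameter, and lockstep_compare are hypothetical names invented here, and the "CPU" is just an 8-bit accumulator.

```python
class ToyCpu:
    """Stand-in for a CPU: an 8-bit accumulator that adds each input."""

    def __init__(self, fault_at=None):
        self.acc = 0
        self.cycle = 0
        self.fault_at = fault_at  # cycle at which to inject a wrong output, or None

    def step(self, data_in):
        self.acc = (self.acc + data_in) & 0xFF
        out = self.acc
        if self.cycle == self.fault_at:
            out ^= 0x01  # simulate a glitch on the output for this one cycle
        self.cycle += 1
        return out


def lockstep_compare(canary_step, modified_step, inputs):
    """Feed the same inputs to both models; return the first divergence, or None."""
    for cycle, data_in in enumerate(inputs):
        expected = canary_step(data_in)
        actual = modified_step(data_in)
        if expected != actual:
            return {"cycle": cycle, "input": data_in,
                    "canary": expected, "modified": actual}
    return None


canary = ToyCpu()
modified = ToyCpu(fault_at=2)  # pretend a change broke the output on cycle 2
divergence = lockstep_compare(canary.step, modified.step, [5, 10, 20, 40])
print(divergence)
```

The useful property is that the comparison stops at the first cycle where the two outputs differ, which is the moment the hardware version would raise its debug signal. In the FPGA the same check would presumably be a registered comparison of the tapped outputs on every clock, rather than a loop.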
The nice thing about this is that it's fairly lightweight, and it will allow me to see immediately if the timing has changed. It doesn't rely on any other device in the system, so I don't need to worry about special test programs or anything like that. That said, a program that ran some additional self-tests would still be beneficial.
I'm curious if anyone has other ideas on how to build the equivalent of unit tests for systems of this complexity? I never got into the simulation aspects of Verilog - is that something that is worth the time to retrofit, or is the benefit of simulation more pre-synthesis?