As with all good hobbies, I went on a couple of tangents recently. While I was waiting for the new daughterboard, I decided to spend some time on speed improvements.
The first step was the easiest: I just doubled the clock speed from 50MHz to 100MHz. The external memory (SDRAM and SSRAM) had plenty of headroom, and I took the opportunity to parameterize all of the modules that use their own clock (SPI, I2C, UART, etc). That helped, but not enough. The memory operations were just too inefficient.
The next step was to add simple write FIFOs for memory modules. The idea here is that writes can return immediately because the CPU isn't expecting a response. If the CPU makes a read request, that read will stall until the write FIFO is empty, ensuring that there is consistency. I put this FIFO between the CPU and the memory controllers so that it woudl do the most good. It helped, but wasn't a game changer.
As most of you probably already know, SDRAM isn't particularly efficient at single word operations. You need to "open" a row of memory, do some operations, and then "close" it again. The overhead of the open and close operations is huge when you only do a single read or write, which is what I was doing up until this point.
By adjusting the cache memory system and the SDRAM controller, I was able to to enable pipeline operations, making memory access a lot more efficient. Since I already had the cache controller handling 4 words in a cache line, it wasn't particularly hard to enable a 4 word pipeline. The nice thing is that all of this is abstracted away from the CPU - it can still use single word operations and gave some of the benefits simply because the content gets into the cache memory more quickly. This and increasing the cache size to 4k words (from 1k) made a substantial difference in performance. I'm starting to use the Doom load time as my benchmark for these things, and I've now got it down to about 1m20s from program load to menu.
There are so many additional optimizations that I can still apply:
- Block memory transfer CPU instructions
- Improving the SDRAM controller to keep rows open longer and increase the time between refresh cycles
- Improve the SSRAM controller to handle burst operations (should help with frame rate)
I may tackle some of these in the near future.