
GPU architecture Part 1

A project log for The Dingo Console

An Open Source fully hackable game console packed with features. Based on the Pentium III CPU!

FloppidyDingo • 04/19/2016 at 00:04 • 0 Comments

I have planned out more of my GPU architecture, and I have come up with what is (in my opinion) a very efficient design. Instead of having all 8 cores run on the rising edge of the clock, half of them will operate on the rising edge and the other half on the falling edge. This way the GPU is always doing something. Next up, recall that each core has its own code cache. That cache will operate on the opposite edge from its core's clock. So when the core sends a data request to the cache, the cache will process that request on the following half cycle, while the core is waiting for its next active edge, so the data is ready when the core's next cycle begins. The texture cache will do the same thing, but I'll have to find a way to compensate for the fact that half the cores are running out of phase. Also, due to chip limitations, there will be 2 MB of texture cache instead of 4 MB.
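To make the timing easier to picture, here is a rough Python model of the idea (a behavioural sketch only, not HDL and not the actual design). The names `Core`, `Cache`, and `simulate`, and the one-request-per-cycle fetch pattern, are assumptions made up for illustration. Each clock cycle is split into a rising and a falling half; every core acts on its own edge, and its code cache services the request on the opposite edge.

```python
from dataclasses import dataclass, field

RISING, FALLING = "rising", "falling"


def opposite(edge):
    """Return the other clock edge."""
    return FALLING if edge == RISING else RISING


@dataclass
class Cache:
    edge: str               # edge on which this cache services requests
    memory: dict            # hypothetical backing store: address -> instruction
    pending: object = None  # address most recently requested by the core
    ready: object = None    # data waiting to be picked up by the core

    def tick(self, edge):
        # Only act on this cache's own edge, and only if a request is waiting.
        if edge == self.edge and self.pending is not None:
            self.ready = self.memory[self.pending]
            self.pending = None


@dataclass
class Core:
    core_id: int
    edge: str               # edge on which this core is active
    cache: Cache
    pc: int = 0
    fetched: list = field(default_factory=list)

    def tick(self, edge):
        if edge != self.edge:
            return
        # Pick up whatever the cache prepared during the previous half cycle.
        if self.cache.ready is not None:
            self.fetched.append(self.cache.ready)
            self.cache.ready = None
        # Issue the next request; the cache handles it on the opposite edge.
        self.cache.pending = self.pc
        self.pc += 1


def simulate(num_cores=8, cycles=4):
    program = {addr: f"instr{addr}" for addr in range(cycles)}
    cores = []
    for i in range(num_cores):
        # First half of the cores run on the rising edge, the rest on the falling edge.
        edge = RISING if i < num_cores // 2 else FALLING
        cores.append(Core(i, edge, Cache(opposite(edge), program)))

    active_per_edge = []
    for _ in range(cycles):
        for edge in (RISING, FALLING):
            active_per_edge.append((edge, [c.core_id for c in cores if c.edge == edge]))
            for c in cores:
                c.tick(edge)
            for c in cores:
                c.cache.tick(edge)
    return cores, active_per_edge


if __name__ == "__main__":
    cores, log = simulate()
    for edge, active in log:
        print(edge, active)
    for c in cores:
        print(c.core_id, c.fetched)
```

In this model four cores are busy on every half cycle, and each core finds its instruction waiting when its next active edge arrives. It does not model the shared texture cache, which is the part that still needs a way to deal with the two phases.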

The GPU will also have two sets of FPUs. One will be a big shared FPU that processes less common operations. Then each core will have its own smaller personal FPU to handle more common operations. Both FPUs will be pipelined to allow for high frequencies, and the FPU architecture will allow several instances of the same operation to be in flight at once. So you could spam the add instruction, and the results will come out in the same order you put the arguments in (as long as you don't overflow the pipeline). The pipeline registers will also load data on the opposite clock edge from the core. So when the core clock goes high, the register feeds data to the next stage. When the clock goes low, the register stores data from the previous stage. I will also try to give each core branch prediction and out-of-order execution, with the goal of making them superscalar.
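The pipeline behaviour described above can be sketched in Python as well. This is only a toy model under a few assumptions: the class name `PipelinedAdder`, the three-stage depth, and the `run_adds` helper are invented for the example, and the addition is done in one go as the operands leave the last register, whereas real hardware would spread the work across the stages. What it does show is the register timing (feed forward on the rising edge, store on the falling edge) and that back-to-back adds come out in issue order.

```python
class PipelinedAdder:
    """Toy model of a pipelined FPU add with two-phase pipeline registers."""

    def __init__(self, stages=3):
        # Value each register stored on the last falling edge.
        self.captured = [None] * stages
        # Value each register is presenting to the next stage.
        self.driven = [None] * stages

    def cycle(self, operands=None):
        """Run one full clock cycle; return a finished sum, or None."""
        # Rising edge: every register feeds its stored value to the next stage.
        self.driven = list(self.captured)
        finished = self.driven[-1]

        # Falling edge: every register stores the output of the previous stage.
        # The first register stores the operands the core issued this cycle.
        self.captured = [operands] + self.driven[:-1]

        if finished is None:
            return None
        a, b = finished
        return a + b


def run_adds(pairs, stages=3):
    """Issue one add per cycle, then keep clocking until the pipeline drains."""
    fpu = PipelinedAdder(stages)
    results = []
    for operands in list(pairs) + [None] * stages:
        out = fpu.cycle(operands)
        if out is not None:
            results.append(out)
    return results


if __name__ == "__main__":
    print(run_adds([(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]))
```

Because this model accepts at most one operation per cycle, it can never overflow; a real core would need some form of stall or back-pressure if it could issue faster than the FPU accepts.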

I have also decided not to lock the GPU speed to 400 MHz, but instead to experiment and see how hard I can push the Spartan 6 FPGA before I get graphical glitches. If anyone has any other ideas to improve the architecture, then by all means let me know and I may implement them.
