Up to this point, we've only had a few snippets of code to exercise our CPU ideas. But with a simple program that computes the Fibonacci numbers (up to a 16 bit limit), we're able to compare our various efforts against the 6502:
| CPU | Code size | Cycles |
|---|---|---|
| OPC-1 (8 bit CPLD sized) | 172 | 5040 |
| 6502 (8 bit custom) | 84 | 1710 |
| OPC-3 (16 bit OPC-1) | 216 | 2550 |
| OPC-5 (16 bit 16 register) | 70 | 921 |
We don't think that's too bad!
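For readers who want to see what the benchmark computes, here is a minimal Python sketch. It is not the assembly program we actually timed. The function name `fib_16bit` and the stopping rule (stop as soon as the next sum no longer fits in 16 bits) are assumptions made for illustration.

```python
WORD_MASK = 0xFFFF  # largest value that fits in one 16 bit word


def fib_16bit():
    """Return the Fibonacci numbers that fit in an unsigned 16 bit word.

    Illustrative model only: the loop stops when the next sum would
    overflow 16 bits, which is where a real machine would see a carry.
    """
    a, b = 0, 1
    results = [a, b]
    while True:
        total = a + b
        if total > WORD_MASK:  # the add would carry out of 16 bits
            break
        results.append(total)
        a, b = b, total
    return results


fibs = fib_16bit()
print(len(fibs), fibs[-1])
```

The last value that fits is 46368; the next Fibonacci number, 75025, needs 17 bits.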
However, we noticed that with 16 registers we much more often operate on registers than on memory. So, we can rejig the machine to separate load and store operations and make those the only ones which operate on memory, and free up one bit for instruction encoding. (We did have another thought: if we drop to 8 registers we free up two bits...)
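The arithmetic behind the 8 register thought can be sketched in a few lines of Python. This assumes each instruction carries two register fields, which is what the "two bits" figure implies; the helper names are hypothetical.

```python
def register_field_bits(num_registers):
    """Bits needed to name one of num_registers registers."""
    return (num_registers - 1).bit_length()


def bits_freed(old_registers, new_registers, fields_per_instruction=2):
    """Encoding bits freed across all register fields of one instruction.

    fields_per_instruction=2 is an assumption, not taken from the spec.
    """
    per_field = register_field_bits(old_registers) - register_field_bits(new_registers)
    return per_field * fields_per_instruction


print(register_field_bits(16), register_field_bits(8), bits_freed(16, 8))
```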
Here's the updated spec showing we now have 16 opcodes: we've added sub, sbc, cmp and cmpc, plus not, byte swap, and access to the processor status register - which means an interrupt routine can now save and restore the machine state much more readily.
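To illustrate why a sub/sbc pair is useful on a 16 bit machine, here is a rough Python model of a 32 bit subtraction built from two 16 bit steps, along with a byte swap. The function names and the explicit borrow flag are assumptions for this sketch; the flag polarity and exact semantics on the real machine are defined by the spec and may differ.

```python
WORD_MASK = 0xFFFF


def sub16(a, b, borrow_in=0):
    """16 bit subtract; returns (result, borrow_out).

    With borrow_in=0 this models a plain sub, with borrow_in=1 it models
    the second step of a multi-word subtract (sbc). Illustrative only.
    """
    diff = a - b - borrow_in
    return diff & WORD_MASK, 1 if diff < 0 else 0


def sub32(a, b):
    """32 bit subtract from two 16 bit steps: low words first, then high."""
    lo, borrow = sub16(a & WORD_MASK, b & WORD_MASK)
    hi, borrow = sub16((a >> 16) & WORD_MASK, (b >> 16) & WORD_MASK, borrow)
    return (hi << 16) | lo, borrow


def byte_swap(x):
    """Exchange the high and low bytes of a 16 bit word."""
    return ((x << 8) | (x >> 8)) & WORD_MASK


print(hex(sub32(0x00010000, 1)[0]), hex(byte_swap(0x1234)))
```

A compare can be modelled the same way: run the subtraction, keep the flag, and discard the result.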
We're hoping this will improve both performance and code density. To figure that out, we've written some arithmetic routines: multiply, divide and square root.
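As a reference for what such routines compute, here is a Python sketch of the classic bit-serial algorithms: shift-and-add multiply, restoring division, and a bit-by-bit integer square root. These are textbook methods chosen for illustration, not necessarily the ones our assembly routines use, and the function names are made up.

```python
WORD_BITS = 16


def mul16(a, b):
    """Unsigned 16x16 -> 32 bit multiply by shift-and-add."""
    product = 0
    for _ in range(WORD_BITS):
        if b & 1:          # add the shifted multiplicand for each set bit
            product += a
        a <<= 1
        b >>= 1
    return product


def divmod16(n, d):
    """Unsigned 16 bit restoring division; returns (quotient, remainder)."""
    if d == 0:
        raise ZeroDivisionError("division by zero")
    q, r = 0, 0
    for i in range(WORD_BITS - 1, -1, -1):
        r = (r << 1) | ((n >> i) & 1)  # bring down the next dividend bit
        q <<= 1
        if r >= d:                     # trial subtract succeeds
            r -= d
            q |= 1
    return q, r


def isqrt16(n):
    """Integer square root of a 16 bit value, one result bit per step."""
    root = 0
    bit = 1 << (WORD_BITS - 2)  # highest power of four in a 16 bit word
    while bit > n:
        bit >>= 2
    while bit:
        if n >= root + bit:
            n -= root + bit
            root = (root >> 1) + bit
        else:
            root >>= 1
        bit >>= 2
    return root


print(mul16(300, 200), divmod16(1000, 7), isqrt16(1000))
```

Each routine loops a fixed or bounded number of times using only shifts, adds, subtracts and compares, which is why they make reasonable microbenchmarks for a small instruction set.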
So, we're still within 128 slices, easily, and generally a bit faster than 100MHz, which keeps us competitive with an FPGA version of the 6502, although we are using a 16 bit wide memory, which would (back in the day) have made for a much more expensive system. We're confident we could make a shim to connect to an 8 bit wide memory, but that would surely cost us performance.
One further improvement: we were able to rework our state machine to use fewer cycles, bringing the Fibonacci benchmark down from 921 cycles to 709. Over all our microbenchmarks, we got a 30% performance increase. A little bit of pipelining goes a long way - as the 6502 designers also knew.
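The two ways of quoting that Fibonacci improvement are easy to mix up, so here is the arithmetic as a short Python sketch. It assumes the clock rate is unchanged, so that performance scales as the inverse of the cycle count; the helper names are hypothetical.

```python
def speedup(old_cycles, new_cycles):
    """Performance ratio, assuming the same clock rate before and after."""
    return old_cycles / new_cycles


def cycle_reduction(old_cycles, new_cycles):
    """Fraction of cycles saved."""
    return (old_cycles - new_cycles) / old_cycles


fib_speedup = speedup(921, 709)
fib_saving = cycle_reduction(921, 709)
print(f"{fib_speedup:.2f}x faster, {fib_saving:.1%} fewer cycles")
```

So for Fibonacci, about 23% fewer cycles corresponds to roughly a 30% performance increase, in line with the figure across the microbenchmarks.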
Just one more thing: we coded up a monitor program, by translating Bruce Clark's Compact Monitor, so we can more easily load and test code over a serial connection.