One of the main design constraints behind the EEEmu SPI is that it has the ability to emulate some of the faster Flash SPI chipset specifications to allow for BIOS Flash ROM emulation.
Enter the SAM4S. These 120MHz ARM based devices are capable of running with various programmable delays for flexible SPI compatibility and an SPI slave frequency of 25MHz. This speed is a reasonable compromise on price and performance and covers most use cases for embedded design.
These microcontrollers are no slouch when it comes to performing operations either. However this is not the only hurdle when designing such a system. The choice to allow users to store the ROM information on an SD card also brings about significant design considerations. SD cards by nature are a sector addressable device and cannot be read byte by byte like traditional EEPROM. So how do you get low latency performance out of a setup like this?
Firstly you must consider that the speed of the card plays a significant factor. So choosing the right card is crucial to getting decent performance. For testing, a card with a read write speed of 95MB/s has been chosen. This speed of card has been selected to allow the lowest reasonable latency while operating at higher SPI bus speeds. Secondly, we don't want to tie the CPU up with performing SPI and SD transactions. The good thing is these devices have their own SPI and HSMCI peripherals which do most of the work and provide data as registers when available. Once the hardware side of things is sorted, you then move onto software.
Initially through testing a sequential SPI read transaction from a ROM file, the system would take around 320us to read the next sector once the SPI interrupt requests byte information on the sector boundary (every 512 bytes). This was by no means good enough. The EEEmu sources use the Atmel ASF libraries to operate the SD Card, SPI interface and FAT(32) filesystem.
These libraries are "basic" and designed to run the demonstration application. A few tweaks are required to get them running the way you like. The first main issue with these libraries is that they will first do a check to see if the SD card is inserted. This first check has a massive timeout delay of a second to make sure the insertion has "debounced". Subsequent checks don't have this debounce delay however. So to speed things up on startup, this delay is removed on the first check only and if the card is removed while running, the check will be performed again. It turns out they also had several nested redundant checks (4) to see if the SD card was still inserted and operational on every byte read, even if it was reading from the FAT library’s sector buffer.
Once fixed, the performance was much, much better, but still not good enough for 25MHz. To operate a byte addressable ROM format, you must tell these FAT libraries to perform an f_lseek operation to access the byte you are looking for. Unfortunately this seeking is a little slow, even if it’s from the sector buffer. So you have to put statefulness procedures in there to check if the next byte to read is the byte being requested and to ignore the use of f_lseek (note that on each byte read operation, the byte pointer moves to the next byte for the next read).
After all of this, we bring the read latency on a sector boundary down to around 10us or less (with no perceivable latency on a next byte read). While a whole lot faster than 320us, it can still be much better (for example for loading BIOS program data), considering we want operate up to an SPI data rate of 25MHz.
Enter the Buffer
One thing these embedded FAT libraries lack is decent buffering. This is due to the fact they are designed to operate within small memory restrictions and multitasking is difficult on smaller microcontrollers. Well the SAM4S has a fair bit of memory, so why not use it?
Well we don’t want to use it all because we will want a large buffer for optimal write operations and RTOS task stacks, so let’s get smart about it. The easiest way to deal with this is to pre-fetch the next sequential sector in a background operation, while also tweaking the first sector read operation to allow byte data to be accessed as soon as it’s available (there's always a small latency hit with this while the SD card processes commands). While this design still has a small delay on the first couple of bytes, once the first sector after a seek operation is read in, the rest of the sectors in a sequential/bulk read never experience any latency.
This is all keeping in mind that we want to run up to an SPI bus speed of 25MHz, so for lower speed devices, this doesn't become a problem.
There's a few more optimisations to be made to the code, so specifications will be released once finished.