I've been away from this project for a few months (OK, four months) building things like a new tool for designing electronics. One of the things I haven't discussed here is the time it takes to download a bitstream to the FPGA on the CAT Board.
As shown in previous logs, the FPGA is configured through one of the hardware SPI ports of the RPi. I've never considered SPI a very fast way of transferring data, so I initially set the port bit rate at 1 Mbps. That was good enough to get the FPGA going within a couple of seconds, and there was no reason to push it harder and possibly cause errors while I was debugging the board.
But once the board was working reliably, I revisited the SPI bit-rate setting. I figured there was no harm in upping it to 5 Mbps just to see what would happen, so I went into the litterbox.py script and changed the speed to:
self.spi.speed = 5000000
Then I ran the command to load the FPGA with the bitstream for the LED blinker:
sudo litterbox -c blinky.bin
The download to the FPGA completed more quickly than before and the LED started blinking. Success!
Then I started pushing for more: 10 Mbps, 20 Mbps, 50 Mbps, no problem; 100 Mbps, 150 Mbps, still five-by-five; 200 Mbps, complete and utter failure.
OK, I hadn't expected to get even close to 200 Mbps. With a little trial and error, I finally found the maximum speed I could use was 199,999,999 bps. The reason for that becomes clear later.
Now, was I actually transferring bits at 200 Mbps, or was the software making a promise that the hardware couldn't keep? To test that, I wrote some code to time the transmission of a 10 MByte payload and compute the effective bit-rate while I also observed the maximum SPI clock frequency and duty cycle with an oscilloscope.
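In outline, the timing code amounted to something like this sketch (simplified and reconstructed; it assumes the python-spi interface that litterbox.py uses, i.e. an SPI object with a speed attribute and a write() method, and the device path is a guess):

```python
# Simplified sketch of the bit-rate measurement (assumes the python-spi
# module's SPI class; the device path and exact calls may differ).
import time
from spi import SPI

PAYLOAD_SIZE = 10 * 1024 * 1024        # 10 MByte test payload.
payload = b'\x00' * PAYLOAD_SIZE       # Dummy data is fine for timing.

spi = SPI('/dev/spidev0.0')            # SPI port that configures the FPGA.
spi.speed = 100000000                  # Requested bit rate (100 Mbps).

start = time.time()
spi.write(payload)                     # Sent as a series of 4096-byte packets.
elapsed = time.time() - start

print('Effective bit rate: {:.1f} Mbps'.format(PAYLOAD_SIZE * 8 / elapsed / 1e6))
```

Timing a 10 MByte transfer at each speed setting, alongside the scope measurements, gave these results: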
| spi.speed (Mbps) | Actual Speed (Mbps) | Fmax (MHz) | Duty Cycle (%) |
|------------------|---------------------|------------|----------------|
As can be seen, the actual transmission speeds are quite a bit lower than the speed setting. The reason for that is the overhead in the python-spi module that copies and converts the individual 4096-byte packets of the payload before sending them to the SPI driver. Even though each packet gets transmitted at a high clock speed, there's a significant "dead time" (2.3 ms) while the software readies the next packet. As the raw speed increases, the packet transmission time decreases and the dead time (which stays constant) consumes a larger percentage of the time to send the full payload. That's why the duty cycle decreases as the speed setting increases.
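Plugging in the numbers makes the trend concrete: at a 100 MHz clock, a 4096-byte packet spends only 4096 × 8 / 100 MHz ≈ 0.33 ms on the wire versus 2.3 ms of dead time. A quick back-of-the-envelope model (mine, built from just those two numbers) shows how the duty cycle collapses as the clock goes up:

```python
# Back-of-the-envelope model of the per-packet overhead described above.
PACKET_BYTES = 4096
DEAD_TIME = 2.3e-3                        # Constant software overhead per packet (s).

for clock in (10e6, 50e6, 100e6):         # Raw SPI clock rates (Hz).
    wire_time = PACKET_BYTES * 8 / clock  # Time the packet spends on the wire (s).
    duty = wire_time / (wire_time + DEAD_TIME)
    print('{:3.0f} MHz clock -> {:4.1f}% duty cycle, {:5.2f} Mbps effective'.format(
        clock / 1e6, 100 * duty, clock * duty / 1e6))
```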
To decrease the overhead, I modified the python-spi code as follows (a sketch of both changes appears after the list):
- The data payload is checked first, and no conversion or copying is done if it is already a string of bytes.
- The address of the current packet within the payload is sent to the SPI device driver rather than making a copy of the packet.
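In spirit, the patched transfer loop looks something like this (a simplified sketch, not the actual python-spi diff; send_packet_at() is a hypothetical stand-in for the module's ioctl-based packet transfer):

```python
# Sketch of the two changes above; names are hypothetical, not python-spi's.
import ctypes

PACKET_SIZE = 4096

def write_payload(data, send_packet_at):
    # Change 1: skip the conversion/copy step if the payload is already
    # a string of bytes.
    if not isinstance(data, bytes):
        data = bytes(bytearray(data))

    # One ctypes buffer for the whole payload; packets are addressed
    # inside it rather than sliced out.
    buf = ctypes.create_string_buffer(data, len(data))
    base = ctypes.addressof(buf)

    for offset in range(0, len(data), PACKET_SIZE):
        nbytes = min(PACKET_SIZE, len(data) - offset)
        # Change 2: hand the driver the *address* of the current packet
        # within the payload instead of copying the packet.
        send_packet_at(base + offset, nbytes)
```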
After these two changes, setting spi.speed to 100 Mbps resulted in an actual transmission speed of 65 Mbps (an increase of 540%).
There's no reason to set the spi.speed to a value greater than 100 Mbps. The table indicates the RPi is generating the SPI clock by dividing a master 200 MHz clock by an integer. Any setting between 100 and 199 Mbps will result in an SPI clock of 100 MHz, and going to 200 Mbps has already proven too fast for sending an FPGA configuration bitstream. (The iCE40HX datasheet also shows the SPI clock in slave mode should not exceed 25 MHz, so getting to 100 MHz is really pushing it already.)
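The 199,999,999 bps ceiling falls out of that divider. Assuming the driver picks the smallest integer divisor that keeps the SPI clock at or below the requested speed (my guess from the table; I haven't checked the driver source), every setting from 100,000,000 up to 199,999,999 maps to a divisor of 2, i.e. 100 MHz, and only a full 200,000,000 gets the fatal divisor of 1:

```python
# Plausible model of the clock divider (an assumption, not driver source).
import math

MASTER_CLOCK = 200000000

def spi_clock(requested_speed):
    # Smallest integer divisor whose output doesn't exceed the request.
    divisor = int(math.ceil(MASTER_CLOCK / float(requested_speed)))
    return MASTER_CLOCK // divisor

for speed in (100000000, 150000000, 199999999, 200000000):
    print('{:>11} bps requested -> {:>11} Hz actual'.format(speed, spi_clock(speed)))
```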
A transfer rate of 65 Mbps opens up some interesting possibilities. That means there is an 8 MByte/second channel between the CAT Board FPGA and the RPi that uses only a few pins of the GPIO connector. I have some Xilinx-centric VHDL modules and a Python library that provide a printf-like debug interface for FPGA designs through the JTAG port. I can modify these to use the SPI port so the CAT Board + RPi will have the same capabilities. I'll be working on that next. I think. Maybe.