PDQ_GFX Library Development
The motivation for PDQ_GFX
A while ago, I noticed an article on Hackaday about how Paul Stoffregen (and crew) had optimized the Adafruit GFX SPI LCD driver for the Teensy 3.1 to achieve "warp speed" (see TFT LCDS HIT WARP SPEED WITH TEENSY 3.1). It was a nice demonstration of using advanced features of the 32-bit Teensy 3.1 micro-controller (along with code optimization). They used things like hardware /CS control and a hardware SPI FIFO to really speed things up compared with the generic Arduino API version (even when that version was recompiled for the faster Teensy 3.1). I had previously purchased an Adafruit SPI LCD breakout board that used this same controller and found it disappointingly slow (my AVR LCD gaming dreams were mostly dashed, and I didn't do much with it). At the time I just chalked it up to the 8-bit AVR not being up to the task of LCD graphics (especially over a slow-ish SPI bus). After seeing the impressive gain the Teensy 3.1 was able to get, I decided it would be interesting to see if I could significantly speed up the library for Arduino AVR users without any "fancy 32-bit hardware". [Even though I have a Teensy 3.1 and they are great, I like an optimization challenge. :-) ]
In this write-up I will go over some of the things I did to get a speedup of about 2.5 to 12 times (depending on the primitive) using the same hardware. I did the bulk of this project many months ago, but am only now getting around to documenting it (so hopefully I am not too fuzzy on the details).
A look at the AVR SPI hardware in action
Since I had read about Paul's experience, I had some ideas about what I could improve on the 8-bit AVR (although many of his optimizations were ruled out because they use hardware capabilities that the AVR lacks). But the first thing to do was to take a look for myself. To start, I used my logic analyzer (Open Workbench Logic Sniffer) to see how the ILI9341 driver was operating the SPI bus (a logic analyzer is a super handy and cost-effective tool - I use mine to debug and explore digital hardware all the time).
In the "before" logic analyzer capture, you can see part of a "drawPixel" command being sent from the ATmega328P to the LCD controller over the SPI bus. In case you aren't familiar with SPI or logic analyzer captures, the important thing to notice here is "channel-3". This is the SPI clock signal (called SCK). It goes high and then low once for every bit transferred over the SPI bus. The Adafruit library normally uses the AVR hardware SPI peripheral, as it does in this case (it can also "bit-bang" SPI, but that is much slower). It "cranks up" the SPI speed to the maximum supported by a 16MHz AVR, which is 8MHz (this means one bit can be transferred every two AVR clock cycles). So each "blue chunk" on channel-3 represents 8 bits (one byte) being sent over the SPI bus.
This is the fastest the AVR can possibly clock data out over SPI (for comparison, I believe the LCD's SPI controller can go up to ~25MHz, and some micro-controllers and devices support 100MHz SPI or more). However, while the bytes are clocked out at the maximum (fixed) speed, you can see there is a lot of "dead time" between bytes, and in theory that time could be reclaimed to speed things up.
Another thing I noticed is that the Adafruit library toggles the /CS signal on channel-0 for almost every single byte sent. The /CS signal is "chip-select" (the "/" means the signal is active-low): when it is low the LCD listens to the SPI bus, and when it is high the LCD ignores the bus so that it can be shared with another device. Since we are going to be sending a whole bunch of commands to the LCD, it seemed to me that we could just pull /CS low once, send a bunch of commands (until we are about to return to the calling sketch), and then restore /CS to high once (in case the SPI bus will be used to talk to another device, like the SD card that is on many of these LCD modules).
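
To make the /CS observation concrete, here is a minimal host-side sketch (not the actual Adafruit or PDQ_GFX code) that contrasts toggling /CS around every byte with holding it low for a whole batch. Everything in it is an assumption made for illustration: `MockBus`, `sendPerByteCS`, `sendBatchedCS` and `spiShiftCycles` are hypothetical names, and the mock only counts events rather than driving real pins. The cycle helper just restates the arithmetic above: at 8MHz SPI on a 16MHz AVR, each bit takes two CPU cycles, so one byte occupies the bus for 16 cycles.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the SPI bus and the /CS pin.
// It counts events instead of touching hardware.
struct MockBus {
    unsigned csEdges = 0;    // number of /CS level changes seen so far
    unsigned bytesSent = 0;  // number of bytes clocked out
    bool csLow = false;      // true while the LCD is selected

    void csAssert()  { if (!csLow) { csLow = true;  ++csEdges; } }  // pull /CS low
    void csRelease() { if (csLow)  { csLow = false; ++csEdges; } }  // return /CS high
    void transfer(std::uint8_t) { ++bytesSent; }                    // one SPI byte
};

// Pattern resembling the "before" capture: select and deselect around each byte.
void sendPerByteCS(MockBus &bus, const std::uint8_t *data, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) {
        bus.csAssert();
        bus.transfer(data[i]);
        bus.csRelease();
    }
}

// Proposed pattern: select once, send everything, deselect once.
void sendBatchedCS(MockBus &bus, const std::uint8_t *data, std::size_t len) {
    bus.csAssert();
    for (std::size_t i = 0; i < len; ++i) {
        bus.transfer(data[i]);
    }
    bus.csRelease();
}

// CPU cycles the SPI hardware spends shifting out nBytes when SCK = F_CPU / 2
// (8 bits per byte, 2 CPU cycles per bit). Dead time between bytes is not included.
constexpr unsigned long spiShiftCycles(unsigned long nBytes) {
    return nBytes * 8UL * 2UL;
}
```

For a batch of five arbitrary bytes, the per-byte version produces ten /CS edges while the batched version produces two, and both send the same five bytes. On a real AVR every one of those edges is an extra port write in the dead time between bytes, which is why removing them looked like an easy first win.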