This is for the ILI9341 display. It has 4x as many pixels to write. If we write only every other pixel line, we get a similar visual result with 2x the performance. Since we can only go up to 128x128 sub-sampled (which is scaled to 2x2 pixels per sample on this display), and only 64x64 has given good results so far, this seems acceptable. 128x128 is being worked on and will be updated within 2 weeks. I've got another blog article coming out on how to do super sub-sampling without floats.
Longer runs of consecutive pixel writes are faster, because every change of x or y position carries command overhead. Just by switching the direction in which the lines are drawn, while still writing only every other line, an 8x8 block (30x32 pixels) speeds up from 2350us to 1900us. Instead of writing 30 pixels per line with 16 new-position commands of overhead, we write 32 pixels per line with 15 new-position commands. The only difference is that we write top to bottom rather than left to right. We write 64 of these blocks.
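To see where the 16-versus-15 count comes from, here is a minimal sketch that only counts work. It is not the actual driver code. The names `StripPlan` and `plan_block` are made up for illustration, and it assumes one new-position command per line drawn.

```python
from typing import NamedTuple


class StripPlan(NamedTuple):
    runs: int            # lines drawn = number of new-position commands (assumed one per line)
    pixels_per_run: int  # consecutive pixels written after each command


def plan_block(width: int, height: int, vertical: bool) -> StripPlan:
    """Plan a width x height pixel block, drawing only every other line.

    vertical=False: lines run left to right, so every other row is drawn.
    vertical=True:  lines run top to bottom, so every other column is drawn.
    """
    line_count = width if vertical else height   # lines available in that direction
    line_length = height if vertical else width  # pixels in each line
    return StripPlan(runs=(line_count + 1) // 2, pixels_per_run=line_length)


left_to_right = plan_block(30, 32, vertical=False)
top_to_bottom = plan_block(30, 32, vertical=True)
print(left_to_right)  # 16 runs of 30 pixels
print(top_to_bottom)  # 15 runs of 32 pixels
```

Both orientations write the same 480 pixels per block. The top-to-bottom version just needs one fewer position change and gets longer uninterrupted runs.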
Writing 64 blocks at 1900us each takes 121.6ms per frame, which equals 8.22 fps. Keep in mind that buffering blocks reduces writes by 2x to 5x, so the average frame rate is 8.22 x 2 = 16.44 fps.
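The frame rate arithmetic can be checked with a few lines. This is a rough sketch that assumes frame time is dominated by block writes and that every block is rewritten each frame. The function name `frames_per_second` is made up for illustration.

```python
def frames_per_second(blocks: int, us_per_block: float) -> float:
    """Frame rate when every block is rewritten each frame."""
    frame_time_us = blocks * us_per_block
    return 1_000_000 / frame_time_us


before = frames_per_second(64, 2350)  # lines drawn left to right
after = frames_per_second(64, 1900)   # lines drawn top to bottom
print(f"{before:.2f} fps -> {after:.2f} fps")  # 6.65 fps -> 8.22 fps
```

Doubling the second figure for the 2x reduction in writes from buffering gives roughly 16.4 fps.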