
When In Doubt, Software!

A project log for muCPU: an 8-bit MCU

An 8-bit load-store CPU with 2 pipeline stages, designed in Logisim and implemented in VHDL + assembler written in Python

Reed Foster, 05/27/2016 at 03:38

With almost all of the data being lost between the VHDL SPI controller and the SSD1306, I decided to drop the controller and use just the processor core, because I know that the core at least runs on the FPGA the same way it runs in simulation. Software SPI brought a whole new set of problems. With an 8-bit immediate, branch instructions can't reach very far. On top of that, an 8-bit counter can't count a loop that executes 256 times, so one iteration of the loop has to be unrolled. Between driving the io ports in software and unrolling that loop, my code grew so large that the outer loop (I have 3 nested loops) was too long to branch from one end to the other. The solution: add intermediate branches. Pseudo-code:

jmp 2; skips next two instructions when loop is executed
     ; they are only executed when branched/jumped to from
     ; elsewhere in the program
jmp outerloop_start; intermediate for backwards jump
jmp outerloop_exit; intermediate for exit branch
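The reach problem and the intermediate-branch fix can be sketched in Python, the language the assembler is written in. This is only an illustration. It assumes a signed 8-bit offset measured from the branch's own address, which may not match muCPU's exact encoding, and the helper names (`in_reach`, `plan_hops`) are made up for this sketch.

```python
def branch_reach(imm_bits=8):
    """Range of a signed immediate with imm_bits bits (assumed encoding)."""
    return -(1 << (imm_bits - 1)), (1 << (imm_bits - 1)) - 1


def in_reach(src, dst, imm_bits=8):
    """True if a branch at address src can reach dst in a single hop."""
    lo, hi = branch_reach(imm_bits)
    return lo <= dst - src <= hi


def plan_hops(src, dst, trampolines, imm_bits=8):
    """Greedy route from src to dst through intermediate branch addresses.

    trampolines is a list of addresses that hold an unconditional branch
    (like itrback/itrfwd in the listing). Returns every address visited.
    """
    path = [src]
    here = src
    while not in_reach(here, dst, imm_bits):
        # Only consider trampolines we can reach that bring us closer to dst.
        candidates = [
            t for t in trampolines
            if in_reach(here, t, imm_bits) and abs(dst - t) < abs(dst - here)
        ]
        if not candidates:
            raise ValueError(f"no trampoline within reach of address {here}")
        here = min(candidates, key=lambda t: abs(dst - t))
        path.append(here)
    path.append(dst)
    return path


if __name__ == "__main__":
    # A backwards jump of 300 words needs two intermediate branches here.
    print(plan_hops(300, 0, [180, 60]))
```

An assembler could run a check like this and report which branches need an intermediate hop, instead of leaving that to be discovered by hand.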

When I was using the VHDL SPI controller, one clear sign that the SSD1306 wasn't receiving all of its data was how dim the screen was. I ran a couple of tests with an Arduino: with the charge-pump configuration commands disabled, the display was unlit when connected to 3.3 V and only very dim at 5 V. I have yet to figure out why the display received some commands and not others, but I'll eventually troubleshoot that with a logic analyzer or something like that.

Using this indicator, it was fairly easy to tell whether the software SPI was working. Unfortunately, it took me forever to figure out what was going wrong after initialization: nothing comprehensible appeared on the display. I sent a pattern of bytes that I expected to fill only the top row and got a very different result: the 16 bytes I sent were arranged vertically.

After discovering this, I looked at the SSD1306 datasheet and realized that I had simply misunderstood how the device interprets the data sent to it. After visiting Stack Exchange for a couple of ways to convert between hex and binary strings in Python, I wrote a simple program that remaps the bits of the .xbm file format (the one I had been using to draw and save black-and-white images) into bytes that can be sent, one at a time, to the SSD1306 without additional processing on the FPGA. The end result:

Zoomed in (I augmented my phone camera with a small lens from an ancient video camera). Individual pixels are visible.
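The bit remapping described above could look roughly like the sketch below. It is not my original script. It assumes the image bytes have already been pulled out of the .xbm file into a flat sequence (row by row, least significant bit = leftmost pixel), and that the SSD1306 is in horizontal addressing mode with the default segment/COM mapping, where every byte written is a vertical strip of 8 pixels with bit 0 at the top. The function name `xbm_to_ssd1306` is made up for illustration.

```python
def xbm_to_ssd1306(xbm_bytes, width=128, height=64):
    """Remap row-major XBM bytes into SSD1306 page-ordered bytes.

    Input:  height rows of width/8 bytes, LSB = leftmost pixel of each byte.
    Output: height/8 pages of width bytes, each byte one 8-pixel column,
            LSB = top pixel of the page (assumed display configuration).
    """
    if width % 8 or height % 8:
        raise ValueError("width and height must be multiples of 8")
    row_stride = width // 8
    if len(xbm_bytes) != row_stride * height:
        raise ValueError("unexpected number of image bytes")

    out = bytearray()
    for page in range(height // 8):
        for col in range(width):
            column_byte = 0
            for bit in range(8):
                y = page * 8 + bit
                src = xbm_bytes[y * row_stride + col // 8]
                if (src >> (col % 8)) & 1:
                    column_byte |= 1 << bit
            out.append(column_byte)
    return bytes(out)


if __name__ == "__main__":
    # Top row fully lit: 16 XBM bytes of 0xFF followed by zeros.
    image = bytes([0xFF] * 16 + [0x00] * (1024 - 16))
    remapped = xbm_to_ssd1306(image)
    print(remapped[:4].hex(), len(remapped))
```

This also shows why my 16 test bytes came out vertical: the display treats each byte as a column of 8 pixels, while the .xbm data treats each byte as 8 pixels in a row.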

Assembler code:

//PortD pin assignments (nc means no connection)
//assignment     set-bitmask (or)    reset-bitmask (and)
//pin0 => mosi   00000001 | 1        11111110 | 254
//pin1 => sclk   00000010 | 2        11111101 | 253
//pin2 => dc     00000100 | 4        11111011 | 251
//pin3 => rst    00001000 | 8        11110111 | 247
//pin4 => ss     00010000 | 16       11101111 | 239
//
		//SET RST and SS (rst, ss = 1)
		li r7, 255    //set par (page address register) to io bank
		li r2, 24     //bit mask for rst and ss (high)
		lb r1, 255(r0)//load spi reg
		or r1, r1, r2 //apply bit mask
		sb r1, 255(r0)//store spi reg
		//RESET RST (rst = 0)
		li r2, 247    //bit mask for rst (low)
		lb r1, 255(r0)//load spi reg
		and r1, r1, r2//apply bit mask
		sb r1, 255(r0)//store spi reg
		// DELAY LOOP (loops for ~4us to hold reset low for length required by ssd1306)
		// while (true) {
		//  if (i = 10) {break;}
		//  else {i++};
		// }
		li r4, 10         //r4 is stopval
		li r3, 1          //r3 is incrementval
		li r1, 0          //r1 is counting reg
delay_start:		sub r2, r4, r1    //r2 compares r4 and r1; if neg, then r1 > r4; if zero, r1 = r4
		bez r2, delay_exit//if r2==0, then exit loop
		nop               //delay slot
		bez r0, delay_start//infinite backwards loop
		add r1, r1, r3    //increment r1 by r3
		//SET RST (rst = 1)
delay_exit:		li r2, 8      //bit mask for rst (high)
		lb r1, 255(r0)//load spi reg
		or r1, r1, r2 //apply bit mask
		sb r1, 255(r0)//store spi reg
		//RESET D/C (d/c = 0; put ssd1306 into command mode, which it should already be in)
		li r2, 251    //bit mask for d/c
		lb r1, 255(r0)//load spi reg
		and r1, r1, r2//apply bit mask
		sb r1, 255(r0)//store spi reg
		//COMMAND LOOP (sends 25 init commands + 6 memory config commands; prepare ssd1306 for buffwrite)
		// while (true) {
		//   if (i = 31) {break;}
		//   else {
		//     ss = 0
		//     while (true) {
		//       if (j = 8) {break;}
		//       else {
		//         sclk = 0;
		//         mosi = byte[j];
		//         sclk = 1;
		//       }
		//     }
		//     ss = 1;
		//   }
		//loop setup
		li r1, 0      //r1 is counting reg
		li r3, 1      //r3 is incrementval
cmdloop_start:		li r4, 31     //r4 is stopval
		sub r2, r4, r1//r2 compares r4 and r1; if neg, then r1 > r4; if zero, r1 = r4
		bez r2, cmdloop_exit//if r2==0, then exit loop
		//loop contents
		li r7, 12     //set par to rom[commands]
		lb r6, 0(r1)  //r6 holds the byte to be sent via spi
		li r7, 255    //set par back to io bank
		//reset ss
		li r5, 239    //bit mask to reset ss
		lb r4, 255(r0)//load spi reg
		and r4, r5, r4//apply bit mask
		sb r4, 255(r0)//store spi reg
		//send byte (loop)
		li r2, 0      //r2 is counting reg
cmdbyte_start:		li r4, 8      //r4 is stopval
		sub r5, r4, r2//r5 compares r4 and r2; if neg, then r2 > r4; if zero, r2 = r4
		bez r5, cmdbyte_exit//exit loop if r5==0
		nop           //delay slot
		//sclk low (setup mosi)
		li r5, 253    //bit mask to reset sclk
		lb r4, 255(r0)//load spi reg
		and r4, r5, r4//apply bit mask
		sb r4, 255(r0)//store spi reg
		//mosi
		//shift left until desired bit is msb, then shift right 7; b7:b1=0, b0=desired bit
		sll r5, r6, r2//shift current bit to msb
		li r4, 7      //shamt for srl
		srl r5, r5, r4//shift msb to lsb
		lb r4, 255(r0)//load spi reg
		li r3, 254    //bitmask to reset mosi
		and r4, r4, r3//clear mosi bit of spi reg
		or r4, r5, r4 //combine spi reg with mosi_out
		sb r4, 255(r0)//store spi reg
		//sclk high (latch mosi)
		li r5, 2      //bit mask to set sclk
		lb r4, 255(r0)//load spi reg
		or r4, r5, r4 //apply bit mask
		sb r4, 255(r0)//store spi reg
		//end loop
		li r3, 1      //reset constant loop variables; r4 (stopval) reset later
		bez r0, cmdbyte_start//infinite backwards loop
		add r2, r2, r3//increment r2 by r3
		//set ss
cmdbyte_exit:		li r5, 16     //bit mask to set ss
		lb r4, 255(r0)//load spi reg
		or r4, r5, r4 //apply bit mask
		sb r4, 255(r0)//store spi reg
		//end loop
		li r3, 1      //reset constant loop variables; r4 (stopval) reset later
		bez r0, cmdloop_start//infinite backwards loop
		add r1, r1, r3//increment r1 by r3
		//SET D/C (d/c = 1; put ssd1306 into data mode)
cmdloop_exit:		li r7, 255    //set par to io bank
		li r2, 4      //bit mask for d/c
		lb r1, 255(r0)//load spi reg
		or r1, r1, r2 //apply bit mask
		sb r1, 255(r0)//store spi reg
		//DATA LOOP (sends 1024 bytes of data to gddram in ssd1306)
		//
		// i = 0;
		// while (true) {
		//   r7 = i;
		//   r6 = mem(0);
		//   k = 0;
		//   cs = 0;
		//   while (true) {
		//     sclk = 0;
		//     mosi = r6(k);
		//     sclk = 1;
		//     k++;
		//     if (k == 8) {break;}
		//   }
		//   cs = 1;
		//   j = 1;
		//   while (true) {
		//     cs = 0;
		//     k = 0;
		//     while (true) {
		//       sclk = 0;
		//       mosi = r6(k);
		//       sclk = 1;
		//       k++;
		//       if (k == 8) {break;}
		//     }
		//     cs = 1;
		//     j++;
		//     if (j == 0) {break;}
		//   }
		//   i++;
		//   if (i == 12) {break;}
		// }
		li r7, 8      //set par to rom[data]
		li r1, 8      //r1 is counting reg (initialized to same value as par)
		li r4, 1      //r4 is countval
outloop_start:		li r5, 12     //r5 is stopval
		sub r6, r5, r1//r6 compares r5 and r1; if neg, then r1 > r5; if zero, r1 = r5
		bez r6, itrfwd//exit loop if stopval is reached
		nop           //delay slot
		//r6 = byte to send
		li r2, 0      //r2 = 0: offset of the first byte (unrolled first iteration of inner loop)
		add r7, r1, r0//set par to rom[data]
		lb r6, 0(r2)  //r6 holds the byte to be sent via spi
		li r7, 255    //set par back to io bank
		//reset ss
		li r5, 239    //bit mask to reset ss
		lb r4, 255(r0)//load spi reg
		and r4, r5, r4//apply bit mask
		sb r4, 255(r0)//store spi reg
		//send byte (loop)
		li r3, 0      //r3 is counting reg
dbyte_start:		li r5, 8      //r5 is stopval
		sub r4, r5, r3//r4 compares r5 and r3; if neg, then r3 > r5; if zero, r3 = r5
		bez r4, dbyte_exit//exit loop if r4==0
		nop           //delay slot
		//sclk low (setup mosi)
		li r5, 253    //bit mask to reset sclk
		lb r4, 255(r0)//load spi reg
		and r4, r5, r4//apply bit mask
		sb r4, 255(r0)//store spi reg
		//mosi
		//shift left until desired bit is msb, then shift right 7; b7:b1=0, b0=desired bit
		sll r5, r6, r3//shift current bit to msb
		li r4, 7      //shamt for srl
		srl r5, r5, r4//shift msb to lsb
		lb r4, 255(r0)//load spi reg
		li r7, 254    //bitmask to reset mosi
		and r4, r4, r7//clear mosi bit of spi reg
		or r4, r5, r4 //combine spi reg with mosi_out
		li r7, 255    //set par to io bank
		sb r4, 255(r0)//store spi reg
		//sclk high (latch mosi)
		li r5, 2      //bit mask to set sclk
		lb r4, 255(r0)//load spi reg
		or r4, r5, r4 //apply bit mask
		sb r4, 255(r0)//store spi reg
		//end loop
		li r4, 1      //reset constant loop variables; r5 (stopval) reset later
		bez r0, dbyte_start//infinite backwards loop
		add r3, r3, r4//increment r3 by r4
		//set ss
dbyte_exit:		li r5, 16     //bit mask to set ss
		lb r4, 255(r0)//load spi reg
		or r4, r5, r4 //apply bit mask
		sb r4, 255(r0)//store spi reg
		bez r0, 10     //skip intermediate branches, they are only executed when outer loop is repeated/exited
		nop
itrback:		bez r0, outloop_start //jump to start of outer loop
		nop
itrfwd:		bez r0, outloop_exit //jump to end of outer loop
		nop
		//inner loop
		li r2, 1      //r2 is countval
inloop_start:		bez r2, inloop_exit//exit loop if r2 overflowed
		//loop contents
		add r7, r1, r0//set par to rom[data]
		lb r6, 0(r2)  //r6 holds the byte to be sent via spi
		li r7, 255    //set par back to io bank
		//reset ss
		li r5, 239    //bit mask to reset ss
		lb r4, 255(r0)//load spi reg
		and r4, r5, r4//apply bit mask
		sb r4, 255(r0)//store spi reg
		//send byte (loop)
		li r3, 0      //r3 is counting reg
lp_dbyte_start:	li r5, 8      //r5 is stopval
		sub r4, r5, r3//r4 compares r5 and r3; if neg, then r3 > r5; if zero, r3 = r5
		bez r4, lp_dbyte_exit//exit loop if r4==0
		nop           //delay slot
		//sclk low (setup mosi) #loopstart
		li r5, 253    //bit mask to reset sclk
		lb r4, 255(r0)//load spi reg
		and r4, r5, r4//apply bit mask
		sb r4, 255(r0)//store spi reg
		//mosi
		//shift left until desired bit is msb, then shift right 7; b7:b1=0, b0=desired bit
		sll r5, r6, r3//shift current bit to msb
		li r4, 7      //shamt for srl
		srl r5, r5, r4//shift msb to lsb
		lb r4, 255(r0)//load spi reg
		li r7, 254    //bitmask to reset mosi
		and r4, r4, r7//clear mosi bit of spi reg
		or r4, r5, r4 //combine spi reg with mosi_out
		li r7, 255    //set par to io bank
		sb r4, 255(r0)//store spi reg
		//sclk high (latch mosi)
		li r5, 2      //bit mask to set sclk
		lb r4, 255(r0)//load spi reg
		or r4, r5, r4 //apply bit mask
		sb r4, 255(r0)//store spi reg
		//end loop #loopend
		li r4, 1      //reset constant loop variables; r5 (stopval) reset later
		bez r0, lp_dbyte_start//infinite backwards loop
		add r3, r3, r4//increment r3 by r4
		//set ss
lp_dbyte_exit:		li r5, 16     //bit mask to set ss
		lb r4, 255(r0)//load spi reg
		or r4, r5, r4 //apply bit mask
		sb r4, 255(r0)//store spi reg
		//end loop
		li r4, 1      //reset constant loop variables; r5 (stopval) reset later
		bez r0, inloop_start//infinite backwards loop
		add r2, r2, r4//increment r2 by r4
		//end loop
inloop_exit:		li r4, 1      //reset constant loop variables; r5 (stopval) reset later
		bez r0, itrback//infinite backwards loop
		add r1, r1, r4//increment r1 by r4
outloop_exit:		nop           //delay slot
end_program:		bez r0, end_program //halt command, processor idles in endless loop
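
For anyone who finds the listing hard to follow, here is a rough Python model of what the data loop does: bit-bang one byte MSB-first through the PortD pins, and send 256 bytes per ROM page with an 8-bit counter by sending the first byte outside the loop. The pin masks come from the table at the top of the listing. The `PortD` class is a made-up stand-in for the real io register that simply samples MOSI on each rising SCLK edge while SS is low, which is how I understand the SSD1306 to behave. It is not a model of the display itself.

```python
MOSI, SCLK, DC, RST, SS = 0x01, 0x02, 0x04, 0x08, 0x10  # PortD bit masks


class PortD:
    """Hypothetical io register that records bytes clocked out over SPI."""

    def __init__(self):
        self.value = 0
        self.received = []
        self._bits = []

    def write(self, new):
        new &= 0xFF
        rising = (new & SCLK) and not (self.value & SCLK)
        if rising and not (new & SS):
            self._bits.append(new & MOSI)
            if len(self._bits) == 8:
                byte = 0
                for b in self._bits:  # first bit received is the msb
                    byte = (byte << 1) | b
                self.received.append(byte)
                self._bits = []
        self.value = new


def send_byte(port, byte):
    """Mirror of the dbyte loop: ss low, 8 x (sclk low, mosi, sclk high), ss high."""
    port.write(port.value & (0xFF ^ SS))
    for k in range(8):
        port.write(port.value & (0xFF ^ SCLK))
        bit = ((byte << k) & 0xFF) >> 7  # sll by k in 8 bits, then srl by 7
        port.write((port.value & (0xFF ^ MOSI)) | bit)
        port.write(port.value | SCLK)
    port.write(port.value | SS)


def send_page(port, rom_page):
    """Send 256 bytes using an 8-bit counter: one iteration is peeled off."""
    send_byte(port, rom_page[0])  # unrolled first iteration (j = 0)
    j = 1
    while j != 0:  # exits when the 8-bit counter wraps from 255 to 0
        send_byte(port, rom_page[j])
        j = (j + 1) & 0xFF


if __name__ == "__main__":
    port = PortD()
    send_page(port, list(range(256)))
    print(len(port.received))
```

Running `send_page` once per ROM page for the four data pages (par = 8 through 11) gives the 1024 bytes the display needs.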

Given how complex the assembler code is just to display a still image, I'd probably have to add a J-type instruction or widen the instruction word to display animations/video.
