
optimizer reliance follies

A project log for operation: Learn The MIPS (PIC32MX1xx/2xx/370)

Having been exclusive to a certain uC-line for over a decade, it's time to learn something new (and port commonCode!)... Enter MIPS

Eric Hertz, 07/17/2015 at 10:59, 5 Comments

GAH! I should just delete this whole ordeal.


MPLABX/xc32-gcc--the free version--has a few restrictions...

The highest optimization-level is 1 (min 0, max 3, or "s" for size, IIRC)...

Whelp, I'm running this 32-bit system at 50MHz, what's that... 32/8=4... 50/20=5/2... 5/2*4=10 so... TEN TIMES the processing-power of my lowly AVRs...

All it's doing is fading an LED, sorta like software-PWM... And now it's noticeably flickering...

My AVR projects usually don't do that, even when they're *heavily* bogged-down with other code...

So... we've already discovered in a previous log that the world doesn't really exist, but that's OK, we have to pretend...

So, here's where we're allegedly at:

The simple "pinOn" and related MACROs rely on a lot of math via macros... Rather'n, say, doing PORTASET = 0x01, it's doing: *(&PORTA + (&PORTASET - &PORTA)) = 0x01... That was intentional. The AVR side of things does it quite similarly, such that we can use "PORTA" instead of thinking about referring to "PINA" and "DDRA" all the time... It's SIMPLE math, really, just add a constant to an address... and it's all done in macros, so it makes it easy, and easily-readable: clrpinPORT(1, PORTA), setoutPORT(1, PORTA)...

I guess I hadn't realized how much I relied on the optimizer... avr-gcc strips that entire thing, or something similarly-ugly, down to a single instruction "setbit" at the appropriate register.

xc32-gcc (with Optimization-level 1), on the other hand... well, just look at it:

(This was originally setinPORT(), to set bit 0 as an input, but in the process of trying to figure out the slow-down, I stripped a bunch of macros, resulting in this)

//TRISx to PORTx address-offset (-0x10):
#define TPO (int)((int)(&TRISA) - (int)(&PORTA))
//xSET to x address-offset (0x08):
#define SPO (int)((int)(&TRISASET) - (int)(&TRISA))
   //setInput(bit0, PORTA)
   (*(&(PORTA) + TPO + SPO) = RPIN_TO_MASK(0));
9d0001a4:   3c02bf88    lui   v0,0xbf88   //constant
9d0001a8:   24426020    addiu v0,v0,24608 //constant
9d0001ac:   24636018    addiu v1,v1,24600 //constant
9d0001b0:   00621823    subu  v1,v1,v0    //const - const
9d0001b4:   00031880    sll   v1,v1,0x2   //(=const) << 2, i.e. x4 (?)
9d0001b8:   00431821    addu  v1,v0,v1    //add constant
9d0001bc:   24020001    li v0,1           //load constant
9d0001c0:   ac620000    sw v0,0(v1)   //write at constant
Again, that's a simple instruction, it's basically nothing more than "TRISASET = 0x01;" and the ...SET registers are such a nice addition, it should make this thing *even faster*! I'm thinking, in this architecture, two instructions, MAX, (load an immediate value to a register, write that register's contents to the TRISASET memory-location).
All the math is done with constants, if PORTA and TRISA weren't C variables, and instead were #defines, the math coulda easily been handled by preprocessor before even getting to GCC. GCC's obviously pretty good at math (nevermind optimizing), in comparison to the preprocessor... Instead, it's leaving all these repetitive constant-calculations for run-time. WEE!

It's quite the realization about just how much the optimizer does... The fact I can easily see the flicker, combined with the fact there's *no other code running* besides the fading in-and-out of the LED, could indicate we're running the same C code easily 100-times slower on a system more than twice as fast. Shocking.

(Or, I could just be stupid, see the comments)

Discussions

Eric Hertz wrote 07/17/2015 at 19:35 point

Sheesh, this is the venture that does not end...

I was so excited that I'd finally come up with the ## method... It seemed *perfect*...

Then... Friggin' Indirection: 

#define setoutPORT(pin, port) setPORToutMasked(port, (1<<pin))

#define setPORToutMasked(port, mask) TRIS##port##CLR = mask

setPORToutMasked(A, (1<<0));

==> TRISACLR = (1<<0); Perfect.

setoutPORT(0,A);

==> "A is not defined" or something... Frankly the list of messages just made me go cross-eyed...
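For comparison, here's a minimal host-side sketch of the same two-level macro. The plain variable TRISACLR is a hypothetical stand-in for the real register (not the device header's declaration), and demo_setout() is just an illustrative name. Built this way, the nested call pastes fine with a bare A, which suggests (but doesn't prove) that the errors came from something not shown here, e.g. a port argument that is itself a macro and gets expanded before it reaches the ##.

```c
#include <assert.h>

/* Hypothetical stand-in: a plain variable named like the register. */
unsigned int TRISACLR;

/* Inner macro does the token pasting; outer macro only forwards. */
#define setPORToutMasked(port, mask) (TRIS##port##CLR = (mask))
#define setoutPORT(pin, port)        setPORToutMasked(port, (1u << (pin)))

/* Returns the value written through the two-level macro. */
unsigned int demo_setout(void)
{
    TRISACLR = 0;
    setoutPORT(3, A);   /* expands to (TRISACLR = ((1u << (3)))) */
    return TRISACLR;
}
```

Since `port` in the outer macro isn't next to a ##, it gets macro-expanded before being handed on; that's harmless for a bare A but would change things if the argument were a #define.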

---------------

Went back to the calculated-method just to see what'd happen when I kept my pointer-arithmetic in order... and, sure-nough, now it's only two instructions. Frankly, the 8-instruction thing wasn't the biggest deal, but it was annoying, and, more-importantly I thought, representative of how I could expect the rest of the code to be optimizing (thus, combined-with-flickering, the misunderstanding it was running REALLY SLOW).

But, no, it turns out the reason it was flickering was due to the pointer-arithmetic being wrong... I was writing the wrong register.
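A sketch of what likely went wrong, assuming the registers are declared as 32-bit values (as the x4 shift in the listing suggests): adding a BYTE offset to a pointer-to-32-bit makes C scale it by 4. The two constants in the listing, 24608 (0x6020) and 24600 (0x6018), are 8 bytes apart, so the scaled write lands 32 bytes below the base instead of 8. The array fake_regs and both helper names are hypothetical stand-ins so this can run on a host.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a block of 32-bit registers. */
volatile uint32_t fake_regs[16];

/* Buggy: a byte offset added to a uint32_t pointer is scaled by
   sizeof(uint32_t), so the address moves 4x too far. */
volatile uint32_t *reg_scaled(volatile uint32_t *base, int byte_off)
{
    return base + byte_off;
}

/* Intended: do the byte math on a byte pointer, then cast back. */
volatile uint32_t *reg_bytes(volatile uint32_t *base, int byte_off)
{
    return (volatile uint32_t *)((volatile uint8_t *)base + byte_off);
}
```

With base = &fake_regs[8] and an offset of -8 bytes, reg_bytes() lands on fake_regs[6] (two words back), while reg_scaled() lands on fake_regs[0] (eight words back): a different register entirely.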

So, now where we at...? Did I mention that the world doesn't really exist?


Eric Hertz wrote 07/17/2015 at 17:01 point

...AND...

It seems I'd forgotten rule number 1 of pointer-arithmetic, or something? Too tired to comprehend it.


Eric Hertz wrote 07/17/2015 at 11:09 point

...maybe I should use the preprocessor exclusively for these cases... e.g. using concatenation, something like (I don't think my cpp skills are quite accurate here):

#define setinPORT(bit, port) (TRIS##port##SET = (1 << (bit)))

Then call it as setinPORT(0, A) which would preprocess to (TRISASET = (1 << (0))), i.e. effectively (TRISASET = 0x01)

...except, obviously, the optimization issues are affecting *all the code* which uses similar techniques reliant on optimizations...


Eric Hertz wrote 07/17/2015 at 11:13 point

oh, and... even though the preprocessor can do the (1<<bit) math, e.g. in a #if, I'm getting the impression it actually leaves the math for gcc to handle when referenced like this...(?) So, it'd actually preprocess to TRISASET = (1<<0) which might actually, in this case, result in an assembly instruction for shift-left of zero. HAH!
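One way to check that impression is to stringify the expansion and look at it. This is a minimal sketch: STR/XSTR and expanded_text() are hypothetical helper names, and the macro is written without parentheses around the pasted argument so that it pastes cleanly. The text that comes out still contains the shift, so the preprocessor does leave it for the compiler; but (1 << 0) is an integer constant expression, which the compiler can fold at compile time.

```c
#include <assert.h>
#include <string.h>

#define setinPORT(bit, port) (TRIS##port##SET = (1 << (bit)))

/* Two-level stringify so the argument is expanded before # is applied. */
#define STR(x)  #x
#define XSTR(x) STR(x)

/* The macro's expansion as text, i.e. what the compiler proper receives. */
const char *expanded_text(void)
{
    return XSTR(setinPORT(0, A));
}

/* (1 << 0) is a constant expression, so it can be checked at compile time. */
_Static_assert((1 << 0) == 1, "shift folds to a constant");
```

Printing expanded_text() with puts() shows the pasted register name followed by the unevaluated shift.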


Eric Hertz wrote 07/17/2015 at 15:12 point

Or, I could just be stupid... Yeah, that compiler-output was real, yeah, it was actually compiling to 8 instructions what should've been 2... but NO, it wasn't running 100+ times slower than the AVR... In fact, it's running LED updates at about 400kHz, which is about ten times faster than some of my most-bogged-down AVR projects... so, it seems alright... ballpark similar, anyhow.

Changing over to the ## method definitely reduced it to *two* instructions. Marked improvement, and oddly, the flickering disappeared. The odd part being: The LED-updates are running, still, at about 400kHz (right, like 6 extra instructions would cause a huge difference at 400kHz). So... why does the LED flicker... dramatically in one case, and not the other...?

Is this some sort of "beating" (in the physics-sense), or "aliasing" with something else running? Mind-boggled.

Optimization: well, obviously, it doesn't handle ugly cases like those shown earlier... but it did, in fact, optimize-out (1<<0) to just "1", so I give it props for that... Can expect some speed-decrease because of it, but thankfully not *nearly* as much as I thought. Guess I'll keep working with it as-is. Did discover that there *are* toolchains available, though, without the limitation...  I'll keep them in-mind.

Oh, and the ## method is actually kinda nice... e.g.

#define setPORTpu(port, bit) \
   CNPU##port##SET = (1<<bit)

Then, instead of calling setPORTpu(PORTA, 1) you can just call setPORTpu(A, 1)... less-redundant, still informative. (pu = pull-up)

Maybe I'll get around to giving my AVR-optimizer a break...
