
getting faster math performance with inverse 1/2^x tabulation

A project log for mlx90640 sensor works w 800 bytes

Everywhere i read people are excited to get mlx90640 working. here are examples using arduino w 800bytes ram, and 1k with calibrated DEG C

jamesdanielv 09/07/2019 at 02:59

there are at least a few areas of solving for calibration data where the result has to be divided by 2^x, such as ExtractKvPixelParametersRawPerPixel. this normally isn't that big of a deal because the values were stored in ram. now they are read for each cell, so i want to make it efficient.

the 2^x goes up to 48 in code, but i wanted some wiggle room so i did up to 64.

also the code normally divides by 2^x, so i wanted to make it a multiply operation instead: the routine returns 1/2^x and the caller multiplies by it. other than the lookup and that one multiply, no math is done. this drastically speeds up calibration reads
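here is a rough sketch of the divide-to-multiply idea, assuming a plain float value and a scale between 0 and 63. the helper names (scaleByDivide, scaleByMultiply, checkScaleEquivalence) are made up for illustration and are not from the mlx90640 driver, and ldexpf() is only used here as a reference for 2^x on a pc, not as something the sensor code calls.

```cpp
#include <assert.h>
#include <math.h>
#include <stdint.h>

// hypothetical helper: the original style, dividing the raw value by 2^scale
static float scaleByDivide(float raw, uint8_t scale) {
  return raw / ldexpf(1.0f, scale);  // ldexpf(1, n) gives 2^n
}

// hypothetical helper: the rewritten style, multiplying by a ready-made 1/2^scale
static float scaleByMultiply(float raw, float inversePow2) {
  return raw * inversePow2;
}

// host-side check that both styles give the same float for every scale 0..63.
// scaling by a power of two is exact in float here, so == is safe to use.
static void checkScaleEquivalence(void) {
  for (uint8_t scale = 0; scale < 64; scale++) {
    float inversePow2 = ldexpf(1.0f, -(int)scale);
    assert(scaleByDivide(1234.0f, scale) == scaleByMultiply(1234.0f, inversePow2));
  }
}
```

because 2^x is a power of two, swapping the divide for a multiply does not change the result as long as the value stays in normal float range. it only changes which operation the cpu has to run.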

so i made this switch(x) routine. it might even be faster as a table, but i haven't measured it. the idea is that it needs no loop and no pow() call. how many jumps it really costs depends on whether the compiler turns the switch into a jump table or a chain of compares.

float SimplePowFast2sInverse(uint8_t x){//returns 1/2^x so the caller can multiply instead of divide. will move this into PROGMEM later
//we need to do 2^48 so at least 49 values, but we can go to 64. table will be generated from javascript and in project folder  
//float value;
switch(x){ 
case 0: return 1; break;//1/2^0 is 1, not 0
case 1: return 0.5; break;
case 2: return 0.25; break;
case 3: return 0.125; break;
case 4: return 0.0625; break;
case 5: return 0.03125; break;
case 6: return 0.015625; break;
case 7: return 0.0078125; break;
case 8: return 0.00390625; break;
case 9: return 0.001953125; break;
case 10: return 0.0009765625; break;
case 11: return 0.00048828125; break;
case 12: return 0.000244140625; break;
case 13: return 0.0001220703125; break;
case 14: return 0.00006103515625; break;
case 15: return 0.000030517578125; break;
case 16: return 0.0000152587890625; break;
case 17: return 0.00000762939453125; break;
case 18: return 0.000003814697265625; break;
case 19: return 0.0000019073486328125; break;
case 20: return 9.5367431640625e-7; break;
case 21: return 4.76837158203125e-7; break;
case 22: return 2.384185791015625e-7; break;
case 23: return 1.1920928955078125e-7; break;
case 24: return 5.960464477539063e-8; break;
case 25: return 2.9802322387695312e-8; break;
case 26: return 1.4901161193847656e-8; break;
case 27: return 7.450580596923828e-9; break;
case 28: return 3.725290298461914e-9; break;
case 29: return 1.862645149230957e-9; break;
case 30: return 9.313225746154785e-10; break;
case 31: return 4.656612873077393e-10; break;
case 32: return 2.3283064365386963e-10; break;
case 33: return 1.1641532182693481e-10; break;
case 34: return 5.820766091346741e-11; break;
case 35: return 2.9103830456733704e-11; break;
case 36: return 1.4551915228366852e-11; break;
case 37: return 7.275957614183426e-12; break;
case 38: return 3.637978807091713e-12; break;
case 39: return 1.8189894035458565e-12; break;
case 40: return 9.094947017729282e-13; break;
case 41: return 4.547473508864641e-13; break;
case 42: return 2.2737367544323206e-13; break;
case 43: return 1.1368683772161603e-13; break;
case 44: return 5.684341886080802e-14; break;
case 45: return 2.842170943040401e-14; break;
case 46: return 1.4210854715202004e-14; break;
case 47: return 7.105427357601002e-15; break;
case 48: return 3.552713678800501e-15; break;
case 49: return 1.7763568394002505e-15; break;
case 50: return 8.881784197001252e-16; break;
case 51: return 4.440892098500626e-16; break;
case 52: return 2.220446049250313e-16; break;
case 53: return 1.1102230246251565e-16; break;
case 54: return 5.551115123125783e-17; break;
case 55: return 2.7755575615628914e-17; break;
case 56: return 1.3877787807814457e-17; break;
case 57: return 6.938893903907228e-18; break;
case 58: return 3.469446951953614e-18; break;
case 59: return 1.734723475976807e-18; break;
case 60: return 8.673617379884035e-19; break;
case 61: return 4.336808689942018e-19; break;
case 62: return 2.168404344971009e-19; break;
case 63: return 1.0842021724855044e-19; break;
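default: return 0; break;//x above 63 is not in the table. returning 0 here is a placeholder choice so the function never falls off the end without a value
}
}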
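for comparison, here is a minimal sketch of the same lookup written as an array instead of a switch. this is not the code in the project. the names (INV_POW2, INV_POW2_ROW, kInversePow2, inversePow2FromTable, checkInverseTable) are made up, and returning 0 for x above 63 is an assumption. the entries are built by the compiler from 1/2^n, so no values have to be typed in by hand.

```cpp
#include <assert.h>
#include <stdint.h>

// 1/2^n as a compile-time constant. exact in float because 2^n is a power of two.
#define INV_POW2(n) (1.0f / (float)(1ULL << (n)))
// eight consecutive entries starting at b, to keep the table short to write
#define INV_POW2_ROW(b) \
  INV_POW2((b) + 0), INV_POW2((b) + 1), INV_POW2((b) + 2), INV_POW2((b) + 3), \
  INV_POW2((b) + 4), INV_POW2((b) + 5), INV_POW2((b) + 6), INV_POW2((b) + 7)

// hypothetical table name: 64 entries holding 1/2^0 .. 1/2^63
static const float kInversePow2[64] = {
  INV_POW2_ROW(0),  INV_POW2_ROW(8),  INV_POW2_ROW(16), INV_POW2_ROW(24),
  INV_POW2_ROW(32), INV_POW2_ROW(40), INV_POW2_ROW(48), INV_POW2_ROW(56)
};

// returns 1/2^x for x in 0..63, and 0 for anything larger (assumed fallback)
static float inversePow2FromTable(uint8_t x) {
  return (x < 64) ? kInversePow2[x] : 0.0f;
}

// host-side check: every entry must equal 1.0 halved x times (halving is exact)
static void checkInverseTable(void) {
  float expected = 1.0f;
  for (uint8_t x = 0; x < 64; x++) {
    assert(inversePow2FromTable(x) == expected);
    expected *= 0.5f;
  }
}
```

as written the array is 64 floats, so on an avr it would take 256 bytes of ram unless it is placed in PROGMEM and read back with pgm_read_float(). that matters a lot when the whole goal is to fit in 800 bytes, and whether it beats the switch would have to be measured.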

this would have more of an effect but there is less waiting in a while loop for data to return to ram. 

the original code spent 98% more time in while loops. the current code spends less time in loops, so more of the run time is available for math in general. multiply is still slow, but most of the division is being removed.

i just thought to share one example of how math can be improved.

i'll update the version of the code that has this to be online within 24hrs.
