I'm working on reducing solve times for the sqrt and division operations that I cannot eliminate. Currently the sqrt time for all 768 cells is less than 100000 microseconds. Now I'm trying to remove division where I can, and to cut its cost drastically where it is still needed. For example, there are 2 division operations that take most of the time; they limit the detailed temperature measurement of 768 cells upsampled to 3072 cells to ~0.6 seconds in the latest build. That build is not uploaded yet, as it requires a lot of checks for modes and features, plus testing on multiple processors, but it will be.
OK, back on point: reducing the square root cost.
I have discussed this here:
I have implemented the option to use it in the algorithms for the MLX90640, and I was also able to eliminate one of the sqrt(sqrt(x)) loops from the sensor math. Here is the sqrt function I'm using. Its average error is less than 1%, but for some numbers it can go up to 3%. It is more accurate than the original float version I found that uses this method. There is also a fast method that corrects for the rounding error, but I need to look at that in more detail. This one is based more on my base-10 thinking.
USE_FAST_SQUARERT_METHOD will need to be set to true.
float Q_rsqrt( float val ) // a good-enough square root method
// if not enabled in Z_memManagment, it works as the regular sqrt and uses the math library
#if USE_FAST_SQUARERT_METHOD == true
float invertDivide = 0.66666666666; // ~1/1.5, rounded down to single-float precision
long tmp = *(long *)&val; // raw bits of val (assumes long and float are both 32 bits)
val *= 0.22474487139; // factor that keeps precision detail by keeping the remainder (base-10 thinking; probably better to think in base 2)
long tmp2 = *(long *)&val; // raw bits of the scaled value
tmp -= 127L << 23; /* remove the IEEE bias from the exponent (127 << 23 = 1065353216) */
tmp2 -= 127L << 23; /* remove the IEEE bias from the exponent (127 << 23 = 1065353216) */
tmp = tmp >> 1; /* divide by 2 (halves the exponent) */
tmp2 = tmp2 >> 1; /* divide by 2 (halves the exponent) */
tmp += 1065353216; /* restore the IEEE bias in the exponent (127 << 23) */
tmp2 += 1065353216; /* restore the IEEE bias in the exponent (127 << 23) */
float offset = *(float *)&tmp2; // rough sqrt of the scaled value
val = *(float *)&tmp; // rough sqrt of the original value
//we do the more accurate but slower method if it is set