Since there's no floating point unit on the badge's PIC32 processor, I used 16-bit fixed-point arithmetic throughout for speed. On this 32-bit CPU, it was tempting to use 32-bit fixed point numbers, but I read in the datasheet that the MIPS 4k core could only issue a 32x32 multiply every other cycle, while it could do a 32x16 multiply every cycle. Since you have to cast up to 32-bits to multiply two 16-bit values, I decided to stick with 16 in the hopes that the compiler would be smart enough to use 32x16 multiplies. I haven't checked the assembly output, so for all I know, it could be doing the slower 32x32 multiplies anyway - maybe it can go even faster :-)

Fixed Point

I used 12 fractional bits in the 16-bit values, so you can represent values from -8 to +8, with one LSB equal to 1/4096 ( ~0.000244). A simple macro allows you to convert floating point constants:

#define S 12
#define FP(x) ((int16_t)((x) * (1<<S)))

Now, you can do stuff like this:

#define ang 10.
int16_t cs_ang = FP(0.98481); //FP(cos(ang*3.141592653/180.));
int16_t sn_ang = FP(0.17365); //FP(sin(ang*3.141592653/180.));

to set constants - here, setting the sine and cosine of the orthographic projection angle.

Multiplication

There are a few tricks to using these fixed-point values. Since each value is scaled by 4096 (shifted left 12 bits), when you multiply two fixed-point values, you end up with a result scaled twice, and have to shift back by 12 bits. For example, calculating c = a* b looks like this:

int16_t a, b, c;
a = FP(3.141596);
b = FP(0.1);
c = ((int32_t)a * b)>>S;

Like I said above, I hope the compiler would recognize that it can use a 32x16 multiply here but I don't know if it does. After the multiply, you divide by 4096 to remove the "extra" scaling factor. This could all be encapsulated in some nice C++ classes with overloaded operators and whatnot, and for all I know it has already been done somewhere (probably a million times), but it was easy enough to do it the long way. I guess even a multiply macro could help hide this.

Division

I tried to avoid division in the demo (even though there's a hardware divider) just out of habit, I suppose. It's used in one place it was really needed, but otherwise, I usually used division by power-of-2 constants which can easily (and efficiently) be done with a simple right shift.

With fixed-point math, division has the opposite problem as multiplication - after the division, you've essentially removed both scaling constants (you can think of them as cancelling), so you have to shift left to re-scale the result of the division. This has to be done in the correct order so that the wanted bits are always on the left of the LSB to avoid losing them. As an example, c = a/b looks like this:

int16_t a, b, c;
c = ((int32_t)a<<S) / b;

I didn't research the speed of division on this processor very thoroughly. Maybe I am avoiding it without good reason.

Gotchas

Having a dynamic range of (-8, +8) is pretty limiting. Obviously, the results of any calculations need to fall in this range, but you also have to make sure that any intermediate values you calculate stay within this range too. Sometimes just a simple re-ordering of operations will help - for example calculating the average of a and b could be done with:

int16_t a, b, avg;
avg = (a + b) >> 1;

but depending on the values, the sum could overflow. Instead, you could use:

int16_t a, b, avg;
avg = (a>>1) + (b>>1);

In some cases, this may be less accurate, but will avoid the overflow issue.

Cosine Table

I used a 1024-element lookup table for the cosine function. The table represents a single quadrant of the unit circle - exploiting symmetry - so there are an equivalent 4096 points in a circle . Values outside the first quadrant are folded back into the quadrant before the lookup. This is overkill for functions plotted on a 320x240 screen, but it means there's enough resolution in the table that you can just lookup values directly with no interpolation.

A quick python script generated the table with the output re-directed into a header file:

#!/usr/bin/env python3

import math

S = 12
def FP(x):
    return int(x*math.pow(2, S))

k = 10
N = int(math.pow(2, k))
print("const int16_t cos_k = {:d};".format(k))
print("int16_t cos_table[{:d}]={{".format(N))
for i in range(0, N):
    theta = (math.pi/2) * i / (N-1)
    cs = FP(math.cos(theta))
    sn = FP(math.sin(theta))
    print("  {:d},".format(cs))
print("};")

In the C-code, there's a corresponding function to lookup a cosine value from a fixed point argument. Since cosine is a periodic function, I actually use a 32-bit argument so you can get results from outside the (-8, +8) range of the 16-bit values. Having the table a power-of-2 length makes adjusting for values outside the base period a simple bitwise AND.

#include "cos_table.h"
int16_t cos_lookup(int32_t x){
  //return 4096.*cos(6.2831853*x/4096.);
  if (x < 0) {
    x = -x;
  }
  x = x & (((uint16_t)4 << cos_k) - 1);
  if (x > ((uint16_t)2 << cos_k)){
    x = ((uint16_t)4 << cos_k) - x ;
  }
  if (x > ((uint16_t)1 << cos_k)){
    x = ((uint16_t)2 << cos_k) - x;
    return -cos_table[x];
  } else {
    if (x == ((uint16_t)1 << cos_k)){
      return 0;
    }
    return cos_table[x];
  }
}

There may be a cleaner/easier/more efficient way to handle all of the edge cases on folding the quadrants here. I had an early bug that caused annoying jumpy-value errors right at one of the quadrant edges, so bullet-proofed every edge I could think of. The errors are gone, but I'm not sure this is all entirely necessary. Who cares - this was a badge "hack" after all.

I didn't use any sine functions in the demo, but they could be easily calculated by adding 1024 (one quarter of a cycle, i.e. pi/2) to the argument of the cosine. There's no need for a separate table.

Square Root

I avoided using square roots in the final demo, but I did experiment with a fast integer square root algorithm from codecodex.com. It seemed pretty quick, but I finally settled on using r^2 (=x*x + y*y) in my radial functions instead of taking the square root to get the actual radius. It's faster, and has a bonus of adding more ripples as you move away from the origin.

// from: http://www.codecodex.com/wiki/Calculate_an_integer_square_root#C
uint32_t isqrt(uint32_t x)  
{  
  uint32_t op, res, one;  
  
  op = x;  
  res = 0;  
  
  /* "one" starts at the highest power of four <= than the argument. */  
  one = 1 << 30;  /* second-to-top bit set */  
  while (one > op) one >>= 2;  
  
  while (one != 0) {  
    if (op >= res + one) {  
      op -= res + one;  
      res += one << 1;  // <-- faster than 2 * one  
    }  
    res >>= 1;  
    one >>= 2;  
  }
  return res;  
}

Fixed Point Arithmetic

Fixed Point

Multiplication

Division

Gotchas

Cosine Table

Square Root

Discussions

Fixed Point Arithmetic

Fixed Point

Multiplication

Division

Gotchas

Cosine Table

Square Root

Hidden Surface Elimination

Discussions

Become a Hackaday.io Member