Since there's no floating-point unit on the badge's PIC32 processor, I used 16-bit fixed-point arithmetic throughout for speed. On this 32-bit CPU, it was tempting to use 32-bit fixed-point numbers, but I read in the datasheet that the MIPS 4k core can only issue a 32x32 multiply every other cycle, while it can do a 32x16 multiply every cycle. Since you have to cast up to 32 bits to multiply two 16-bit values, I decided to stick with 16 in the hope that the compiler would be smart enough to use 32x16 multiplies. I haven't checked the assembly output, so for all I know it could be doing the slower 32x32 multiplies anyway, in which case maybe it can go even faster :-)

# Fixed Point

I used 12 fractional bits in the 16-bit values, so you can represent values from -8 up to just under +8 (the largest value is 8 - 1/4096), with one LSB equal to 1/4096 (~0.000244). A simple macro allows you to convert floating-point constants:

#include <stdint.h>
#define S 12                               /* number of fractional bits */
#define FP(x) ((int16_t)((x) * (1<<S)))    /* floating-point constant -> fixed point */

Now, you can do stuff like this:

#define ang 10.                  /* projection angle in degrees */
int16_t cs_ang = FP(0.98481);    /* cos(10 degrees) */
int16_t sn_ang = FP(0.17365);    /* sin(10 degrees) */

to set constants; here, the sine and cosine of the orthographic projection angle.
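When checking constants like these on a PC, it can help to convert in both directions. The sketch below is not from the demo code: `FP_ROUND` and `fp_to_double` are hypothetical names, and the rounding variant is just one thing you might want, since the plain `FP` macro truncates toward zero.

```c
#include <stdint.h>

#define S 12                                  /* number of fractional bits */
#define FP(x) ((int16_t)((x) * (1 << S)))     /* original macro: truncates toward zero */

/* Hypothetical variant: round to the nearest LSB instead of truncating.
   Evaluates x twice, so only use it with plain constants. */
#define FP_ROUND(x) ((int16_t)((x) * (1 << S) + ((x) >= 0 ? 0.5 : -0.5)))

/* Hypothetical helper: convert a fixed-point value back to floating point,
   for example to print it while debugging off-target. */
static double fp_to_double(int16_t v)
{
    return (double)v / (1 << S);
}
```

With these definitions, `FP(0.98481)` gives 4033 while `FP_ROUND(0.98481)` gives 4034, which is the closer of the two to the exact 4033.78.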

## Multiplication

There are a few tricks to using these fixed-point values. Since each value is scaled by 4096 (shifted left 12 bits), when you multiply two fixed-point values, you end up with a result that is scaled twice (by 4096 x 4096), and you have to shift it back right by 12 bits. For example, calculating c = a * b looks like this:

int16_t a, b, c;
a = FP(3.141593);
b = FP(0.1);
c = (int16_t)(((int32_t)a * b) >> S);   /* 32-bit product, then drop the extra 2^12 scale */

Like I said above, I hope the compiler recognizes that it can use a 32x16 multiply here, but I don't know if it does. After the multiply, you divide by 4096 to remove the "extra" scaling factor. This could all be encapsulated in some nice C++ classes with overloaded operators and whatnot, and for all I know it has already been done somewhere (probably a million times), but it was easy enough to do it the long way. I guess even a multiply macro could help hide this.
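As a rough idea of what such a helper could look like, here is a minimal sketch. `fp_mul` is a hypothetical name and is not part of the demo code; it wraps the same cast-multiply-shift sequence shown above in an inline function rather than a macro, so the arguments are only evaluated once.

```c
#include <stdint.h>

#define S 12                                  /* number of fractional bits */
#define FP(x) ((int16_t)((x) * (1 << S)))

/* Hypothetical helper: multiply two fixed-point values with 12 fractional bits.
   The product is formed in 32 bits (scaled by 2^24), then shifted back down
   to a 2^12 scale. The caller must make sure the result fits in [-8, +8).
   Note: >> on a negative value is implementation-defined in C; GCC and Clang
   perform an arithmetic shift, which is what this sketch assumes. */
static inline int16_t fp_mul(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a * b) >> S);
}
```

For example, `fp_mul(FP(1.5), FP(1.5))` yields `FP(2.25)`, and the calling code no longer has to remember the shift.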

## Division

I tried to avoid division in the demo (even though there's a hardware divider) just out of habit, I suppose. It's used in the one place where it was really needed, but otherwise I stuck to division by power-of-2 constants, which can easily (and efficiently) be done with a simple right shift.

With fixed-point math, division has the opposite problem from multiplication: after the division, you've essentially removed both scaling constants (you can think of them as cancelling), so you have to shift left by 12 bits to re-scale the result. The order matters. If you divide first and shift afterwards, the fractional bits of the quotient have already been thrown away, so the numerator has to be widened to 32 bits and shifted left before the division. As an example, c = a/b looks like this:

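int16_t a, b, c;
/* a * 2^12 is the left shift, written as a multiply so a negative a is well defined in C */
c = (int16_t)(((int32_t)a * (1<<S)) / b);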

I didn't research the speed of division on this processor very thoroughly. Maybe I am avoiding it without good reason.
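For symmetry with multiplication, the division could be wrapped the same way. This is only a sketch: `fp_div` is a hypothetical name, and it does no checking, so the caller has to make sure `b` is non-zero and that the quotient fits in the 16-bit range.

```c
#include <stdint.h>

#define S 12                                  /* number of fractional bits */
#define FP(x) ((int16_t)((x) * (1 << S)))

/* Hypothetical helper: divide two fixed-point values with 12 fractional bits.
   The numerator is widened to 32 bits and scaled up by 2^12 BEFORE the
   division, so the quotient comes out at the normal 2^12 scale.
   No checks: b must be non-zero and the result must fit in [-8, +8). */
static inline int16_t fp_div(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a * (1 << S)) / b);
}
```

For example, `fp_div(FP(1.0), FP(4.0))` yields `FP(0.25)`. Dividing a value by something smaller than 1/8 of it would overflow, which is one more case of the range limits described under Gotchas.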

## Gotchas

Having a dynamic range of [-8, +8) is pretty limiting. Obviously, the results of any calculations need to fall in this range, but you also have to make sure that any intermediate values you store in 16 bits stay within this range too. Sometimes just a simple re-ordering of operations will help. For example, calculating the average of a and b could be done with:

int16_t a, b, avg;
avg = (a + b) >> 1;

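but depending on the values, the sum may not fit in 16 bits. (In C the addition itself is promoted to `int`, so this bites as soon as the sum is stored in an `int16_t` before the shift.) Instead, you could use: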

int16_t a, b, avg;
avg = (a>>1) + (b>>1);

In some cases, this may be one LSB less accurate, because each shift discards the lowest bit of its operand, but it will avoid the overflow issue.
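To make the accuracy trade-off concrete, here is a small sketch comparing three ways to average. None of these functions are from the demo code; the names are made up for illustration. `avg_exact` is a common bit trick that adds back the bit lost when both inputs are odd, and `avg_wide` simply does the sum in 32 bits.

```c
#include <stdint.h>

/* Halve each operand first: never leaves the 16-bit range, but drops the
   LSB of each operand, so the result is one LSB low when both are odd.
   Assumes >> on negative values is an arithmetic shift (true for GCC/Clang). */
static inline int16_t avg_halves(int16_t a, int16_t b)
{
    return (int16_t)((a >> 1) + (b >> 1));
}

/* Hypothetical refinement: add back the bit that was lost when both
   operands are odd. Matches floor((a + b) / 2). */
static inline int16_t avg_exact(int16_t a, int16_t b)
{
    return (int16_t)((a >> 1) + (b >> 1) + (a & b & 1));
}

/* Alternative: form the sum in 32 bits so it cannot overflow, then shift. */
static inline int16_t avg_wide(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a + b) >> 1);
}
```

For raw values 3 and 5, `avg_halves` returns 3 while the other two return 4. All three stay in range even when both inputs are 32767.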

# Cosine Table

I used a 1024-element lookup table for the cosine function. The table represents a single quadrant of the unit circle, exploiting symmetry, so there are an equivalent 4096 points in a full circle. Values outside the first quadrant are folded back into the quadrant...
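One way such quadrant folding could work is sketched below. This is an assumption about the general technique, not the demo's actual code: the names `cos_table`, `cos_table_init` and `cos_fp` are hypothetical, angles are assumed to be integers with 4096 steps per full turn, and the table is filled at startup with a rotation recurrence purely so the sketch needs no math library (on real hardware a precomputed `const` array would be the more natural choice).

```c
#include <stdint.h>

#define S        12                 /* number of fractional bits */
#define QUADRANT 1024               /* table entries per quadrant */
#define CIRCLE   (4 * QUADRANT)     /* 4096 angle steps per full turn */

static int16_t cos_table[QUADRANT]; /* cos over [0, pi/2), 12 fractional bits */

/* Fill the table by repeatedly rotating the point (1, 0) by one angle step. */
static void cos_table_init(void)
{
    const double d  = 6.283185307179586 / CIRCLE;      /* one step in radians */
    const double sd = d - d*d*d/6.0;                    /* sin(d), Taylor series */
    const double cd = 1.0 - d*d/2.0 + d*d*d*d/24.0;     /* cos(d), Taylor series */
    double c = 1.0, s = 0.0;
    for (int i = 0; i < QUADRANT; i++) {
        cos_table[i] = (int16_t)(c * (1 << S) + 0.5);   /* round to nearest */
        double cn = c * cd - s * sd;
        s = s * cd + c * sd;
        c = cn;
    }
}

/* Cosine of an angle given in 1/4096ths of a turn. */
static int16_t cos_fp(uint16_t angle)
{
    uint16_t a = angle & (CIRCLE - 1);     /* wrap to one full turn */
    uint16_t i = a & (QUADRANT - 1);       /* position within the quadrant */
    /* Mirrored lookup for quadrants 1 and 3. cos(pi/2) = 0 is not stored
       in the table, so i == 0 is handled separately. */
    int16_t mirrored = (i == 0) ? 0 : cos_table[QUADRANT - i];
    switch (a / QUADRANT) {
    case 0:  return cos_table[i];            /* cos(x)                          */
    case 1:  return (int16_t)-mirrored;      /* cos(pi/2 + x)  = -cos(pi/2 - x) */
    case 2:  return (int16_t)-cos_table[i];  /* cos(pi + x)    = -cos(x)        */
    default: return mirrored;                /* cos(3pi/2 + x) =  cos(pi/2 - x) */
    }
}
```

After calling `cos_table_init()` once, a sine can be obtained from the same table as `cos_fp((uint16_t)(angle - QUADRANT))`, since sin(x) = cos(x - pi/2).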
