Here I want to document some of assembly language tricks I used in this project. People doing it every day for years will scoff, but the perhaps somebody may be interested. I'm going to target it for PIC18, but it should be applicable for other MCU platforms too.
Just quick vocabulary:
; denotes comment, ignored line
call is jump to subroutine with program counter (PC) saved on stack
jump is jump to subroutine with no PC saving
return is restitution of PC from stack
Prepare your variables
Sometimes you can gain some bytes of code memory if you properly arrange your variables. Some MCU platforms do have faster access for something like first 256B of RAM, or some access bank
Exploit reset values
Sometimes you don't need to explicitly define the register value, if it is switched into defined state by hardware after reset and you know it applies to your conditions.
My code needs to wipe out random values from buffers from RAM and enter them into defined state (0x00). Because my working variables and display/input buffers are located at the beginning of RAM, I can wipe it out by setting data pointer at zero and write zero/increment pointer 256 times in single loop.
Setting the data pointer takes two instruction words when using instruction LFSR, which properly loads all 12 bits of data pointer. But datasheet states the upper nibble of the data pointer is zeroed after reset
Unfortunately, reset value of lower byte is undefined, so I have to force it zero. Still, CLRF instruction (clear single register) takes one byte versus two bytes of LFSR. In subsequent code I employ LFSR to properly initialize data pointers, though.
Let's have subroutine that prints ASCII character on your display
print_ascii: ;assume it's called with value in W register ;do this and that return
Now you may want to have subroutine that prints unpacked BCD (binary 0..9) on the display. You can do it like this
print_binary: addlw '0' ;adjust the W value into ASCII 0..9 call print_ascii return print_ascii: ;assume it's called with value in W register ;do this and that return
Which would definitely work, but you can omit all the cal/return thing by having subroutine with two entry points, or having fall-through to function.
print_binary: addlw '0' ;adjust the W value into ASCII 0..9 ;and now fall through the ASCII print print_ascii: ;assume it's called with value in W register ;do this and that return
You call the print_binary subroutine, it does its job and returns via return instruction, which "belongs" to print_ascii subroutine.
Let's have subroutine which prints 3 consecutive characters from buffer, utilizing print_ascii subroutine as before. The simplest approach goes like this
print_three: ;prepare number 1 call print_ascii ;prepare number 2 call print_ascii ;prepare number 3 call print_ascii return
This would certainly work, but the last subroutine call is somehow pointless. First you call subroutine print_three (consumed one stack level), then you do this and that up to last print_ascii call, where you call it (consume one stack level), the function does what is expected, then it returns (decrease stack usage) just to return again (return stack to previous value). We can save one return by doing this:
print_three: ;prepare number 1 call print_ascii ;prepare number 2 call print_ascii ;prepare number 3 jump print_ascii
It works as before, but the last call (with stack increase) is replaced by plain jump. Now the function print_three will not return via "its own" return, but via return "belonging to" function print_ascii. One return spared. Also stack level spared (only at the single subroutine "call", not in the previous calls), what could be sometimes beneficial.
I used this one for hexadecimal number printing - in fact it is a bit convoluted use of fall-through. I could do my print_hex function like this
print_hex: ;extract low nibble of number call print_ascii ;extract high nibble of number call print_ascii return print_ascii: ;assume it's called with value in W register ;do this and that return
Instead, I did it like this
print_hex: ;extract low nibble of number call print_ascii ;extract high nibble of number ;fall-through print_ascii: ;assume it's called with value in W register ;do this and that return
It takes low nibble, calls subroutine print_ascii and returns back to extract high nibble of the number, with the number in working register it falls-through to print_ascii again and now its return instruction will set program counter back from where print_hex was called. One call and return saved.
Use relative branching if possible
Many MCU platforms do have option of relative branching, as opposed to absolute jumps. Relative branches usually take less program space, but have limited range. One can take advantage of this for local, related functions, like the display functions I mentioned before. For example, PIC18 has relative branch range +-1023, which is perfect fit for 1kB contest.