limited-code hacks/ideas

I'm focussing on code written in C... So... If you have a project *just exceeding* 1024 Bytes, some of these ideas may be useful for squeezing a little more outta it.

Before looking here, maybe check this document first. (Thanks, [Volt], in the comments!)

The following may just be utterly-ridiculous... I'm no expert, here... And this list is in no way sorted, nor all-inclusive, and may in fact be missing some *really* important things like using lookup tables rather than math, and using direct register-writes rather'n libraries (e.g. see @Radomir Dopieralski's logs over at #Nyan Board and #Mechatronic Ears)

Take the ideas here with a grain of salt!

I'll probably aim my efforts at AVRs, but there are *definitely* some concepts here that apply to other architectures as well (and some that don't apply *at all* to AVRs). So... steal some ideas!

Oh, and, Good Lord... @Yann Guidon / YGDES pointed out something quite important...

This shizzle is in *no way* intended to be considered "good practice". Don't get into these habits! Don't use these as general-purpose guidelines of any sort! And, for goodness sake, don't use these techniques in any sort of "product" (library, operating-system, pace-maker, or anything else) unless you've *really* thought-through *all* the potential-consequences, slept on it for months, then thought through them again. But, realistically, that goes for any sort of coding, whether you use these techniques, or not.

Fergodsakes, we're talking about a friggin' contest, here. It's supposed to be *fun* and encouraging of creativity. And this "project-page" is intended for no other purpose than to allow a creative person to continue with their fun project once they've hit what might otherwise seem like a show-stopping ceiling.

Calculate your (AVR) project's program/flash-memory requirements via "avr-size":

https://hackaday.io/project/18574-limited-code-hackstips/log/49537-avr-project-doing-nada-58bytes-and-some-experimentsresults

Squeeze some bytes out of your project by (in no particular order):

Moving to a similar, but slightly-different architecture (E.G. AVR "Tiny" -> "Mega") https://hackaday.io/project/18574-limited-code-hackstips/log/49537-avr-project-doing-nada-58bytes-and-some-experimentsresults (The same minuscule amount of code (100 Bytes) compiled for a TinyAVR requires 4 Bytes (4%!) less code-space when compiled for a MegaAVR! A larger project may scale accordingly, e.g. a 1000Byte TinyAVR project may require only 960 Bytes on a MegaAVR, plausibly even less, given other functionality such as Mult-instructions)

Considering whether you really need both initialized and "uninitialized" globals/statics -- a bit ridiculous, just an interesting discovery. https://hackaday.io/project/18574-limited-code-hackstips/log/49537-avr-project-doing-nada-58bytes-and-some-experimentsresults (Save up to 22 Bytes by *really* analyzing all your globals/statics)
Making Certain that stdio is not linked into your project -- This is probably a requirement, if you intend on fitting in 1K... Though, it most-likely is already the case if your project is anywhere *near* fitting in 1K. https://hackaday.io/project/18574-limited-code-hackstips/log/49498-avrs-stdio-printf-etc (This could save your project nearly 1kB (1024 Bytes!), or even more!, of program-space, if you're lucky, and it's done-correctly!)
Reconsidering Multiplication/Division -- Poorly written, the basic gist is to have an idea of how these calculations work and think of ways to make them more efficient (do you really need x/65, or would x/64 work well-enough? Save a *lot* of code-space by doing-so!) https://hackaday.io/project/18574-limited-code-hackstips/log/49548-multiplication-division
Considering your usage of volatiles -- If you're *certain* you don't need them you could save a lot of code-space. https://hackaday.io/project/18574-limited-code-hackstips/log/49551-volatiles (This is NOT to be taken lightly!)
Using separate Numerator/Denominator variables, rather than floats...
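
To make the x/65-versus-x/64 idea from the list above a bit more concrete, here's a minimal sketch. The function names are made up for illustration. An AVR has no hardware divide, so dividing by 65 would typically pull in a division routine, whereas dividing by 64 is just a right-shift. Whether the roughly 1.5% error is tolerable depends entirely on your project.

```c
#include <stdint.h>

/* Exact: dividing by 65 typically needs a division routine on an AVR. */
static uint16_t scale_exact(uint16_t x)
{
    return x / 65;
}

/* Approximate: dividing by 64 is a right-shift by 6 bits.
   The result comes out about 1.5% larger than x / 65. */
static uint16_t scale_approx(uint16_t x)
{
    return x >> 6;
}
```

For small inputs the two agree (650 gives 10 either way); for larger ones the approximation drifts (6500 gives 100 exactly, 101 approximately).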

Two Additions In One Operation CTD -- Two Variables one register
Eric Hertz • 04/20/2023 at 04:48 • 0 comments
A continuation of the last log:

Say you've got two 4bit variables:

One stores a state, the other stores a signed count from -7 to +7.

{Thumbtyping is hard...}

```
//Hypothetical 4bit types,
// standard C has no uint4_t/int4_t
uint4_t state;
int4_t count;

switch(state)
{
   case A:
     count++;
     break;
   case B:
     count--;
...
}
```
Of course, int4_t is rare, if existent.

But merging them into one 8bit byte in register/RAM would mean a lot of boolean logic and shifts, right?

Maybe not!

In fact, on an AVR [ATmega8515=old], it may be *fewer* operations than using separate 8bit variables, at least in some cases:
```
//state in low nibble,
// SIGNED count in high nibble
uint8_t stateAndCount;

switch(stateAndCount & 0x0f)
{
   case A:
     stateAndCount+=0x10;
     break;
   case B:
     stateAndCount-=0x10;
...
}
```
My big concern was incrementing and extracting the signed count.
[LOL, I'd forgotten that, in the situation I'm considering replacing with this, I was already masking the state-only variable with 0x0f in the switch statement... this thing is just falling together!]

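Incrementing by 0x10 is a typical add-immediate, which I think is the same number of cycles as increment, since both fit in a single 16bit AVR instruction word. [16 bits for inc/dec, or 8 bits of opcode+register + 8 for the immediate value=0x10. The AVR has no literal add-immediate for 8bit registers, so avr-gcc typically emits subi with the negated constant; same size, same single cycle.]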

That leaves extracting/using the signed count
```
int32_t totalCount = 0;

totalCount +=
   ((int8_t)stateAndCount)>>4;
```
A Few concerns:

shift-right for signed integers, in C... Is it guaranteed 'arithmetic' rather than 'logical'?

[And what about negatives and rounding toward negative infinity?]

Does the AVR have a single-instruction arithmetic-shift-right?

...

So. Thumbtyping is exhausting, I'll leave it to you to look up details, for now. But the short answer looks to be that there may even be cases where this takes *Fewer* instructions this way, than to have to load/store the two 4bit (8bit-stored) state/count variables in RAM.
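
Here's a rough sketch of the packed state/count idea with the extraction spelled out. The function names are hypothetical. On the concerns above, as far as I can tell: the C standard leaves right-shifting a negative value implementation-defined, but GCC documents it as an arithmetic (sign-extending) shift. Rounding toward negative infinity is exactly what's wanted here, since the low nibble holds state bits rather than a fraction. The AVR does have an arithmetic-shift-right instruction (asr), but it shifts one bit at a time, so a 4bit shift costs a few instructions. A signed high nibble can hold -8 to +7.

```c
#include <stdint.h>

/* State lives in the low nibble, the signed count in the high nibble. */
static uint8_t packed_increment(uint8_t packed)
{
    return (uint8_t)(packed + 0x10);
}

static uint8_t packed_decrement(uint8_t packed)
{
    return (uint8_t)(packed - 0x10);
}

static uint8_t packed_state(uint8_t packed)
{
    return packed & 0x0f;
}

/* Relies on two implementation-defined behaviours that GCC documents:
   converting 0x80..0xff to int8_t wraps to a negative value, and >> on
   a negative value is an arithmetic (sign-extending) shift. */
static int8_t packed_count(uint8_t packed)
{
    return (int8_t)(((int8_t)packed) >> 4);
}
```

Starting from state 3 with a count of zero (0x03), three decrements give 0xd3, which still reads back as state 3 and count -3.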

...

I wonder what-all we can do that we aren't with 64 bits?!
Two Additions in one operation!
Eric Hertz • 04/12/2023 at 07:47 • 2 comments
Maybe everyone knows this already, but I just thought of it for my first time...

Say you've got a 64bit processor, but you're working with two 32bit numbers... if they're stored in a single 64bit variable, and you're sure they won't overflow, you can do two additions simultaneously! Or 4 16bit additions, or 8 8bit!

So, what could this be used for? How often would that really be useful?

I dunno... a screen is significantly smaller than 65536x65536 pixels...
....
Presently I'm using an 8bitter, I have a function that I would like to return two TRUE/FALSE values. I want to keep running sums of those two values from each time it's called.
```
#include <stdint.h>
#include <stdio.h>

//Returns:
//0x01 if button A pressed
//0x10 if button B pressed
//0x11 if both are pressed
uint8_t getButtons(void);

int main(void)
{
   uint8_t countsCombined=0;

   //15 calls at most, so neither
   // 4bit count can overflow
   for(uint8_t i=0; i<15; i++)
      countsCombined+=getButtons();

   printf("A presses = %d\n"
          "B presses = %d\n",
          countsCombined&0x0f,
          countsCombined>>4);
   return 0;
}
```
But, holy moly, this seems a little cheesy on an 8bitter, but just think what could be done on a 16bitter, or 64bitter!

Maybe you're designing PONG on a 16bit computer in a 256x256 window, and each step the ball moves one pixel in X and two pixels in Y:
```
//Upper byte is X, lower byte is Y

uint16_t ballPosition = 0x0000;

uint16_t ballStep = 0x0102;

while(wallNotHit)
{
   ballPosition += ballStep;
}
```
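One catch with the packed-position trick: when Y overflows, its carry leaks into X. Here's a small sketch (all names hypothetical) showing the plain one-addition step next to a carry-safe variant that uses a well-known masking trick. The safe version costs a few more operations, so it may or may not be worth it when bytes are scarce.

```c
#include <stdint.h>

/* Upper byte is X, lower byte is Y (as in the PONG example). */
static uint8_t pos_x(uint16_t pos) { return (uint8_t)(pos >> 8); }
static uint8_t pos_y(uint16_t pos) { return (uint8_t)(pos & 0xff); }

/* One addition moves both coordinates, but a carry out of Y leaks into X. */
static uint16_t step_plain(uint16_t pos, uint16_t step)
{
    return (uint16_t)(pos + step);
}

/* Carry-safe variant: add the low 7 bits of each byte, then patch in the
   top bit of each byte with XOR so nothing crosses the byte boundary. */
static uint16_t step_lanes(uint16_t pos, uint16_t step)
{
    uint16_t low7 = (uint16_t)((pos & 0x7f7f) + (step & 0x7f7f));
    return (uint16_t)(low7 ^ ((pos ^ step) & 0x8080));
}
```

With Y at 0xff and a step of 0x0102, step_plain() bumps X by two instead of one, while step_lanes() lets Y wrap to 1 and leaves X at 1.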
Tiny Circular Buffer - back to linear
Eric Hertz • 12/30/2016 at 04:11 • 0 comments

I need to add elements to the end of a buffer, and remove elements from the beginning... It can have from 0 to 4 elements loaded at a time.
The de-facto answer might be a circular-buffer.
But this buffer is only 4 elements long...
It would seem that implementing this as a simple array would be more efficient, in my case. Yes, it means that when I remove an element from the beginning, I have to shift the remaining data to the beginning...
And, the de-facto answer might be a for() loop...
But, again, there's only four elements. So, unroll that loop, as well.
(Note that the optimizer can look at short fixed-count for() loops, and automatically "unroll" them... I'm pretty much certain that this case will be smaller unrolled, and I'm not entirely convinced my optimizer-settings will do-so, so I'll type it manually.)
-----------
Interestingly, doing this as a simple array, rather than a circular-buffer, also dramatically decreased the amount of code in nearly every other function, e.g. buffer_add(), buffer_countElements(), buffer_isFull(), buffer_clear()... In fact, many of these functions are now merely comparisons/assignments to a variable such as buffer_itemCount. So, now, whereas I had multi-line functions that *could*'ve been inlined to reduce code-space in a few cases, now it's *definitely* more code-space (and execution-time!) efficient to inline these functions in every case.
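A minimal sketch of what such a linear 4-element buffer might look like. buffer_remove() and buffer_data are names I've made up for illustration, the element type is assumed to be uint8_t, and the real project's code may well differ.

```c
#include <stdint.h>

#define BUFFER_SIZE 4

static uint8_t buffer_data[BUFFER_SIZE];
static uint8_t buffer_itemCount = 0;

/* These collapse to a comparison or an assignment on one variable. */
static inline uint8_t buffer_countElements(void) { return buffer_itemCount; }
static inline uint8_t buffer_isFull(void) { return buffer_itemCount == BUFFER_SIZE; }
static inline void buffer_clear(void) { buffer_itemCount = 0; }

/* Appends to the end. Returns 1 on success, 0 if the buffer is full. */
static inline uint8_t buffer_add(uint8_t value)
{
    if (buffer_isFull())
        return 0;
    buffer_data[buffer_itemCount++] = value;
    return 1;
}

/* Removes from the front, then shifts the rest down with the loop
   unrolled by hand. The caller must check that the buffer isn't empty. */
static uint8_t buffer_remove(void)
{
    uint8_t first = buffer_data[0];
    buffer_data[0] = buffer_data[1];
    buffer_data[1] = buffer_data[2];
    buffer_data[2] = buffer_data[3];
    buffer_itemCount--;
    return first;
}
```

There are no head/tail indices and no wrap-around math, which is where the savings in the other functions come from.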
pointer idea...
Eric Hertz • 12/30/2016 at 02:25 • 0 comments
I have to add several values from several pointers...
e.g.
```
uint16_t *pA = NULL;
uint16_t *pB = NULL;
uint16_t *pC = NULL;

< a bunch of code that assigns pA, pB, and/or pC >

uint16_t value = *pA + *pB + *pC;
```
BUT any and/or all of these pointers may be unassigned... in which case, they should not be considered in the sum for value.
Of course, using NULL pointers makes sense, to indicate whether they've been assigned. But, as I understand the C standard, you're not supposed to *access* a NULL address... You're only supposed to *test* whether a pointer is NULL.
(e.g. address-zero may be a reset-vector, which probably contains a "jump" instruction, which most-likely is NOT zero, in value).
So, again, if I understand correctly, the "right" way to handle these potential NULL pointers would be something like:
```
uint16_t *pA = NULL;
uint16_t *pB = NULL;
uint16_t *pC = NULL;

< a bunch of code that assigns pA, pB, and/or pC >

uint16_t value = 0;
if(pA)
 value = *pA;
if(pB)
 value += *pB;
if(pC)
 value += *pC;
```
That's a lot of tests! Surely they'll add up in code-space...
Instead, what about:
```
uint16_t zeroWord = 0;
uint16_t *pA = &zeroWord;
uint16_t *pB = &zeroWord;
uint16_t *pC = &zeroWord;

< a bunch of code that assigns pA, pB, and/or pC >

uint16_t value = *pA + *pB + *pC;
```
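Here's the zero-word idea as a small compilable sketch. sumThree() is a made-up name. The trade is a couple of bytes for the zero word (on an AVR it would typically sit in RAM) against three tests-and-branches. It only holds up as long as nothing ever *writes* through a pointer that still points at zeroWord, which is why it's declared const here.

```c
#include <stdint.h>

/* Shared stand-in for "not assigned": reading it contributes nothing. */
static const uint16_t zeroWord = 0;

/* No NULL tests needed: unassigned pointers point at zeroWord. */
static uint16_t sumThree(const uint16_t *pA,
                         const uint16_t *pB,
                         const uint16_t *pC)
{
    return (uint16_t)(*pA + *pB + *pC);
}
```

For example, with pB left "unassigned", sumThree(&a, &zeroWord, &c) simply returns a + c.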
here's an idea... parsing
Eric Hertz • 12/09/2016 at 17:34 • 5 comments
This is just a random-realization while working on my project... maybe it's obvious to everyone in-the-know.

Say you're parsing something, like commands from a terminal-window...

Say you've got a whole bunch of commands, but they mostly all follow the same handful of formats, like Ax and Ay, Bx and By, etc.

Lemme think of an example...

Say motor commands (terminated with '\n'):

SMn = stop motor number n (where n is one character, 0-9)

AMnx = advance motor N by x millimeters (where x is any number)

FMnx = move motorN forwards at x mm/sec

and so-forth.

One could, obviously, parse each character as it comes through, then do a whole bunch of nested if-then statements.

Another idea is to combine the first and second characters into a single 16-bit variable, then use a switch() statement. Maybe not ideal for *this* example, but I've found it useful at times.

So, that's one consideration, here's another:

Say this motor-system also has LED-commands:

BLnx = blink LED n x times per second

Obviously the n and the x, here, don't apply to a motor...

So, then, the whole nested-if statement thing makes sense, again, right?

But we're worried about *size* here, not speed... (baud-rate's way slower than your processor, right?)

So, then, maybe it makes sense to parse 'n' and 'x' and store them in argument-variables, and only *after that* handle the actual Command characters.

"But wait! 'SMn' doesn't have an x!"... Right... but here's the idea:

Say everything's stored in a string-buffer... and whatever arrived after the '\n' may very well be data from a previous command... But, your numeric-parser for x terminates as soon as anything non-numeric comes through (\n (or just get rid of the \n and it'll be terminated with \0...))... the variable for argument x will be filled with 0, but even that doesn't matter, because it's not being used, in this case...

Then why parse it if it's not even part of the command, and isn't even there in the first place?

So, here it is without...
```
uint16_t command = string[0] | (string[1])<<8;

uint8_t deviceNum = string[2] - '0';
char *value = &(string[3]);

#define commandify(a,b) \
        ((uint16_t)(a) | ((uint16_t)(b))<<8)

switch(command)
{
    case commandify('S','M'):
        motor_stop(deviceNum);
        break;
    case commandify('A','M'):
        mm = parseNumber(value);
        motor_advance(deviceNum, mm);
        break;
    ...
    case commandify('B','L'):
        rate = parseNumber(value);
        led_blink(deviceNum, rate);
        break;
    ...
}
```
So, now, for the example described, you've either got to call 'parseNumber()' three times for the three commands that use it, or explicitly handle the 'S' case separately from the switch, (makes sense, unless there are *several* such cases, in which case your test becomes quite large, maybe even a second switch-statement).

Or, just parse the number from the start, and don't use it if you don't need to.

And, let's make it even more interesting, what if there's another command:

PSs = Print string s

Could *still* call parseNumber, *and* fill deviceNum (both with garbage) and have a really simple switch-statement (maybe even a lookup-table, at this point):
[Note 2023: WHOOPS! This is glitchy!]
```
uint16_t command = string[0] | (string[1])<<8;

uint8_t deviceNum = string[2] - '0';

#define commandify(a,b) \
        ((uint16_t)(a) | ((uint16_t)(b))<<8)

//### No way you're gonna get away with floats
// in a 1K project, without an FPU ;)
//x starts at string[3], after the device digit
float value = parseNumber(&(string[3]));

switch(command)
{
    case commandify('S','M'):
        motor_stop(deviceNum);
        break;
    case commandify('A','M'):
        motor_advance(deviceNum, value);
        break;
    ...
    case commandify('B','L'):
        led_blink(deviceNum, value);
        break;
    ...
    case commandify('P','S'):
       printf("%s", &(string[2]));
       break;
    ...
}
```
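
For anyone who wants to poke at the parse-everything-first idea on a PC, here's a self-contained sketch. The motor/LED functions are stand-ins that only record what was asked of them, parseNumber() is my own guess at an integer-only parser, and handleCommand() is a made-up name. It assumes the buffer always has at least 4 readable bytes.

```c
#include <stdint.h>

#define commandify(a,b) ((uint16_t)(a) | ((uint16_t)(b) << 8))

/* Stand-ins for real hardware calls: they only record the request. */
static char     lastAction;
static uint8_t  lastDevice;
static uint16_t lastValue;

static void motor_stop(uint8_t n)
{
    lastAction = 'S'; lastDevice = n;
}

static void motor_advance(uint8_t n, uint16_t mm)
{
    lastAction = 'A'; lastDevice = n; lastValue = mm;
}

static void led_blink(uint8_t n, uint16_t rate)
{
    lastAction = 'B'; lastDevice = n; lastValue = rate;
}

/* Stops at the first non-digit, so a '\n' or '\0' right away yields 0. */
static uint16_t parseNumber(const char *s)
{
    uint16_t n = 0;
    while (*s >= '0' && *s <= '9')
        n = (uint16_t)(n * 10 + (*s++ - '0'));
    return n;
}

/* Returns 1 if the command was recognised, 0 otherwise. */
static uint8_t handleCommand(const char *string)
{
    uint16_t command   = commandify(string[0], string[1]);
    uint8_t  deviceNum = (uint8_t)(string[2] - '0');
    uint16_t value     = parseNumber(&string[3]); /* parsed even if unused */

    switch (command)
    {
        case commandify('S','M'): motor_stop(deviceNum);           break;
        case commandify('A','M'): motor_advance(deviceNum, value); break;
        case commandify('B','L'): led_blink(deviceNum, value);     break;
        default: return 0;
    }
    return 1;
}
```

Feeding it "AM312\n" advances motor 3 by 12, while "SM7\n" stops motor 7 and quietly ignores the zero that got parsed for x.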

code-size helpers (in the form of a makefile)

Eric Hertz • 12/09/2016 at 16:13 • 0 comments

Here's a minimal makefile for tracking your code-size, etc...

(This doesn't yet create the hex-file for flashing!)

default: build lss size

#Compile, optimize for size
build:
        avr-gcc -mmcu=atmega8515 -Os -o main.out main.c

#Create an assembly-listing (with C code alongside)
#Check out main.lss!
lss:
        avr-objdump --disassemble-zeroes -h -S main.out > main.lss

#Output the sizes of the various sections 
# written to flash = .text + .data
size:
        avr-size main.out

clean:
        rm -f main.out main.lss

check your optimization-level!
Eric Hertz • 11/28/2016 at 18:04 • 8 comments
If working with a microcontroller, your system may already be set up for the "-Os" optimization-level, so the information here might not save you any program-memory...
-----------------------
AS I UNDERSTAND (I am by no means an expert on any of this!):

-Os is "optimize for size"
-Os basically does as much computation (of your code) as possible during the compilation-process, and tries to look for the most code-size-efficient means to compile it, rather than leaving a bunch of that code up to your processor to handle in real-time.
Contrast that with "-O0" (no optimization), where the code will be compiled almost exactly as you wrote it.
E.G. a really simple example:
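With -O0 (default): "b = 1; a = b + 2;" might very well write the value 1 to b, read it back, add 2, and store the result in a. Several instructions to be executed in realtime on your processor. (A purely-literal "a = 1 + 2;" tends to get folded into 3 by gcc even at -O0, so it takes a variable to see the difference.)
With -Os the same code most-likely will result in one instruction, writing the value 3 to the register containing variable a (and b may vanish entirely if nothing else uses it).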
Other optimization-levels (-O1, -O2...) aren't discussed here, but check out the comments at the bottom of the page, from @Karl S, and note that they might in fact result in *larger* code than with no optimization, as they might optimize for *speed*.
-------
The key is, the optimization-level may have a lot to do with the size of your compiled-project... And it's not just a matter of "levels", but different types entirely
(e.g. some optimization-"levels" may prefer execution-speed over *size*, etc. In gcc there are also "-f<options>" which allow you to fine-tune your optimizer's preferences, and there are also ways to choose specific optimization-levels for specific parts of your code: gcc has "#pragma GCC optimize" and an "optimize" function-attribute... These are a bit beyond me...)
So you might want to do some reading-up on the matter, and/or experiment!
-------
Here's a [multitude of] "wow"-moments, I've experienced with the matter, but first some overview:
I've a macro that turns a pin on a port into an output called "setoutPORT()".
(This is for an AVR...)
```
#define setoutPORT(pinNum, PORTx)   \
      setbit2(DDR_FROM_PORT(PORTx), pinNum)
#define DDRPORTOFFSET   1
#define DDR_FROM_PORT(PORTx) \
      ((_MMIO_BYTE(&(PORTx) - DDRPORTOFFSET)))
#define setbit2(variable, bitNum) \
         (variable = ((variable) | (1 << (bitNum))))
```
(The point is to use one definition for the port-name to use with all pin-related macros, regardless of which register they actually need to access)
Here's a *really* simple program using it:
```
#define LED_PIN  1
#define LED_PORT PORTB

int main(void)
{
   //set PB1 as an output
   setoutPORT(LED_PIN, LED_PORT);
   while(1)
   {}
}
```
And here's how "main" compiles with my default optimization-level (-Os):
```
 int main(void)
 {
    setoutPORT(1, PORTB);
   38: b9 9a          sbi   0x17, 1  ; 23
   3a: ff cf          rjmp  .-2         ; 0x3a 
 0000003c <_exit>:
   3c: f8 94          cli
 0000003e <__stop_program>:
   3e: ff cf          rjmp  .-2         ; 0x3e <__stop_program>
```
rjmp .-2 is the while(1) loop, it jumps back *to itself* (Sometimes, with optimization, the disassembly-output doesn't show all the original source-code, in this case it forgot the while(1){})
Wow-Moment Number Zero:
I've been using this method (setoutPORT and all its dependencies) for *years* with AVRs and have known it to (and relied on it to) compile to a single sbi instruction...
But y'all likely haven't seen it yet, so take a moment to look at all the math involved in the setoutPORT macro... That's a *lot* of math, including pointer-arithmetic.
I guess I was mistaken, because I always thought the Preprocessor was responsible for handling constant math, like that. Or, at least, that the compiler looked for constant-math inherently as an early-stage in the compilation-process (I guess the preprocessor wouldn't know much about pointer-arithmetic).
I figured the -Os part of the optimizer was only handling the conversion of
```
(variable = ((variable) | (1 << (bitNum))))
```
into an sbi, which is pretty impressive in-and-of itself.

Today's Wow-Moment:
Here's the output with no optimization (-O0):
```
int main(void)
{
  38: cf 93          push  r28
 3a:...
```
Put some code-space in your "savings account"!
Eric Hertz • 11/28/2016 at 03:10 • 5 comments
This may seem a bit ridiculous, but believe me, it's turned out useful *quite-often* when expecting a project might eventually reach code-space limitations...
Throw something "big" in your project that doesn't do anything important... At the very start of the development-process. E.G.:
```
char wastedSpace[80] = { [0 ... 78] = 'a' };
```
Hide it somewhere so you forget about it... Then when your project has gone from 512B to 960B, and suddenly in the next-revision it's gone from 960 to 1025... You'll go "oh sh**", then probably start looking at your code trying to figure out some ways to make it smaller... (maybe a good thing)... Then eventually you might step-back a bit frustrated and... eventually... remember that there's a sizable chunk you can take out with no consequences whatsoever, and continue your progress without having to change anything already-functional. Consider it a terrifying--and then relieving--warning.
This example works for both program-memory as well as RAM... But there are other ways to do similar, and may even be useful in the meantime. E.G. I usually throw in a fading "heartbeat" LED... That code can be removed, entirely, from my project by merely defining HEART_REMOVED, regaining a few hundred bytes on the spot, and rendering a project which would've stalled due to a few bytes to one which can [cautiously] continue development for quite some time thereafter.
(NOTE that SOME OPTIMIZERS might look at something like the above and recognize that it's never used, then "optimize it out". So, keep that in mind... Some other methods might be to e.g. throw something in PROGMEM. Might be a good idea to write an empty project, compile it, look at the code-size, then add your "savings account" and make sure that code-size increases as expected).
Another thing I regularly do that "uses up space" is to throw project-info into the Flash-memory in a *human-readable* format... That way I can, years down the line, read-back the flash-memory from a chip and determine things like which project it is, which version-number, and what date it was compiled on... That info is definitely useful later down the road, but not *essential*, so potentially hundreds of bytes can be removed by removing that information. (That information is automatically-generated into "projinfo.h" by my 'makefile', and projinfo.h is then #included in main.c... so to remove it, just comment-out that #include.)
Multiplication/Division and Floating-point-removal
Eric Hertz • 11/27/2016 at 14:01 • 0 comments
There may be a lot of cases where floating-point "just makes sense" in the intuitive-sense...
But Floating-Point math takes a *lot* of code-space on systems which don't have an FPU.
I'm not going to go into too much detail, here... but consider the following
```
float slope = 4.0/3.0;

int pixelY = slope * pixelX + B;
```
This can be *dramatically* simplified in both execution-time and code-size by using:
```
int slope_numerator = 4;
int slope_denominator = 3;

int pixelY = slope_numerator * pixelX / slope_denominator + B;
```
Note that the order matters!
Do the multiplication *first*, and only do the division at the *very end*.
(Otherwise, you'll lose precision!)
Note, also, that doing said-multiplication *first* might require you to bump up your integer-sizes...
You may know that pixelX and pixelY will fit in uint8_t's, but pixelX*slope_numerator may not.
So, I can never remember the integer-promotion rules, so I usually just cast as best I can recall:
```
uint8_t B = 0;
uint8_t pixelX = 191;
uint8_t pixelY = (uint16_t)slope_numerator * (uint16_t)pixelX
                 / slope_denominator + B;
```
Don't forget all the caveats, here... You're doing 16-bit math, probably throughout the entire operation, but the result is being written into an 8-bit variable... The case above results in pixelY = 254, but what if B was 2?
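To answer the "what if B was 2?" question in code, here's a small sketch. The function names and the clamping variant are my own additions for illustration. With B = 2 the 16-bit result is 256, which wraps to 0 when squeezed into a uint8_t, unless you clamp it yourself.

```c
#include <stdint.h>

#define SLOPE_NUMERATOR   4
#define SLOPE_DENOMINATOR 3

/* Multiply first, divide last, all in 16 bits. */
static uint16_t lineY16(uint8_t pixelX, uint8_t B)
{
    return (uint16_t)((uint16_t)SLOPE_NUMERATOR * pixelX
                      / SLOPE_DENOMINATOR + B);
}

/* Plain narrowing: anything above 255 silently wraps around. */
static uint8_t lineY8_wrapped(uint8_t pixelX, uint8_t B)
{
    return (uint8_t)lineY16(pixelX, B);
}

/* Clamping costs a compare and a branch, but avoids the wrap. */
static uint8_t lineY8_clamped(uint8_t pixelX, uint8_t B)
{
    uint16_t y = lineY16(pixelX, B);
    return (y > 255) ? 255 : (uint8_t)y;
}
```

So pixelX = 191 with B = 0 gives 254 either way, but with B = 2 the wrapped version returns 0 and the clamped version returns 255.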
------
Regardless of the casting, and the additional variable, this is almost guaranteed to be *much* faster and *much* smaller than using floating-point.
----------
@Radomir Dopieralski strikes again!
I was planning on writing up about iterative-computations, next... but it's apparently already got a name and a decent write-up, so check out: https://en.wikipedia.org/wiki/Bresenham%27s_line_algorithm
(whew, so much energy saved by linking!)
THE GIST: The above examples (y=mx+b) should probably *NOT* be used for line-drawing!
They're just easy/recognizable math-examples for this writeup to show where floating-point can be removed.

On a system where you *have* to iterate through, anyhow (e.g. when drawing every pixel on a line, you have to iterate through every pixel on the line), then you can turn the complicated math (e.g. y=mx+b) containing multiplications/divisions into math which *only* contains addition/subtraction.
(Think of it this way, how do you calculate 50/5 way back in the day...? "How many times does 5 go into 50?" One way is to subtract 5 from 50, then 5 from 45, then 5 from 40, and so-on, and count the number of times until you reach 0. Whelp, computers are great at that sorta thing, and even better at it when you have to do looping for other things as well.)
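Here's a rough sketch of that iterate-instead-of-divide idea. This is *not* Bresenham's algorithm itself, just the same incremental flavor: y = x * num / den is tracked across the loop using only additions, subtractions and comparisons. The function name and signature are made up, and it assumes den is greater than zero.

```c
#include <stdint.h>

/* Walks x from 0 up to xEnd and returns the y reached for a slope of
   num/den (den > 0), without any multiplication or division. */
static uint16_t lineY_iterative(uint8_t xEnd, uint8_t num, uint8_t den)
{
    uint16_t y = 0;
    uint16_t remainder = 0;  /* leftover numerator, < den after each step */

    for (uint8_t x = 0; x < xEnd; x++)
    {
        remainder += num;          /* one step in x adds num/den to y */
        while (remainder >= den)   /* move whole units over into y */
        {
            remainder -= den;
            y++;
        }
        /* a real line-drawer would plot (x + 1, y) here */
    }
    return y;
}
```

With the 4/3 slope from earlier, 191 steps land on y = 254, the same as the multiply-then-divide version. With a slope of 1/5, 50 steps land on 10, which is the "how many times does 5 go into 50" count.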
Volatiles!
Eric Hertz • 11/26/2016 at 11:50 • 0 comments

You may be familiar with "volatile" variables...
If not--and your project works, reliably-enough--then ignore this, because you've got 1kB to fit your code within in a short period of time, and you're not worried about your project threatening lives...
(If those "if"s and "because"s aren't true, then be sure to check out an explanation of volatile here: https://hackaday.io/project/5624/log/49037-interrupts-volatiles-and-atomics-and-maybe-threads )
------
The thing with volatile is that it's absolutely essential to understand how/when to use it, if you're doing *anything* where a person could be hurt.
The thing with fitting your code in 1kB to blink some LEDs or load an LCD-display is that you probably don't care, as long as it works most of the time.
I'm *in no way* suggesting you ignore this stuff habitually. You *definitely* need to be aware of it if you're ever going to do anything where others' safety is concerned, and, realistically, probably need to be aware of it even where *functionality* is concerned.
But, that-said... It's easy to get into the "habit" of believing that "volatile" is a safe-ish way to make sure you won't run into trouble... And that's not exactly the case.
AND, that-said... If you just use volatile, and the other techniques explained at that link, willy-nilly, then you might run into *excessive code-usage*.
So, all's I'mma say, here, is... if you're using them willy-nilly, and if you're in a tremendous space-crunch like this contest, consider those cases carefully... You might save yourself a few (numerous/countless) bytes if you *don't* use them where you *know* you don't *need* them.
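One comparatively low-risk way to trim volatile overhead, offered as my own illustration rather than anything from the linked write-up: keep the variable volatile, but read it *once* into a local when you need it several times in a row. Every mention of a volatile variable is a separate access the compiler has to keep, so fewer mentions tends to mean fewer instructions. The names here are hypothetical, and the single-byte counter is assumed to be readable in one go.

```c
#include <stdint.h>

/* Hypothetical counter that an interrupt handler would increment. */
static volatile uint8_t tickCount;

/* Each mention of tickCount is a separate load the compiler must keep. */
static uint8_t inWindow_volatileEachTime(void)
{
    return (tickCount >= 10) && (tickCount < 20);
}

/* One load into a local; both comparisons then use that same copy. */
static uint8_t inWindow_localCopy(void)
{
    uint8_t ticks = tickCount;
    return (ticks >= 10) && (ticks < 20);
}
```

As a side-effect, the local-copy version also compares against one consistent snapshot, rather than two reads that an interrupt could land between.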
