
limited-code hacks/ideas

When you're running out of code-space... sometimes yah's gots to hack. Here are some ideas.

The "1kB challenge" ( https://hackaday.io/contest/18215-the-1kb-challenge and https://hackaday.com/2016/11/21/step-up-to-the-1-kb-challenge/ ) is getting people talking about weird-ol' workarounds... Some of that discussion has inspired some "hackish" ideas of my own. Also some things I've run into in the past... Here're some ideas y'all's welcome to use.

Judges: This is more just a collection of ideas, rather than a "project" of its own, not to be considered a contender, just some info others might find useful that're a bit too long-winded for throwing in the contest's comments-section.

I'm focussing on code written in C... So... If you have a project *just exceeding* 1024 Bytes, some of these ideas may be useful for squeezing a little more outta it.

Before looking here, maybe check this document first: Atmel's guide to optimizing C code for size and speed, http://www.atmel.com/Images/doc8453.pdf (Thanks, [Volt], in the comments!)

The following may just be utterly-ridiculous... I'm no expert, here... And this list is in no way sorted, nor all-inclusive, and may in fact be missing some *really* important things like using lookup tables rather than math, and using direct register-writes rather'n libraries (e.g. see @Radomir Dopieralski's logs over at #Nyan Board and #Mechatronic Ears)

Take the ideas here with a grain of salt!

I'll probably aim my efforts at AVRs, but there are *definitely* some concepts here that apply to other architectures as well (and some that don't apply *at all* to AVRs). So... steal some ideas!

Oh, and, Good Lord... @Yann Guidon / YGDES pointed out something quite important...

This shizzle is in *no way* intended to be considered "good practice". Don't get into these habits! Don't use these as general-purpose guidelines of any sort! And, for goodness sake, don't use these techniques in any sort of "product" (library, operating-system, pace-maker, or anything else) unless you've *really* thought-through *all* the potential-consequences, slept on it for months, then thought through them again. But, realistically, that goes for any sort of coding, whether you use these techniques, or not.

Fergodsakes, we're talking about a friggin' contest, here. It's supposed to be *fun* and encouraging of creativity. And this "project-page" is intended for no other purpose than to allow a creative person to continue with their fun project once they've hit what might otherwise seem like a show-stopping ceiling.


Calculate your (AVR) project's program/flash-memory requirements via "avr-size":

https://hackaday.io/project/18574-limited-code-hackstips/log/49537-avr-project-doing-nada-58bytes-and-some-experimentsresults

Squeeze some bytes out of your project by (in no particular order):


  • Tiny Circular Buffer - back to linear

    esot.eric, 12/30/2016 at 04:11, 0 comments

    I need to add elements to the end of a buffer, and remove elements from the beginning... It can have from 0 to 4 elements loaded at a time.

    The de-facto answer might be a circular-buffer.

    But this buffer is only 4 elements long...

    It would seem that implementing this as a simple array would be more efficient, in my case. Yes, it means that when I remove an element from the beginning, I have to shift the remaining data to the beginning...

    And, the de-facto answer might be a for() loop...

    But, again, there's only four elements. So, unroll that loop, as well.

    (Note that the optimizer can look at short fixed-count for() loops, and automatically "unroll" them... I'm pretty much certain that this case will be smaller unrolled, and I'm not entirely convinced my optimizer-settings will do-so, so I'll type it manually.)

    -----------

    Interestingly, doing this as a simple array, rather than a circular-buffer, also dramatically decreased the amount of code in nearly every other function, e.g. buffer_add(), buffer_countElements(), buffer_isFull(), buffer_clear()... In fact, many of these functions are now merely comparisons/assignments to a variable such as buffer_itemCount. So, now, whereas I had multi-line functions that *could*'ve been inlined to reduce code-space in a few cases, now it's *definitely* more code-space (and execution-time!) efficient to inline these functions in every case.
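
    Here's a rough sketch of what that linear buffer might look like. The names buffer_add(), buffer_isFull(), buffer_clear() and buffer_itemCount come from the description above; buffer_remove(), buffer_isEmpty() and BUFFER_SIZE are made up for illustration, and the element type is assumed to be uint8_t. Whether it actually comes out smaller than a circular buffer on your build is something to check with avr-size.

```c
#include <stdint.h>

#define BUFFER_SIZE 4   /* assumed fixed size, as in the text */

static uint8_t buffer[BUFFER_SIZE];
static uint8_t buffer_itemCount = 0;

/* These are now single comparisons/assignments, so inlining them is cheap. */
static inline uint8_t buffer_isFull(void)  { return buffer_itemCount == BUFFER_SIZE; }
static inline uint8_t buffer_isEmpty(void) { return buffer_itemCount == 0; }
static inline void    buffer_clear(void)   { buffer_itemCount = 0; }

/* Append to the end; returns 0 if the buffer was already full, 1 otherwise. */
static inline uint8_t buffer_add(uint8_t item)
{
    if (buffer_isFull())
        return 0;
    buffer[buffer_itemCount++] = item;
    return 1;
}

/* Remove the oldest element. The caller checks buffer_isEmpty() first. */
static inline uint8_t buffer_remove(void)
{
    uint8_t oldest = buffer[0];

    /* Unrolled shift: three fixed copies instead of a for() loop. */
    buffer[0] = buffer[1];
    buffer[1] = buffer[2];
    buffer[2] = buffer[3];

    buffer_itemCount--;
    return oldest;
}
```

    Note there are no head/tail indices and no wrap-around math anywhere, which is where the savings in the other functions come from.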

  • pointer idea...

    esot.eric, 12/30/2016 at 02:25, 0 comments

    I have to add several values from several pointers...

    e.g.

    uint16_t *pA = NULL;
    uint16_t *pB = NULL;
    uint16_t *pC = NULL;
    
    < a bunch of code that assigns pA, pB, and/or pC >
    
    uint16_t value = *pA + *pB + *pC;

    BUT any and/or all of these pointers may be unassigned... in which case, they should not be considered in the sum for value.

    Of course, using NULL pointers makes sense, to indicate whether they've been assigned. But, as I understand the C standard, you're not supposed to *access* a NULL address... You're only supposed to *test* whether a pointer is NULL.

    (e.g. address-zero may be a reset-vector, which probably contains a "jump" instruction, which most-likely is NOT zero, in value).

    So, again, if I understand correctly, the "right" way to handle these potential NULL pointers would be something like:

    uint16_t *pA = NULL;
    uint16_t *pB = NULL;
    uint16_t *pC = NULL;
    
    < a bunch of code that assigns pA, pB, and/or pC >
    
    uint16_t value = 0;
    if(pA)
     value = *pA;
    if(pB)
     value += *pB;
    if(pC)
     value += *pC;

    That's a lot of tests! Surely they'll add up in code-space...

    Instead, what about:

    uint16_t zeroWord = 0;
    uint16_t *pA = &zeroWord;
    uint16_t *pB = &zeroWord;
    uint16_t *pC = &zeroWord;
    
    < a bunch of code that assigns pA, pB, and/or pC >
    
    uint16_t value = *pA + *pB + *pC;
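
    A compilable version of the same idea, as a sketch. sumThree() is a made-up wrapper, and making zeroWord const (so nothing can accidentally write a non-zero value through one of the "unassigned" pointers) is my own addition, not part of the original idea. On an AVR a const without PROGMEM still lives in RAM, so this costs two bytes of RAM either way.

```c
#include <stdint.h>

/* Shared dummy target; const so the default pointers can't be written through. */
static const uint16_t zeroWord = 0;

/* Sums three values with no NULL tests.
   Unassigned pointers are expected to point at zeroWord, never at NULL. */
static uint16_t sumThree(const uint16_t *pA, const uint16_t *pB, const uint16_t *pC)
{
    return (uint16_t)(*pA + *pB + *pC);
}
```

    The trade is three tests-and-branches for one extra word of data. It only works where "unassigned" and "zero" mean the same thing to the calculation, as with a sum.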

  • here's an idea... parsing

    esot.eric, 12/09/2016 at 17:34, 0 comments

    This is just a random-realization while working on my project... maybe it's obvious to everyone in-the-know.

    Say you're parsing something, like commands from a terminal-window...

    Say you've got a whole bunch of commands, but they mostly all follow the same handful of formats, like Ax and Ay, Bx and By, etc.

    Lemme think of an example...

    Say motor commands (terminated with '\n'):

    SMn = stop motor number n (where n is one character, 0-9)

    AMnx = advance motor n by x millimeters (where x is any number)

    FMnx = move motor n forwards at x mm/sec

    and so-forth.

    One could, obviously, parse each character as it comes through, then do a whole bunch of nested if-then statements.

    Another idea is to combine the first and second characters into a single 16-bit variable, then use a switch() statement. Maybe not ideal for *this* example, but I've found it useful at times.

    So, that's one consideration, here's another:

    Say this motor-system also has LED-commands:

    BLnx = blink LED n x times per second

    Obviously the n and the x, here, don't apply to a motor...

    So, then, the whole nested-if statement thing makes sense, again, right?

    But we're worried about *size* here, not speed... (baud-rate's way slower than your processor, right?)

    So, then, maybe it makes sense to parse 'n' and 'x' and store them in argument-variables, and only *after that* handle the actual Command characters.

    "But wait! 'SMn' doesn't have an x!"... Right... but here's the idea:

    Say everything's stored in a string-buffer... and whatever sits after the '\n' may very well be leftover data from a previous command... But your numeric-parser for x terminates as soon as anything non-numeric comes through (e.g. the '\n', or just strip the '\n' and it'll be terminated by the '\0')... So for 'SMn' the variable for argument x will be filled with 0, but even that doesn't matter, because it's not being used in this case...

    Then why parse it if it's not even part of the command, and isn't even there in the first place?

    So, here it is without parsing x up-front...

    uint16_t command = string[0] | (string[1])<<8;
    
    uint8_t deviceNum = string[2] - '0';
    char *value = &(string[3]);
    
    #define commandify(a,b) \
            ((uint16_t)(a) | ((uint16_t)(b))<<8)
    
    switch(command)
    {
        case commandify('S','M'):
            motor_stop(deviceNum);
            break;
        case commandify('A','M'):
            mm = parseNumber(value);
            motor_advance(deviceNum, mm);
            break;
        ...
        case commandify('B','L'):
            rate = parseNumber(value);
            led_blink(deviceNum, rate);
            break;
        ...
    }

    So, now, for the example described, you've either got to call 'parseNumber()' three times for the three commands that use it, or explicitly handle the 'S' case separately from the switch, (makes sense, unless there are *several* such cases, in which case your test becomes quite large, maybe even a second switch-statement).

    Or, just parse the number from the start, and don't use it if you don't need to.

    And, let's make it even more interesting, what if there's another command:

    PSs = Print string s

    Could *still* call parseNumber, *and* fill deviceNum (both with garbage) and have a really simple switch-statement (maybe even a lookup-table, at this point):

    uint16_t command = string[0] | (string[1])<<8;
    
    uint8_t deviceNum = string[2] - '0';
    
    #define commandify(a,b) \
            ((uint16_t)(a) | ((uint16_t)(b))<<8)
    
    //### No way you're gonna get away with floats
    // in a 1K project, without an FPU ;)
    float value = parseNumber(&(string[3]));
    
    switch(command)
    {
        case commandify('S','M'):
            motor_stop(deviceNum);
            break;
        case commandify('A','M'):
            motor_advance(deviceNum, value);
            break;
        ...
        case commandify('B','L'):
            led_blink(deviceNum, value);
            break;
        ...
        case commandify('P','S'):
           printf("%s", &(string[2]));
           break;
        ...
    }
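
    To make the "parse first, dispatch after" version concrete, here's a self-contained sketch. parseNumber() is only named in the text above, so this body is a guess at the simplest thing that fits the description (decimal digits only, stops at the first non-digit, no overflow check). The motor_stop()/motor_advance()/led_blink() bodies are stand-ins that just record their arguments, handleCommand() is a made-up wrapper, and value is an integer rather than a float.

```c
#include <stdint.h>

/* Packs two command characters into one 16-bit switch key. */
#define commandify(a, b) \
        ((uint16_t)((uint8_t)(a) | ((uint16_t)(uint8_t)(b) << 8)))

/* Parses leading decimal digits; stops at the first non-digit ('\n', '\0', ...).
   Returns 0 when there are no digits at all. No overflow checking. */
static uint16_t parseNumber(const char *s)
{
    uint16_t n = 0;
    while (*s >= '0' && *s <= '9')
        n = (uint16_t)(n * 10 + (uint16_t)(*s++ - '0'));
    return n;
}

/* Stand-ins for the real handlers: they only record what was requested. */
static char     lastAction;
static uint8_t  lastDevice;
static uint16_t lastValue;

static void motor_stop(uint8_t n)                { lastAction = 'S'; lastDevice = n; }
static void motor_advance(uint8_t n, uint16_t x) { lastAction = 'A'; lastDevice = n; lastValue = x; }
static void led_blink(uint8_t n, uint16_t x)     { lastAction = 'B'; lastDevice = n; lastValue = x; }

/* Returns 1 if the command was recognized, 0 otherwise.
   Expects at least three characters before the terminator. */
static uint8_t handleCommand(const char *string)
{
    uint16_t command   = commandify(string[0], string[1]);
    uint8_t  deviceNum = (uint8_t)(string[2] - '0');
    uint16_t value     = parseNumber(&string[3]);   /* parsed even if unused */

    switch (command) {
    case commandify('S', 'M'): motor_stop(deviceNum);           return 1;
    case commandify('A', 'M'): motor_advance(deviceNum, value); return 1;
    case commandify('B', 'L'): led_blink(deviceNum, value);     return 1;
    default:                                                    return 0;
    }
}
```

    Calling handleCommand("SM1\n") parses an x of 0 that motor_stop() never sees. That is the whole trick: one parseNumber() call-site instead of one per command.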

  • code-size helpers (in the form of a makefile)

    esot.eric, 12/09/2016 at 16:13, 0 comments

    Here's a minimal makefile for tracking your code-size, etc...

    (This doesn't yet create the hex-file for flashing!)

    default: build lss size
    
    #Compile, optimize for size
    build:
            avr-gcc -mmcu=atmega8515 -Os -o main.out main.c
    
    #Create an assembly-listing (with C code alongside)
    #Check out main.lss!
    lss:
            avr-objdump --disassemble-zeroes -h -S main.out > main.lss
    
    #Output the sizes of the various sections 
    # written to flash = .text + .data
    size:
            avr-size main.out
    
    clean:
            rm -f main.out main.lss
    

  • check your optimization-level!

    esot.eric, 11/28/2016 at 18:04, 8 comments

    If working with a microcontroller, your system may already be set up for the "-Os" optimization-level, so the information here might not save you any program-memory...

    -----------------------

    AS I UNDERSTAND (I am by no means an expert on any of this!):

    -Os is "optimize for size"

    -Os basically does as much computation (of your code) as possible during the compilation-process, and tries to look for the most code-size-efficient means to compile it, rather than leaving a bunch of that code up to your processor to handle in real-time.

    Contrast that with "-O0" (no optimization), where the code will be compiled almost exactly as you wrote it.

    E.G. a really simple example:

    With -O0 (default): "a = 1 + 2;" might very well write the value 1 to the register containing the variable a, then add 2 to it. At least two instructions to be executed in realtime on your processor.

    With -Os "a = 1 + 2;" most-likely will result in one instruction, writing the value 3 to the register containing variable a.

    Other optimization-levels (-O1, -O2...) aren't discussed here, but check out the comments at the bottom of the page, from @Karl S, and note that they might in fact result in *larger* code than with no optimization, as they might optimize for *speed*.

    -------

    The key is, the optimization-level may have a lot to do with the size of your compiled-project... And it's not just a matter of "levels", but different types entirely

    (e.g. some optimization-"levels" may prefer execution-speed over *size*, etc. In gcc there are also "-f<options>" which allow you to fine-tune your optimizer's preferences, and there are also pragmas(?) to choose specific optimization-levels for specific parts of your code... These are a bit beyond me...)

    So you might want to do some reading-up on the matter, and/or experiment!
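
    For the "specific optimization-levels for specific parts of your code" bit, GCC has a per-function optimize attribute (and a matching "#pragma GCC optimize"). Here's a sketch of how it might be used. The OPT_FOR_SIZE/OPT_FOR_SPEED macro names and both checksum functions are made up for illustration. GCC's own documentation describes this attribute as a debugging aid rather than something to rely on, so treat it as something to experiment with and then check the result in the .lss listing.

```c
#include <stdint.h>

/* Per-function optimization is a GCC extension; other compilers get no-ops. */
#if defined(__GNUC__) && !defined(__clang__)
#define OPT_FOR_SIZE  __attribute__((optimize("Os")))
#define OPT_FOR_SPEED __attribute__((optimize("O2")))
#else
#define OPT_FOR_SIZE
#define OPT_FOR_SPEED
#endif

/* Rarely-run code: ask for the smallest encoding. */
OPT_FOR_SIZE static uint16_t checksum_small(const uint8_t *data, uint8_t len)
{
    uint16_t sum = 0;
    while (len--)
        sum = (uint16_t)(sum + *data++);
    return sum;
}

/* Hot path: same work, but the compiler may trade size for speed. */
OPT_FOR_SPEED static uint16_t checksum_fast(const uint8_t *data, uint8_t len)
{
    uint16_t sum = 0;
    while (len--)
        sum = (uint16_t)(sum + *data++);
    return sum;
}
```

    Both functions return the same thing; only the generated instructions may differ.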

    -------

    Here's a [multitude of] "wow"-moments, I've experienced with the matter, but first some overview:

    I've a macro that turns a pin on a port into an output called "setoutPORT()".

    (This is for an AVR...)

    #define setoutPORT(pinNum, PORTx)   \
          setbit2(DDR_FROM_PORT(PORTx), pinNum)
    #define DDRPORTOFFSET   1
    #define DDR_FROM_PORT(PORTx) \
          ((_MMIO_BYTE(&(PORTx) - DDRPORTOFFSET)))
    #define setbit2(variable, bitNum) \
             (variable = ((variable) | (1 << (bitNum))))
    

    (The point is to use one definition for the port-name to use with all pin-related macros, regardless of which register they actually need to access)

    Here's a *really* simple program using it:

    #define LED_PIN  1
    #define LED_PORT PORTB
    
    int main(void)
    {
       //set PB1 as an output
       setoutPORT(LED_PIN, LED_PORT);
       while(1)
       {}
    }
    

    And here's how "main" compiles with my default optimization-level (-Os):

     int main(void)
     {
        setoutPORT(1, PORTB);
       38: b9 9a          sbi   0x17, 1  ; 23
       3a: ff cf          rjmp  .-2         ; 0x3a 
     0000003c <_exit>:
       3c: f8 94          cli
     0000003e <__stop_program>:
       3e: ff cf          rjmp  .-2         ; 0x3e <__stop_program>
    
    

    rjmp .-2 is the while(1) loop, it jumps back *to itself* (Sometimes, with optimization, the disassembly-output doesn't show all the original source-code, in this case it forgot the while(1){})

    Wow-Moment Number Zero:

    I've been using this method (setoutPORT and all its dependencies) for *years* with AVRs and have known it to (and relied on it to) compile to a single sbi instruction...

    But y'all likely haven't seen it yet, so take a moment to look at all the math involved in the setoutPORT macro... That's a *lot* of math, including pointer-arithmetic.

    I guess I was mistaken, because I always thought the Preprocessor was responsible for handling constant math, like that. Or, at least, that the compiler looked for constant-math inherently as an early-stage in the compilation-process (I guess the preprocessor wouldn't know much about pointer-arithmetic).

    I figured the -Os part of the optimizer was only handling the conversion of

    (variable = ((variable) | (1 << (bitNum))))
    

    into an sbi, which is pretty impressive in-and-of itself.

    Today's Wow-Moment:

    Here's the output with no optimization (-O0):

    int main(void)
    {
      38: cf 93          push  r28
     3a:...

  • Put some code-space in your "savings account"!

    esot.eric, 11/28/2016 at 03:10, 5 comments

    This may seem a bit ridiculous, but believe me, it's turned out useful *quite-often* when expecting a project might eventually reach code-space limitations...

    Throw something "big" in your project that doesn't do anything important... At the very start of the development-process. E.G.:

    char wastedSpace[80] = { [0 ... 78] = 'a' };

    Hide it somewhere so you forget about it... Then when your project has gone from 512B to 960B, and suddenly in the next-revision it's gone from 960 to 1025... You'll go "oh sh**", then probably start looking at your code trying to figure out some ways to make it smaller... (maybe a good thing)... Then eventually you might step-back a bit frustrated and... eventually... remember that there's a sizable chunk you can take out with no consequences whatsoever, and continue your progress without having to change anything already-functional. Consider it a terrifying--and then relieving--warning.

    This example works for both program-memory as well as RAM... But there are other ways to do similar, and may even be useful in the meantime. E.G. I usually throw in a fading "heartbeat" LED... That code can be removed, entirely, from my project by merely defining HEART_REMOVED, regaining a few hundred bytes on the spot, and rendering a project which would've stalled due to a few bytes to one which can [cautiously] continue development for quite some time thereafter.

    (NOTE that SOME OPTIMIZERS might look at something like the above and recognize that it's never used, then "optimize it out". So, keep that in mind... Some other methods might be to e.g. throw something in PROGMEM. Might be a good idea to write an empty project, compile it, look at the code-size, then add your "savings account" and make sure that code-size increases as expected).

    Another thing I regularly do that "uses up space" is to throw project-info into the Flash-memory in a *human-readable* format... That way I can, years down the line, read-back the flash-memory from a chip and determine things like which project it is, which version-number, and what date it was compiled on... That info is definitely useful later down the road, but not *essential*, so potentially hundreds of bytes can be removed by removing that information. (That information is automatically-generated into "projinfo.h" by my 'makefile', and projinfo.h is then #included in main.c... so to remove it, just comment-out that #include.)
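
    A sketch of both kinds of "savings account", with a compile-time switch to withdraw the first one. SAVINGS_WITHDRAWN, SAVINGS_BYTES, savingsAccount, projInfo and the "limitedCodeDemo v1" text are all made-up names (the real projinfo.h isn't shown here). The "used" attribute is a GCC/Clang extension asking the compiler to keep an object nothing references. It may not survive every linker setting, so check avr-size before trusting it. On an AVR you'd likely also want PROGMEM on these so they only take flash, not RAM.

```c
#include <stdint.h>

/* Define SAVINGS_WITHDRAWN (e.g. -DSAVINGS_WITHDRAWN) to get the bytes back. */
#ifndef SAVINGS_WITHDRAWN
#define SAVINGS_BYTES 80
/* "used" asks the compiler to keep this even though nothing reads it. */
__attribute__((used)) static const char savingsAccount[SAVINGS_BYTES] = "savings";
#else
#define SAVINGS_BYTES 0
#endif

/* Human-readable build info; __DATE__ is filled in by the compiler. */
__attribute__((used)) static const char projInfo[] =
    "limitedCodeDemo v1 built " __DATE__;
```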

  • Multiplication/Division and Floating-point-removal

    esot.eric, 11/27/2016 at 14:01, 0 comments

    There may be a lot of cases where floating-point "just makes sense" in the intuitive-sense...

    But Floating-Point math takes a *lot* of code-space on systems which don't have an FPU.

    I'm not going to go into too much detail, here... but consider the following

    float slope = 4.0/3.0;
    
    int pixelY = slope * pixelX + B;

    This can be *dramatically* simplified in both execution-time and code-size by using:

    int slope_numerator = 4;
    int slope_denominator = 3;
    
    int pixelY = slope_numerator * pixelX / slope_denominator + B;
    

    Note that the order matters!

    Do the multiplication *first*, and only do the division at the *very end*.

    (Otherwise, you'll lose precision!)

    Note, also, that doing said-multiplication *first* might require you to bump up your integer-sizes...

    You may know that pixelX and pixelY will fit in uint8_t's, but pixelX*slope_numerator may not.

    So, I can never remember the integer-promotion rules, so I usually just cast as best I can recall:

    uint8_t B = 0;
    uint8_t pixelX = 191;
    uint8_t pixelY = (uint16_t)slope_numerator * (uint16_t)pixelX
                     / slope_denominator + B;

    Don't forget all the caveats, here... You're doing 16-bit math, probably throughout the entire operation, but the result is being written into an 8-bit variable... The case above results in pixelY = 254, but what if B was 2?
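
    Here's that B = 2 caveat as something you can actually run. Both function names are made up. The first version clamps the result at 255, which is just one possible way to deal with the overflow. The second does what the snippet above does and lets the 16-bit result get chopped down to 8 bits.

```c
#include <stdint.h>

/* y = (num * x) / den + b, multiplying first in 16 bits, clamped to 255.
   den must be non-zero. */
static uint8_t scaleY(uint8_t x, uint8_t num, uint8_t den, uint8_t b)
{
    uint16_t y = (uint16_t)((uint16_t)num * x / den + b);
    return (y > 255) ? 255 : (uint8_t)y;
}

/* Same math without the clamp: the result is simply truncated to 8 bits. */
static uint8_t scaleY_truncating(uint8_t x, uint8_t num, uint8_t den, uint8_t b)
{
    return (uint8_t)((uint16_t)num * x / den + b);
}
```

    With x = 191 and a slope of 4/3, both give 254 when B is 0. With B = 2 the true answer is 256, so the truncating version wraps around to 0 while the clamped one returns 255. And dividing first, (191 / 3) * 4, gives 252 instead of 254, which is the precision loss mentioned above.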

    ------

    Regardless of the casting, and the additional variable, this is almost guaranteed to be *much* faster and *much* smaller than using floating-point.

    ----------

    @Radomir Dopieralski strikes again!

    I was planning on writing up about iterative-computations, next... but it's apparently already got a name and a decent write-up, so check out: https://en.wikipedia.org/wiki/Bresenham%27s_line_algorithm

    (whew, so much energy saved by linking!)

    THE GIST: The above examples (y=mx+b) should probably *NOT* be used for line-drawing!

    They're just easy/recognizable math-examples for this writeup to show where floating-point can be removed.

    On a system where you *have* to iterate through, anyhow (e.g. when drawing every pixel on a line, you have to iterate through every pixel on the line), then you can turn the complicated math (e.g. y=mx+b) containing multiplications/divisions into math which *only* contains addition/subtraction.

    (Think of it this way, how do you calculate 50/5 way back in the day...? "How many times does 5 go into 50?" One way is to subtract 5 from 50, then 5 from 45, then 5 from 40, and so-on, and count the number of times until you reach 0. Whelp, computers are great at that sorta thing, and even better at it when you have to do looping for other things as well.)
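
    A sketch of that add/subtract-only idea. This is *not* Bresenham's algorithm proper, just a simplified running-remainder loop in the same spirit, and slopeIterate() is a made-up name. It assumes a non-negative slope num/den with den non-zero, and that the results fit in uint8_t.

```c
#include <stdint.h>

/* Fills yOut[x] = (num * x) / den for x = 0..count-1, using only add/subtract.
   den must be non-zero. */
static void slopeIterate(uint8_t num, uint8_t den, uint8_t count, uint8_t *yOut)
{
    uint8_t  y   = 0;
    uint16_t err = 0;   /* running remainder of (num * x) / den */

    for (uint8_t x = 0; x < count; x++) {
        yOut[x] = y;

        /* Stepping x by one adds num to the numerator... */
        err = (uint16_t)(err + num);

        /* ...and "how many times does den go into that?" is repeated subtraction. */
        while (err >= den) {
            err = (uint16_t)(err - den);
            y++;
        }
    }
}
```

    Since you're already looping over x to draw the pixels, the multiply and divide disappear into an add and a few subtracts per step.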

  • Volatiles!

    esot.eric, 11/26/2016 at 11:50, 0 comments

    You may be familiar with "volatile" variables...

    If not--and your project works, reliably-enough--then ignore this, because you've got 1kB to fit your code within in a short period of time, and you're not worried about your project threatening lives...

    (If those "if"s and "because"s aren't true, then be sure to check out an explanation of volatile here: https://hackaday.io/project/5624/log/49037-interrupts-volatiles-and-atomics-and-maybe-threads )

    ------

    The thing with volatile is that it's absolutely essential to understand how/when to use it, if you're doing *anything* where a person could be hurt.

    The thing with fitting your code in 1kB to blink some LEDs or load an LCD-display is that you probably don't care, as long as it works most of the time.

    I'm *in no way* suggesting you ignore this stuff habitually. You *definitely* need to be aware of it if you're ever going to do anything where others' safety is concerned, and, realistically, probably need to be aware of it even where *functionality* is concerned.

    But, that-said... It's easy to get into the "habit" of believing that "volatile" is a safe-ish way to make sure you won't run into trouble... And that's not exactly the case.

    AND, that-said... If you just use volatile, and the other techniques explained at that link, willy-nilly, then you might run into *excessive code-usage*.

    So, all's I'mma say, here, is... if you're using them willy-nilly, and if you're in a tremendous space-crunch like this contest, consider those cases carefully... You might save yourself a few (numerous/countless) bytes if you *don't* use them where you *know* you don't *need* them.
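
    One related trick, for the cases where a variable really *does* need to be volatile: read it once into a plain local and do the math on the copy. This is a sketch under assumptions: latestSample stands in for something an interrupt handler would update, and both function names are made up. How many bytes it saves (if any) depends on your compiler, so compare the listings.

```c
#include <stdint.h>

/* Hypothetical value updated by an interrupt handler, hence volatile. */
static volatile uint8_t latestSample;

/* Every mention of latestSample is a separate load the compiler must emit. */
static uint8_t threeQuarters_reloading(void)
{
    return (uint8_t)((latestSample >> 1) + (latestSample >> 2));
}

/* Read the volatile once into a local; the rest can stay in a register. */
static uint8_t threeQuarters_cached(void)
{
    uint8_t s = latestSample;
    return (uint8_t)((s >> 1) + (s >> 2));
}
```

    The cached version also works from one consistent snapshot, whereas the reloading version could see two different values if the interrupt fires between the loads.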

  • Multiplication / Division...

    esot.eric, 11/26/2016 at 10:54, 19 comments

    These can be *huge* operations... Here are some thought-points on alternatives.

    Again, these techniques won't save you much space (nor maybe *any*) if you use libraries which make use of them... So, when space is a concern, you're probably best-off not using others' libraries.

    ------

    So, here's an alternative... Say you need to do a multiplication of an integer by 4...

    A *really* fast way of doing-so is (a<<2), two instructions on an AVR.

    If you need to do a division by 4? (a>>2), two instructions on an AVR.

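    (Beware that signed integers are trickier: right-shifting a negative value is implementation-defined in C, and where it's an arithmetic shift it rounds toward negative infinity, whereas division rounds toward zero. E.g. -7/4 is -1, but -7>>2 is typically -2.)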
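
    The shift versions as tiny functions, for experimenting with in the .lss listing. The function names are made up, and the instruction counts quoted above are not something this sketch can promise for your chip or compiler version.

```c
#include <stdint.h>

/* Multiply by 4 via shift; the result wraps modulo 256 like any uint8_t math. */
static uint8_t times4(uint8_t a) { return (uint8_t)(a << 2); }

/* Divide by 4 via shift; rounds down, same as a / 4 for unsigned values. */
static uint8_t divBy4(uint8_t a) { return (uint8_t)(a >> 2); }

/* Signed case written as a real division, which truncates toward zero. */
static int8_t divBy4_signed(int8_t a) { return (int8_t)(a / 4); }
```

    For unsigned values an optimizer will usually pick the shift for you when you write a*4 or a/4, so it's worth comparing both spellings before committing to the less-readable one.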

    .....

    Another alternative is... a bit less-self-explanatory, and likely quite a bit more messy...

    In most cases, there will be *numerous* functions automatically-generated which handle multiplications/divisions between integers of different sizes. That's a lot of code generated which mightn't be necessary with some pre-planning.

    (and don't even try to use floating-point... I'm not certain, but I'm guessing a floating-point division function alone is probably close to 1kB).

    ----------

    ON THE OTHER HAND: Some architectures have inbuilt support for some of these things... E.G. whereas

    (a<<3)

    might require three instructions on any AVR,

    (uint8_t)a * (uint8_t)8

    may be only *one* instruction on a megaAVR containing a MUL instruction, but may be darn-near-countless instructions on a tinyAVR.

    Read that again... On both architectures, using <<3 may result in exactly *three* instructions, whereas in one architecture (e.g. megaAVR), *8 may result in *one* instruction, whereas in another (e.g. tinyAVR) it may result in loading two registers, jumping to a function, and a return. AND, doing-so not only requires the instructions to *call* that function, but also the function itself, which may be *numerous* instructions...

    ---------

    OTOH, again... Say you're using a TinyAVR, where a MUL instruction isn't already part of the architecture's instruction-set. If you're using other libraries which use the mult8() function, (e.g. by using a*b), mult8() *will* be included, regardless of whether you figure out other means using e.g. << throughout your own code.

    There comes a point where even using << may result in *more* instructions than a call to the mult8() function which has already been included by other libraries.

    (e.g. <<7 might be seven instructions, but if the mult8() function has already been included, then you only need to load two registers, and jump/call, which is only something like 3 instructions...)

    There are lots of caveats, here... It will definitely take *longer* to execute mult8(), but it will take *fewer* (additional) instructions, in the program-memory to call it. Again, that is, assuming mult8() is compiled into your project, via another call from elsewhere.

    -----------------------------------------------------------------------------------------------------------------------

    TODO: This needs revision. Thank you @Radomir Dopieralski, for bringing it to my attention, in the comments below! As he pointed-out, the level of "micro-optimization" explained in this document can actually bite you in the butt if you're not careful. Optimizers generally know the most-efficient way to handle these sorts of things for the specific architecture, and often find ways that are way more efficient than we might think.

    E.G. as explained earlier, (x*64) can be rewritten as (x<<6).

    If your microcontroller has a MUL instruction, (x*64) may, in fact, require the fewest number of instructions.

    If your microcontroller *doesn't* have MUL, then the optimizer (or you) might choose to replace it with (x<<6), which might result in six left-shift instructions. (or possibly a loop with one left-shift and a counter).

    But there are many other cases us mere-mortals may never think of. E.G. some microcontrollers have a "nibble-swap" instruction, where, the high nibble and low-nibble quite literally swap places. So, the optimizer *might* see (x<<6) and instead replace it with, essentially, (nibbleSwap(x & 0x0f)...


  • AVR project doing nada = 58Bytes, and some experiments/results.

    esot.eric, 11/26/2016 at 04:46, 0 comments

    First, note: I'm using avr-gcc, directly, rather than going through e.g. WinAVR, or Arduino...

    And be sure to check that previous log! I am *not* using stdio, as that's *huge*, but it takes some effort to make certain it's not included.

    --------------

    Here I've created an AVR project with nothing but the following, code-wise...

    #include <avr/io.h>
    #include <stdint.h>
    #include <inttypes.h>
    
    int main(void)
    {
       while(1)
       {}
    }

    This project compiles with the following specs, output by 'avr-size'

    _BUILD/minStartingPoint.elf  :
    section    size      addr
    .text        58         0
    .data         0   8388704
    .stab      1200         0
    .stabstr   2993         0
    .comment     17         0
    Total      4268
    
    

    As I understand the contest's requirements, this qualifies as 58 Bytes toward our 1kB limit.

    -------

    Now, what happens when we add a global-variable?

    #include <avr/io.h>
    #include <stdint.h>
    #include <inttypes.h>
    
    uint8_t globalVar; // = 0;
    
    int main(void)
    {
       while(1)
       {}
    }

    Now we get:

    _BUILD/minStartingPoint.elf  :
    section    size      addr
    .text        74         0
    .data         0   8388704
    .bss          1   8388704
    .stab      1212         0
    .stabstr   3010         0
    .comment     17         0
    Total      4314
    
    Toward the contest-requirements, I believe this qualifies as 74 Bytes toward our 1kB limit.

    Note that I did not initialize the global variable... If I'd've initialized it to 0, we'd have *exactly* the same results. (Uninitialized global/static variables are always initialized to 0, per the C standard. THIS DIFFERS from *non-global*/*non-static* local-variables, which are *not* presumed to be 0 by default.)

    -----------

    But what happens when we initialize it to some other value?

    #include <avr/io.h>
    #include <stdint.h>
    #include <inttypes.h>
    
    uint8_t globalVar = 0x5a;
    
    int main(void)
    {
       while(1)
       {}
    }  
    
    _BUILD/minStartingPoint.elf  :
    section    size      addr
    .text        80         0
    .data         2   8388704
    .stab      1212         0
    .stabstr   3010         0
    .comment     17         0
    Total      4321
    
    NOW, note... our ".data" section has increased from 0 to 2. (and our .bss section has dropped from 1 to 0).

    As I understand, the ".data" section counts toward both your RAM and ROM/Flash usage.

    Why both? Because the global-variable is *initialized* to the value 0x5a. The variable itself sits in RAM, but flash-memory is necessary to store the initial-value so it can be written to the RAM at boot.

    As I understand, there's a bit of code hidden from us that essentially iterates through a lookup-table writing these initial-values to sequential RAM locations, which will then become your memory-locations for your global/static variables.

    Note, again, this didn't happen when the global-variable was uninitialized (or initialized to 0) because there's no need for a lookup-table to store a bunch of "0"s, sequentially. Instead, there's a separate piece of hidden-from-us code that loads '0' to each sequential RAM location used by "uninitialized" globals/statics.
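
    Here's a host-side simulation of those two hidden loops, as I understand them. Everything in it (the array names, the sizes, startupInit()) is made up for illustration. In an avr-gcc build the real routines are the ones that show up in the .lss listing as __do_copy_data and __do_clear_bss, and they're written in assembly rather than C.

```c
#include <stdint.h>

/* Simulated memories (hypothetical layout, for illustration only). */
static const uint8_t flashInitImage[] = { 0x5a, 0xa5, 0xef }; /* initial values kept in flash */
static uint8_t ramData[sizeof flashInitImage];                 /* .data: initialized globals   */
static uint8_t ramBss[4];                                      /* .bss: zeroed globals         */

/* Roughly what the startup code does before main() is called. */
static void startupInit(void)
{
    /* Copy loop: costs flash for the loop itself, plus one flash byte
       per byte of .data for the stored initial values. */
    for (uint8_t i = 0; i < sizeof ramData; i++)
        ramData[i] = flashInitImage[i];

    /* Clear loop: costs flash for the loop only, however big .bss gets. */
    for (uint8_t i = 0; i < sizeof ramBss; i++)
        ramBss[i] = 0;
}
```

    That would fit the numbers above: .text grows once when the first zeroed global pulls in the clear loop, and by a different amount when an initialized global pulls in the copy loop, while the initial values themselves are counted in .data.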

    SO...

    As I understand, per the contest-requirements, the above example counts as 80+2 = 82 Bytes toward the 1kB limit.

    -------

    I'm just guessing, here, but I imagine it went to *2* rather than *1* because they indicate the end of the initialization/"lookup-table" with a "null"-character = 0... So, most-likely, if you add a second initialized global-variable the .data section will be 3 Bytes.

    Let's Check:

    #include <avr/io.h>
    #include <stdint.h>
    #include <inttypes.h>
    
    uint8_t globalVar = 0x5a;
    uint8_t globalVar2 = 0xa5;
    
    int main(void)
    {  
       while(1)
       {}
    }
    
    Well, color-me-stupid...

    section    size      addr
    .text        80         0
    .data         2   8388704
    .stab      1224         0
    .stabstr   3028         0
    .comment     17         0
    Total      4351
    
    .... and three?

    #include <avr/io.h>
    #include <stdint.h>
    #include <inttypes.h>
    
    uint8_t globalVar = 0x5a;
    uint8_t globalVar2 = 0xa5;
    uint8_t globalVar3 = 0xef;
    
    int main(void)
    {
       while(1)
       {}
    }
    
    section    size      addr
    .text        80         0
    .data         4   8388704
    .stab      1236         0
    .stabstr   3046         0
    .comment     17         0
    Total      4383
    
    Uh Huh...!

    So, maybe the init-routine handles 16-bit words at a time... might make sense, since 'int' is 16-bits.

    Anyways, that's probably irrelevant.

    But, do note that the ".text" section hasn't grown at all.

    --------

    So, again, this last example would most-likely count toward 84 Bytes of the 1kB limit....



Discussions

Volt wrote 11/28/2016 at 18:56 point

Here's a short but very useful guide from Atmel on optimizing your C code for size and speed:
http://www.atmel.com/Images/doc8453.pdf
Some of the tricks are very surprising!


esot.eric wrote 11/28/2016 at 19:32 point

Great resource! Thanks!


davedarko wrote 11/26/2016 at 10:41 point

Pretty cool, thank you for sharing!


esot.eric wrote 11/26/2016 at 11:35 point

Awesome, Thanks! Hope it's helpful! If my explanations seem too convoluted, please don't hesitate to say-so!
