These can be *huge* operations... Here are some thought-points on alternatives.
Again, these techniques won't save you much space (maybe not *any*) if you use libraries which already make use of those operations... So, when space is a concern, you're probably best-off not using others' libraries.
So, here's an alternative... Say you need to do a multiplication of an integer by 4...
A *really* fast way of doing-so is (a<<2), two instructions on an AVR.
If you need to do a division by 4? (a>>2), two instructions on an AVR.
(Beware that signed-integer operations may be a bit more difficult... e.g. an arithmetic right-shift of a negative number doesn't round the same way as division does).
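Here's a minimal sketch of those shift-tricks, runnable on a desktop compiler (the instruction-count comments are for AVR; the function names are just mine, for illustration):

```c
#include <stdint.h>

uint8_t times4(uint8_t a) { return (uint8_t)(a << 2); } /* two LSLs on an AVR */
uint8_t div4(uint8_t a)   { return (uint8_t)(a >> 2); } /* two LSRs on an AVR */

/* The signed caveat: right-shifting a negative value is
 * implementation-defined in C (gcc/clang do an arithmetic shift,
 * which rounds toward negative-infinity), while C's division
 * rounds toward zero... so the two can disagree. */
int8_t sdiv4_shift(int8_t a) { return (int8_t)(a >> 2); }
int8_t sdiv4_div(int8_t a)   { return (int8_t)(a / 4); }
```

E.g. for -7, the shift gives -2 but the division gives -1... so don't blindly swap one for the other on signed values.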
Another alternative is... a bit less-self-explanatory, and likely quite a bit more messy...
In most cases, there will be *numerous* functions automatically generated to handle multiplications/divisions between integers of different sizes. That's a lot of code which mightn't be necessary with some pre-planning.
(and don't even try to use floating-point... I'm not certain, but I'm guessing a floating-point division function alone is probably close to 1kB).
ON THE OTHER HAND: Some architectures have inbuilt support for some of these things... E.G. whereas
(a<<3) might require three instructions on any AVR,
(uint8_t)a * (uint8_t)8 may be only *one* instruction on a megaAVR containing a MUL instruction, but may be darn-near-countless instructions on a tinyAVR.
Read that again... On both architectures, using <<3 may result in exactly *three* instructions. Whereas *8 may result in *one* instruction on one architecture (e.g. megaAVR), but on another (e.g. tinyAVR) it may result in loading two registers, jumping to a function, and a return. AND, doing-so not only requires the instructions to *call* that function, but also the function itself, which may be *numerous* instructions...
OTOH, again... Say you're using a tinyAVR, where a MUL instruction isn't part of the architecture's instruction-set. If you're using other libraries which use the mult8() function (e.g. by using a*b), mult8() *will* be included, regardless of whether you figure out other means, e.g. using << throughout your own code.
There comes a point where even using << may result in *more* instructions than a call to the mult8() function which has already been included by other libraries.
(e.g. <<7 might be seven instructions, but if the mult8() function has already been included, then you only need to load two registers, and jump/call, which is only something like 3 instructions...)
There are lots of caveats, here... It will definitely take *longer* to execute mult8(), but it will take *fewer* (additional) instructions, in the program-memory to call it. Again, that is, assuming mult8() is compiled into your project, via another call from elsewhere.
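For a sense of why mult8() is so many instructions, here's a rough sketch of what a software 8x8 multiply has to do on a MUL-less device. (mult8() is a placeholder name throughout this log; avr-gcc's actual helpers go by names like __mulqi3, and the real routine may differ from this shift-and-add sketch.)

```c
#include <stdint.h>

/* Hypothetical shift-and-add software multiply, roughly what a
 * helper like mult8() must do when there's no MUL instruction:
 * for each set bit in b, add the correspondingly-shifted a. */
static uint8_t mult8(uint8_t a, uint8_t b) {
    uint8_t result = 0;
    while (b) {
        if (b & 1)
            result += a;      /* accumulate where b has a 1-bit */
        a <<= 1;              /* next bit-position's weight */
        b >>= 1;
    }
    return result;            /* truncated to 8 bits, like (uint8_t)(a*b) */
}
```

A loop like that is cheap to *call* (load two registers, call) but takes many cycles to *run*... which is exactly the size-vs-speed trade described above.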
TODO: This needs revision. Thank you @Radomir Dopieralski, for bringing it to my attention, in the comments below! As he pointed-out, the level of "micro-optimization" explained in this document can actually bite you in the butt if you're not careful. Optimizers generally know the most-efficient way to handle these sorts of things for the specific architecture, and often find ways that are way more efficient than we might think.
E.G. as explained earlier, (x*64) can be rewritten as (x<<6).
If your microcontroller has a MUL instruction, (x*64) may, in fact, require the fewest instructions.
If your microcontroller *doesn't* have MUL, then the optimizer (or you) might choose to replace it with (x<<6), which might result in six left-shift instructions (or possibly a loop with one left-shift and a counter).
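The loop-with-a-counter alternative, as a sketch (function name is mine):

```c
#include <stdint.h>

/* (x << 6) done as a loop: one left-shift instruction plus a
 * counter... fewer flash instructions than six unrolled shifts,
 * but more cycles to execute. */
static uint8_t shl6_loop(uint8_t x) {
    for (uint8_t i = 0; i < 6; i++)
        x <<= 1;
    return x;
}
```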
But there are many other cases us mere-mortals may never think of. E.G. some microcontrollers have a "nibble-swap" instruction (SWAP, on AVRs), where the high nibble and low nibble quite literally swap places. So, the optimizer *might* see (x<<6) and instead replace it with, essentially, (nibbleSwap(x & 0x0f) << 2). That's four instructions, rather than six.
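You can verify that trick is exact. Here nibbleSwap() is a C stand-in for the single SWAP instruction (the names are mine, for illustration):

```c
#include <stdint.h>

/* Stand-in for AVR's SWAP instruction: exchange high and low nibbles. */
static uint8_t nibbleSwap(uint8_t x) {
    return (uint8_t)((x << 4) | (x >> 4));
}

/* (x << 6), rebuilt as ANDI + SWAP + LSL + LSL...
 * mask the low nibble, swap it into the high nibble (a shift of 4
 * for free), then shift twice more. Four instructions, not six. */
static uint8_t shl6_via_swap(uint8_t x) {
    return (uint8_t)(nibbleSwap(x & 0x0f) << 2);
}
```

The mask is what makes it safe: only the low two bits of x survive a <<6 in 8 bits anyway, so throwing away the high nibble first changes nothing.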
And then, as described earlier, there's the case where mult8() is already in your code, and the optimizer (with -Os, optimizing for *size*, not speed) might recognize that it only takes three instructions to call mult8().
TODO: The point, which I completely forgot in writing this late "last night", wasn't to encourage you to replace your multiplications (e.g. x*64) with shift-operations (x<<6), but to be aware that code *can* be hand-tuned/optimized when considered *carefully* (this takes a lot of experimentation, too!), and that the results may not be ideal for all platforms/architectures, or even for all devices in the same architecture! Further, doing-so *may* bite you in the butt if done from the start... e.g. you design around *not* using mult8(), but later down the road realize you *have to* include it for something else; now your code-size increases dramatically *and* your "micro-optimizations" are slightly less efficient than merely calling mult8().
-------E.G. consider (x*65)...
Do you *need* that much precision? If not, think about how your architecture will handle the operation... If it has a MUL instruction, you probably don't need to worry about it. But if it *doesn't*, x*65 may very well result in *quite a few operations* you don't need... If x*64 is close-enough, using that *might* be *significantly* smaller in code-size and faster in execution-time.
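Concretely: x*65 decomposes exactly into one shift and one add, and dropping the add gives the x*64 approximation, which is low by exactly x (about 1.5%). A sketch (names are mine; widened to 16 bits so nothing overflows):

```c
#include <stdint.h>

/* Exact: x*65 == (x<<6) + x... one shift-and-add, no multiply helper. */
static uint16_t times65(uint8_t x) {
    return (uint16_t)((uint16_t)x << 6) + x;
}

/* Approximate: if ~1.5% error is tolerable, drop the add entirely. */
static uint16_t times65_approx(uint8_t x) {
    return (uint16_t)((uint16_t)x << 6);
}
```

Neither form ever causes a multiply helper to be linked-in, which is the whole point.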
Note that this is a bit *in-depth*: if somewhere else in your code (or in libraries you've used) a similar operation is performed, then your compiled code will have a function like mult8(a,b) linked-in... Calling that may only result in 3 additional instructions (load registers a and b, call mult8()), whereas, again, remember that (x<<6) might result in *six* instructions. BUT: If you *know* that mult8() is *not* used anywhere else, and you *know* you don't absolutely need it, then you'll save *dozens* of instructions by making sure it's *never* used (and therefore never linked-in).
Think of this like the floating-point libraries... If you use floating-point, your code-size will likely grow by SEVERAL KB. If you throw in usage of things like sin() or whatnot, that'll add significantly more. But if you *don't* use them, then they won't be added to your code-size. (This is similar to what happens with global-variables which are initialized vs. those which aren't, described in a previous log). These aren't functions that *you've* written; they're essentially libraries that are automatically added whenever deemed-necessary.
Oy this is way too in-depth.
And, really, it requires quite a bit of experimentation.
TODO: A note on optimizers... -Os will most-likely consider other options such as the nibble-swap example given earlier, but some other optimization-levels will take your code word-for-word. Think you can outsmart it? :)
Realistically, these techniques may only be useful if you've got complete control over all your code, and they're *considered* along-the-way, but only implemented *at the end* to squeeze out a few extra bytes...