I did an initial analysis of memory usage in libVorbis, and it isn't pretty: upwards of half a megabyte.
I sat down to do some RAM usage analysis. At first I was going to use Visual Studio's profiler, which I've used for many, many years, but it seems Microsoft may have removed it and replaced it with something else that I find useless. Oh well, I think I have a VM with Visual Studio 2010 in it somewhere; I'll look for that later. The loss is all the more disconcerting to me because I like to do instrumented profiling, so that I can see what's really going on and focus my efforts on measurable improvements.
Moving on, I saw that the codebase actually has an ad-hoc allocation profiler module, so I got that working (it won't compile on Windows as-is without several modifications). Its output was an unreadable mess, though I'm sure it was interesting to someone at some point in the past. On the plus side, it generates logs both globally and per module. I modified the record-emission logic to output what I thought was more usable data, and did a run.
The results are not promising. With my test clip, it uses upwards of half a meg of RAM! And that's just the total of requested memory, not counting allocator overhead, fragmentation, etc. This is a little bit more than the 128k+64k (192k total) on the chip. So I'm going to need to do a bunch of analysis to see what, if anything, can be done.
Another little treat I found was some memory allocation that happens outside of what is redirected through this debugging module: a bunch of memory is allocated via alloca(). If you're not familiar with alloca(), it stands for 'allocate automatic', and that means 'off the stack'. alloca() is technically non-portable, since it isn't part of the C standard (or POSIX), but virtually every environment has an implementation. The reason it's non-portable is that it returns memory that is automatically freed when the calling function returns, which depends on how the compiler lays out stack frames and almost always means 'on the stack'.

The upside is that C has long had something vaguely akin to the C++ idiom of Resource Acquisition Is Initialization (RAII), which in this case specifically means obviating the need to explicitly free(). Another advantage is that the allocation is 'cheap', because it typically just involves adjusting the stack pointer register, rather than fiddling with heap allocation structures. A downside for embedded is that stack is often quite limited (sometimes only hundreds of bytes), so all of these alloca() call sites need to be analyzed.
Now I need to do a bunch of work to see if this runtime memory usage can be reduced. This could take a while.