I am trying to replace the malloc() implementation so as to get an idea of memory usage patterns. Along the way, I discovered some interesting things about the internals of libc, eLua, FreeRTOS, and some features of the linker.
When I last wrote, I was having more of the ever-more-familiar hard faults. I was able to reduce them by fiddling with some heap and stack size parameters, but I really needed to bring a little more rigor to understanding what was going on. At that time, I had two major problems:
1) I had no real visibility into actual heap usage or patterns
2) I could not get an answer from (e)Lua as to its perspective on memory usage
The second wound up being simpler, so I'll explain it first. When I finally got Lua stabilized enough to execute slightly less trivial statements, I am supposed to be able to get the memory usage by issuing something like:
print ( collectgarbage ("count") )
but all I got was a blank line (hey, this is an improvement over the hard fault I was previously getting). I had already done a great deal of debugging with my bogo-binary-search approach, but this seemed like something else. Then I remembered two things: Newlib Nano deliberately leaves floating-point support out of printf() by default, and numbers in Lua are all double-precision floats. (The newest Lua, 5.3, which I ultimately want to use, supports integers, and can be configured for 32-bit integers and floats, which I very much want to do on this hardware; but I'm stuck on 5.1 for the time being.)
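One way to confirm this kind of diagnosis on the target, rather than by inspection, is a tiny boot-time self-test that formats a known double and checks the result. This is only a sketch: the function name is hypothetical, and I'm assuming (consistent with the blank line I saw) that a newlib-nano build without float support produces something other than the expected text for a %f conversion.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical boot-time check: returns 1 if the printf family can format
   doubles, 0 otherwise. Assumes a C library built without float support
   (e.g. newlib-nano without its float option) will not produce "1.50". */
int printf_float_supported(void)
{
    char buf[16];
    snprintf(buf, sizeof buf, "%.2f", 1.5);
    return strcmp(buf, "1.50") == 0;
}
```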
Remembering this, I used the linker command-line switch that pulls floating-point support into newlib-nano's printf(), '-u _printf_float'.
OK! Now I can get some output:
(Those numbers are in kilobytes.) Interesting how it goes up and down: garbage collection in action. But the numbers are still nowhere near large enough to have exhausted my previous 64K heap when I was crashing, so there's more that needs to be understood. What I really want to do is a 'heap walk', so I can see all the allocated blocks and better understand what's happening.
For my first amazing feat, I did some research into effectively replacing malloc() in a newlib project. You'd think this sort of thing is done all the time, and actually it is. There are a variety of techniques for it:
- By far the most common technique is avoidance: don't use malloc() in an embedded system; you are asking for trouble with deterministic runtime behaviour.
- Many well-crafted libraries provide a mechanism to customize key features such as memory allocation.
- Interpositioning: a variety of techniques to coax the compiler/linker into doing what you want instead of what the author intended.
I appreciate 'avoidance', and if I totally controlled the code, that's almost certainly what I would do, but this (e)Lua is simply beyond my control and I have to accept that there will be dynamic memory allocation requests.
Whilst tracing through the code, I did discover that Lua provides a very tidy means of fully controlling the dynamic memory allocation scheme: you simply implement an allocator function (a lua_Alloc, which Lua stores internally as 'frealloc') and pass it to lua_newstate() when you create your Lua state, or install it with lua_setallocf() if you already have a state on hand. Thanks, Lua! I wonder why eLua did not use it? I will probably explore this later, but for now there's still more to the system than just Lua; amongst other things, malloc() is used in various internal implementations in libc.
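For reference, here is a minimal sketch of such an allocator with usage statistics bolted on. It follows the lua_Alloc shape from the Lua 5.1 manual (ud, ptr, osize, nsize), but to stay self-contained it doesn't include lua.h; the struct and function names are hypothetical ones of my own. Because Lua tells the allocator the old block size, keeping a running "bytes in use" figure is trivial.

```c
#include <stdlib.h>

/* Hypothetical statistics record, handed to Lua as the allocator's 'ud'. */
struct lua_mem_stats {
    size_t in_use;  /* bytes currently allocated on Lua's behalf */
    size_t peak;    /* high-water mark of in_use */
};

/* Same signature as Lua's lua_Alloc. In 5.1, ptr is NULL iff osize is 0;
   the 'ptr ? osize : 0' guard also copes with 5.2+, where osize carries a
   type tag when ptr is NULL. */
static void *stats_alloc(void *ud, void *ptr, size_t osize, size_t nsize)
{
    struct lua_mem_stats *st = ud;
    size_t old = ptr ? osize : 0;

    if (nsize == 0) {               /* Lua is freeing the block */
        free(ptr);
        st->in_use -= old;
        return NULL;
    }

    void *p = realloc(ptr, nsize);  /* allocate, grow or shrink */
    if (p == NULL)
        return NULL;                /* old block (if any) is untouched */

    st->in_use = st->in_use - old + nsize;
    if (st->in_use > st->peak)
        st->peak = st->in_use;
    return p;
}
```

It would be installed with something like lua_newstate(stats_alloc, &stats), or lua_setallocf(L, stats_alloc, &stats) on an existing state, after which the peak figure answers the "how much does Lua really use" question directly.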
The 'interpositioning' techniques are used to make the target binary invoke your own code instead of the originally intended code. These techniques tend to be compiler and linker specific. In this case, with gcc, there are at least a couple of things that can be done:
- the linker will 'prefer' to link the first symbol it finds, in the order of the object modules and libraries it is given. Indeed, eLua has implementations of various things in newlib/stubs.c that override the ones otherwise in libc. This technique effectively hides the original symbol, so you can no longer call the original function.
- the linker has a --wrap command-line switch that redirects references to a symbol in the already-compiled code to a wrapper of your own. The original definition stays reachable under an alias, so you can still call it. It's a little like a #define, but for the linker.
I tried out the linker --wrap option. In my build system ('System Workbench for STM32', based on Eclipse) it was actually easier to specify the option indirectly via gcc, since gcc is also invoked to do the link step. The gcc option to pass something to the linker is '-Wl'. I used it to wrap malloc() like this: '-Wl,--wrap=malloc'.
In effect, every undefined reference to malloc() in the compiled code is redirected to __wrap_malloc, which you are meant to implement somewhere yourself, and any reference to __real_malloc is resolved to the original malloc(). In this manner, you are able to 'intercept' all calls and delegate to the original implementation if you like.
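As a minimal sketch of the idea, the wrappers below just count calls and requested bytes before delegating to the real functions. The counters and the WRAP_WITH_LD macro are hypothetical names of mine. On the target it would be built with -DWRAP_WITH_LD and linked with '-Wl,--wrap=malloc -Wl,--wrap=free'; without that macro, the __real_ names simply fall back to the library functions so the file can also be compiled and exercised on a host.

```c
#include <stddef.h>
#include <stdlib.h>

#ifdef WRAP_WITH_LD
/* With --wrap=malloc / --wrap=free, GNU ld resolves these names to the
   original functions and sends every undefined reference to malloc/free
   to the __wrap_ versions below. */
void *__real_malloc(size_t size);
void __real_free(void *ptr);
#else
/* Host fallback so the sketch builds without the linker options. */
#define __real_malloc malloc
#define __real_free free
#endif

/* Hypothetical statistics, readable from a debugger or a shell command. */
size_t wrap_malloc_calls;
size_t wrap_free_calls;
size_t wrap_bytes_requested;

void *__wrap_malloc(size_t size)
{
    wrap_malloc_calls++;
    wrap_bytes_requested += size;
    return __real_malloc(size);   /* delegate to the original allocator */
}

void __wrap_free(void *ptr)
{
    if (ptr != NULL)
        wrap_free_calls++;
    __real_free(ptr);
}
```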
This is probably not sufficient; you also need to wrap malloc()'s friends (at the very least free() and realloc()), and maybe even:
calloc, _malloc_r, _free_r, _realloc_r, __malloc_lock, __malloc_unlock, ...
Yikes! But the facility is there for you if you need it. You should read the libc source so you know what needs to be wrapped. E.g., while _malloc_r seems to be an internal implementation detail, it is actually invoked directly by some other routines, such as vfprintf (and a whole bunch more). The output 'map' file is useful to verify that all pieces have been successfully wrapped out of existence.
I set out to wrap malloc and friends so that I could forward those calls to the common memory allocator that FreeRTOS provides (who wants multiple heap spaces?), but that was brought to an abrupt halt because FreeRTOS does not provide a 'realloc'.
I myself have never used realloc() in many (many!) decades of programming, but I suppose I can see its attraction. Realloc can be used as a Swiss Army knife: you can malloc, free, and realloc all with the same function. Lua seems to enjoy that approach, as it makes no calls to malloc(), only realloc() (although it stops short of freeing through it, and does call free()). And because realloc preserves the block's contents, you can't really emulate it with a malloc-and-copy: you don't know the original block size (unless you dig into the implementation details of the arena headers, of course).
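If you control every allocation and free, though, one way around the unknown-size problem is to record the size yourself in a small header in front of each block; then realloc really is just malloc-and-copy. Here is a sketch on top of a plain malloc/free pair. The my_ names and BACKING_ macros are hypothetical; on the target the two backing calls could be FreeRTOS's pvPortMalloc() and vPortFree(). It costs a header per block and a full copy on every resize, so a realloc that understands the allocator's own block headers would be more economical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Backing allocator; hypothetically pvPortMalloc/vPortFree on FreeRTOS. */
#define BACKING_ALLOC(n) malloc(n)
#define BACKING_FREE(p)  free(p)

/* Header stored in front of each block. The union with max_align_t keeps
   the user pointer as strictly aligned as the backing allocator's. */
typedef union {
    size_t size;
    max_align_t align;
} block_header;

void *my_malloc(size_t size)
{
    if (size > SIZE_MAX - sizeof(block_header))
        return NULL;                      /* would overflow */
    block_header *h = BACKING_ALLOC(sizeof(block_header) + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    return h + 1;                         /* user memory follows the header */
}

void my_free(void *ptr)
{
    if (ptr != NULL)
        BACKING_FREE((block_header *)ptr - 1);
}

void *my_realloc(void *ptr, size_t size)
{
    if (ptr == NULL)
        return my_malloc(size);
    if (size == 0) {                      /* one common reading of realloc(p, 0) */
        my_free(ptr);
        return NULL;
    }
    size_t old = ((block_header *)ptr - 1)->size;
    void *p = my_malloc(size);
    if (p == NULL)
        return NULL;                      /* original block is left intact */
    memcpy(p, ptr, old < size ? old : size);
    my_free(ptr);
    return p;
}
```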
Since FreeRTOS does not provide a realloc function, I punted on this until next go-round. I did find some other buried treasure, though.
While thumbing through eLua's newlib stubs.c, I found a routine '_sbrk_r'. This is the routine that malloc() calls when it needs more space added to the heap. It is normally implemented in libc itself, and I'm not exactly sure why it is re-implemented in eLua (I know eLua provides some alternative allocators, but I'm not using them, so I would have expected that code to be conditionally excluded). The function simply raises a high-water mark within a region of RAM delimited by linker-provided symbols that mark where the free heap starts and ends.
I decided to add a little code to that function to flood-fill the free RAM with a distinctive pattern (0xfd) so that I could spot overwritten memory more easily. I also made the symbol 'heap_ptr' public, which let me see the maximum heap usage of the whole program.
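The idea looks roughly like the sketch below. It is not eLua's actual stubs.c code: the real _sbrk_r takes a struct _reent * and gets its bounds from linker symbols, whereas here a static array (demo_heap, a hypothetical stand-in) plays the part of the free RAM so the logic can run anywhere. The whole free region is flood-filled with 0xFD on the first call, and a separate peak pointer is kept in case the heap is ever trimmed.

```c
#include <stddef.h>
#include <string.h>

#define HEAP_FILL_BYTE 0xFD
#define DEMO_HEAP_SIZE 4096

/* Stand-in for the RAM between the linker's heap start/end symbols. */
static unsigned char demo_heap[DEMO_HEAP_SIZE];

unsigned char *heap_ptr;   /* current break; public for inspection */
unsigned char *heap_peak;  /* highest break ever handed out */

/* Simplified sbrk: moves the break by incr bytes and returns the previous
   break, or (void *)-1 if the request does not fit. A real stub would also
   report ENOMEM on failure. */
void *demo_sbrk(ptrdiff_t incr)
{
    unsigned char *const start = demo_heap;
    unsigned char *const end = demo_heap + DEMO_HEAP_SIZE;

    if (heap_ptr == NULL) {               /* first call: init and flood-fill */
        heap_ptr = heap_peak = start;
        memset(start, HEAP_FILL_BYTE, DEMO_HEAP_SIZE);
    }

    if (incr > end - heap_ptr || incr < start - heap_ptr)
        return (void *)-1;                /* out of range */

    unsigned char *prev = heap_ptr;
    heap_ptr += incr;
    if (heap_ptr > heap_peak)
        heap_peak = heap_ptr;
    return prev;
}

/* Peak heap usage in bytes, the 'maxheapused' style figure. */
size_t max_heap_used(void)
{
    return heap_peak ? (size_t)(heap_peak - demo_heap) : 0;
}
```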
I modified main.c slightly so that when you exit the shell, the memory statistics for stack and heap are emitted, then the board is reset so that you start back in the shell. Here are the results from a few runs using some trivial code:
eLua# exit
minfreestack: 862, maxheapused: 2032 of 115072 (minfree 113040)
resetting...
So, just getting up to the shell prompt used about 2K of RAM, and 1024-862=162 bytes of stack. Currently, I have about 115K of heap (the heap expands to fill the unused memory, as per the linker symbols). So I really never should have had heap problems before, when I was getting all those hard faults. I suspect they might have been due to competing heap implementations; the hard faults seemed to subside when I switched FreeRTOS to 'static only' memory allocation.
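Numbers like minfreestack are 'high-water' measurements. I don't know exactly how eLua computes its figure, but FreeRTOS's uxTaskGetStackHighWaterMark() relies on the same fill-pattern idea as the heap flood-fill: the stack is pre-filled with a known byte (0xA5 in FreeRTOS) and later scanned from the far end for bytes that were never overwritten. A sketch of that scan for a downward-growing stack, with hypothetical names:

```c
#include <stddef.h>
#include <string.h>

#define STACK_FILL_BYTE 0xA5  /* the value FreeRTOS uses for task stacks */

/* Count untouched fill bytes starting at the lowest address. With a
   downward-growing stack (as on Cortex-M) this end is written last, so the
   count is the minimum free stack observed so far, in bytes. */
size_t stack_min_free(const unsigned char *stack_base, size_t stack_size)
{
    size_t n = 0;
    while (n < stack_size && stack_base[n] == STACK_FILL_BYTE)
        n++;
    return n;
}
```

With a 1024-byte stack whose top 162 bytes have ever been touched, this returns 862, the same arithmetic as above.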
Next, I tried entering and exiting the Lua interpreter:
eLua v0.9 Copyright (C) 2007-2013 www.eluaproject.net
eLua# lua
Press CTRL+Z to exit Lua
Lua 5.1.4 Copyright (C) 1994-2011 Lua.org, PUC-Rio
>
eLua# exit
minfreestack: 746, maxheapused: 8496 of 115072 (minfree 106576)
resetting...
So starting Lua bumped heap usage up to just over 8K, i.e. about 6K of overhead (8496 - 2032 = 6464 bytes).
Next, I tried defining a trivial function. This was mostly to test parser overhead; I suspect the function's overhead is trivial once compiled.
eLua v0.9 Copyright (C) 2007-2013 www.eluaproject.net
eLua# lua
Press CTRL+Z to exit Lua
Lua 5.1.4 Copyright (C) 1994-2011 Lua.org, PUC-Rio
> function foo ( f ) f() end
>
eLua# exit
minfreestack: 522, maxheapused: 9692 of 115072 (minfree 105380)
resetting...
So that caused another 1K or so to be used (probably temporarily).
Next, I tried a slightly more complicated scenario, this time with a function and a loop and a closure.
eLua v0.9 Copyright (C) 2007-2013 www.eluaproject.net
eLua# lua
Press CTRL+Z to exit Lua
Lua 5.1.4 Copyright (C) 1994-2011 Lua.org, PUC-Rio
> function foo ( f ) f() end
> for i = 1, 100 do foo ( function() end ) end
>
eLua# exit
minfreestack: 492, maxheapused: 13752 of 115072 (minfree 101320)
resetting...
So that took about another 4K over the last run. So, in these simple scenarios, Lua is taking up to about 11.5K (13752 - 2032 bytes) to enter, compile, and run this simple code. I don't know how much of that is parser overhead, though, because I am not set up to precompile the code. That will have to wait a while, because I evidently need to build a special eLua 'cross compiler' to make the bytecode chunk, and I'll need some filesystem support.
Back to memory: I am now going to try to implement 'realloc' in the FreeRTOS memory manager 'heap_4.c' and see if I can successfully wrap all the malloc stuff up, directing it to FreeRTOS.