Close

Reimplement Realloc Redux

A project log for NYETduinoPlusLua

Wherein a Netduino Plus 2 is repurposed with alternative firmware based on Lua

ziggurat29ziggurat29 01/18/2018 at 03:030 Comments

Summary:

I have implemented 'realloc' in the FreeRTOS 'heap_4.c' implementation, along with some other debugging features.

Deets:

We left off with my mentioning a way to intercept and redirect function calls at link time (on gcc, using 'wrappers'), but that there was a functional gap in that the FreeRTOS heap implementation does not have a 'realloc' function.  I set down to implement 'realloc', and I can report that I have emerged triumphant.

I started with the 'heap_4.c' implementation.  In this implementation, the heap is realized as a series of blocks aligned to a defined boundary size (in this case '8 bytes').  The blocks have a short header consisting of a pointer to the next block, and a size of the block itself including header.  The size field furthermore uses the most-significant bit to indicate 'allocated' or not.  If a block is /not/ allocated, it will be part of a list linked by the header's 'next block' pointer.  If it /is/ allocated, the header pointer will be considered invalid (it will actually be NULL-ed), and the size will have the most-significant bit set.  So, alloc()'ed blocks are not linked, but free blocks are, and in address order.  Free'ing a block will return it to the linked list in the proper place, and merging of adjacent blocks is performed.

I implemented a 'pvPortRealloc' function that defers to the existing pvPortMalloc and pvPortFree for the degenerate cases, and otherwise tries to satisfy a resize request by nibbling away from the next block, if it is free, and if there is enough space.  If there isn't, it will try to do a malloc-copy-free sequence.  One implementation detail of 'realloc' is that if it fails, it does /not/ free the original block; callers must be aware of that.

I created some unit tests and verified the code to my satisfaction.  In the process of satisfying myself, I discovered a couple things:

  1. the heap is created from a static chunk of RAM named 'ucHeap'.  FreeRTOS will defined for you, but there is an option 'configAPPLICATION_ALLOCATED_HEAP' that will suppress that, in which case you are meant to define it yourself.  This lets you place the memory in other regions if you like.  One thing I noticed is that the default definition is not suitably aligned (8 byte), and consequently four bytes were wasted at the beginning and end.  I want my 8 bytes back!
  2. heap memory is not initialized in malloc, of course.  This made visual debugging of the heap a bit tedious, since it was not readily obvious what parts were allocated/freed/written to.

For the fist thing, I used the configAPPLICATION_ALLOCATED_HEAP option and a gcc-specific declarator '__attribute__((aligned(8)))' on my own ucHeap, and was able to get the whole heap aligned correctly.  Ostensibly, the linker would also stick some other variables in the space 'recovered' this way to be efficient, though I didn't verify.

For the second thing, I added a new feature whereby the heap memory was filled with distinctive patterns under certain conditions:

These fill patterns are only done if a switch 'configMALLOC_FILL' is defined to 1, so it's a completely optional debugging feature.  Now I can visually see at a glance where blocks are bounded, what is allocated and cleared, and what has never been used at all.  It's kind of pretty!  It's so much easier to visual inspect the heap in the debugger than it is to manually go through the bytes with a calculator to see what is and isn't used.  This leads to the next thing...

Armed with my newfound heap internals knowledge, I implemented another debugging tool:  a heap walker.  The gist is that you invoke vPortHeapWalk() with a callback function of your implementation, and it is called for each block on the heap, with the block pointer, the size, and a flag indicating if it allocated or freed.  So now if you wanted to do a heap walk to find non-freed blocks, you can do that easily.  You could also see how much heap overhead is being consumed in header blocks, etc.

Because heap corruption is a real problem when developing code, also I added some checks to detect that.  The checks can't be comprehensive, but they will detect the pathological cases which would cause the heapwalk to go gonzo, and most heap abuses would cause that, so it is a pretty good check.  The short story is that they check for block header overwrites that cause an insane condition to occur, of which there are presently four:
  1. block pointer is to non-aligned block
  2. block pointer is to an area outside of the heap memory chunk
  3. sequence of blocks is not monotonically increasing
  4. the allocation flag is not set as expected

The heap walk will return '0' if the heap seems OK, and one of those symptom code once it has found a block that exhibits the detected problem.

I was just about to contribute the code to the FreeRTOS community when I noticed that there is another heap implementation 'heap_5.c' that works like 'heap_4.c', but supports multiple discontiguous heap blocks.  *sigh*; so I want to port my mods into heap_5 also, and submit both.  I don't have to -- free code is free code -- but it does seem like I should for completeness.

I also had an idea for another minor improvement, which is to pattern fill as 0xcd only the memory that was explicitly requested, and 0xdd fill the extra padding for the block.  This is conceptually straightforward, but I will have to hack the pvPortMalloc implementation, because it modifies it's parameters, and the original size request is gone by the time I do the fill.  I wanted to avoid hacking the existing code, but I don't have a choice in this case.  But it will be surgical, and anyway it will have no impact if you don't have the pattern fill debug feature enabled.

I used the wrapper feature I described before, and redirected malloc() and friends into the FreeRTOS implementation, and things seem stable.  I learned something interesting, though:

So...  Until I knew this, I was wrapping malloc(), free(), realloc(), and things seemed to be working from a unit test standpoint, but under normal operations it was still a trip to hardfault land.  Last time I wrote about having found the internal sbrk function, which at that time I used to gain some visibility into heap usage, but now I was able to use it to verify that, yes, these alternative heap management functions were still being used, and side-stepping my heap.  So, I had to wrap them, too.

My wrappers spec list wound up being:

-Wl,--wrap,malloc -Wl,--wrap,free -Wl,--wrap,realloc -Wl,--wrap,_malloc_r -Wl,--wrap,_free_r -Wl,--wrap,_realloc_r 

And I might have to wrap calloc() and _calloc_r(), too, but I haven't verified.  Nothing is using those, but be aware that they might need it, unless they are implemented with a call malloc() followed by a memset.

NOW at last things seem stable again.

It's worth mentioning that there is a featurette in FreeRTOS that is activated by configUSE_NEWLIB_REENTRANT which causes FreeRTOS to include a 'struct _reent* r' for each of FreeRTOS' 'tasks', and to appropriate initialize and free that at the correct time.  If you use newlib, you'll probably want to activate that feature, because I'm telling you:  there's plenty of hidden use of the "reent" versions of the functions in the library implementation.

For those that don't already know, the ancient standard C library has a bunch of global state.  That venerable library was created before we had threads, and many Unixians were vehemently opposed to including threads in POSIX.  But in the end, the case for threads prevailed, so then what to do about things like errno, strtok, etc.?  Newlib does it by wrapping all that state into a struct, 'struct _reent* r', that is meant to be per-thread, and there is a global pointer, _impure_ptr that points to the state for the current one.  The '_r' versions take an explicit state reference, and the normal unadorned versions use the global pointer.

The state is pretty big; even for newlib 'nano' -- which puts a premium on size -- the size of that struct is just over 1K, so you'll incur that for each task you create.

Anyway, now things are stabilized once again, it is time to get back to the real business of implementing features.

Next:

I implement features.  I think I'm going to start with an improved serial I/O.  Currently, polling is used, but I'd rather at least some interrupt driven buffered stream.

Discussions