♫Mem-ry... Not Enough Is Quite the Bind♫

A project log for The Internet of Nixie Clocks

Wherein the tubes of Nix do bridge betwix the worlds of form and ether. I see no reason one should bridge, but that won't stop me, either.

ziggurat29 01/31/2019 at 15:42


I set out to make some improvements, but I ran out of RAM.  So I had to make an unexpected improvement first:  moving code into the Lua Flash Store ('LFS') on NodeMCU.


There were several improvements I set out to do, but with each improvement usually comes code, and with code comes RAM.  It turns out I was on the hairy edge of being out-of-RAM as it was, and as soon as I started adding any code, my unit no longer worked due to out-of-memory errors.

When Lua code is deployed to the NodeMCU board, it is usually deployed as source.  It is stored in a filesystem (SPIFFS), and when loaded, it is compiled on-the-fly to bytecode for the virtual machine.  This is the usual arrangement for Lua programs.  Since compiling takes negligible time (and also due to the non-portability of bytecode across different Lua versions!) few folks precompile in the conventional desktop arena.

But in the embedded arena, the compiler -- fast though it may be -- does take non-negligible memory resources to execute.  In my case, the program had gotten just big enough that the compiler would run out of memory before finishing.

The next line-of-defense in this situation would be to break up the program into multiple files, and compile them on-device into pre-compiled images (usually with the '.lc' filename extension, but that is not required).  Making multiple shorter files reduces the memory footprint the compiler needs to process a single translation unit.  In a way, this strategy is a human-guided version of a divide-and-conquer tactic.
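As a rough sketch of that approach, the snippet below walks a list of source files and compiles each one.  The module names are made up for illustration, and the compile step is passed in as a function so the loop can be tried on a desktop Lua; on the board it would be node.compile, which writes a '.lc' file alongside each source.

```lua
-- Hypothetical file names; substitute whatever pieces your program was split into.
local modules = { "timezone.lua", "clockctl.lua" }

-- Compile each source file and return the names of the resulting images.
-- 'compile' is injected so the loop can be exercised off-device.
local function precompile(names, compile)
  local images = {}
  for _, name in ipairs(names) do
    compile(name)
    images[#images + 1] = (name:gsub("%.lua$", ".lc"))
  end
  return images
end

-- On the board, node.compile("x.lua") produces "x.lc" in the filesystem.
if node and node.compile then
  for _, image in ipairs(precompile(modules, node.compile)) do
    print("compiled " .. image)
  end
end
```

Afterwards, something like dofile("timezone.lc") loads the pre-compiled image without invoking the compiler.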

This will carry you a long way, but it does mean doing some surgery to your code, and ultimately it will only carry you so far.  And it still uses RAM to load the pre-compiled bits off the filesystem into working memory.

There is also another feature of the NodeMCU environment that can be used:  the 'Lua Flash Store' ('LFS').  This is sort-of like a filesystem, but not quite, and it holds pre-compiled Lua objects.  These pre-compiled objects have at least two benefits:  1)  they are execute-in-place.  2)  they can contain read-only strings and read-only tables.

The execute-in-place feature means that you don't have to load the pre-compiled bytecode into RAM to execute it; you can run it directly from where it sits:  in flash.  Also, putting read-only objects like strings in flash is a big help too.  Lua uses looots of strings, and in non-obvious places.  Your function names are strings.  When you call a function, that is a string lookup.  Your member names of structures are strings and involve string lookups.  The Lua runtime goes to great pains to 'intern' these strings, and avoid duplications, but when you've only got 40K of RAM, that stuff still adds up.

Using the LFS involves more work than fragmenting the code and precompiling chunks, so naturally I chose the more difficult route.  The first exciting difficulty is creating the needed tools!


The standard Lua has always included 'luac', which is a tool that just runs the compiler on your Lua source, and dumps out the byte-code to a file, rather than running it.  However, the NodeMCU project uses a modified Lua runtime that allows for objects in read-only memory, and this requires a special build of the 'luac' (called 'luac.cross') that is cognizant of these things.  Additionally, luac.cross packages the result into a form that the runtime can directly 'mount' into the execution environment.  This form is the 'LFS image'.

For some reason, NodeMCU does not publish built binaries, so if you want to play with LFS you will need to be building from the source.  Also, NodeMCU is very much a Linux/gcc-oriented project, so I was left more-or-less out-in-the-cold on my Windows platform.  [Edit:  I later found out that I was not so out-in-the-cold, but I didn't know that until after I had done the things I will now describe.]  So, for my first amazing feat, I would need to see how much hacking on the source I would need to do to get the luac.cross to build for Windows.

I have some experience with Lua in general from desktop projects, so I knew that building 'luac' should /in principle/ not be too bad.  As a compiler, the dependencies on esoteric stuff like board hardware should be negligible, and as a desktop application, dependencies on the runtime should be easy to satisfy.  However, as a Linux/gcc application, porting the project to Visual Studio could be a challenge.

The first thing to do was to gather all the relevant source.  The NodeMCU is a makefile-based system, but fortunately the luac.cross project does not depend on hierarchical makefiles, so it was fairly easy to gather the required files.  They were all in 'app/lua', 'app/include', and 'app/uzlib'.  The uzlib is used in compressing the final LFS flash image.

I set up a MSVC project with all the relevant source, and started building and fixing reported errors.  There were a couple of overt bugs that were clearly masked on Linux builds which were easy to fix.  The main challenge wound up being translation of compiler-specific directives, such as gcc '__attribute__', into functional equivalents for MSVC '#pragma' and '__declspec'.  Some were for data alignment specifications, and others were for controlling the placement of certain objects when linking.  Alignment was straightforward, but controlling placement of objects in named sections is a little more tricky.  The section placement is important because other code tests for whether an object is read-only based upon its section placement.  This affects the compiler output.  Fortunately for me, I have a lot of experience with MSVC, so I was able to create the required '__declspec's without having to study the manual.

A Fool's Errand

Naturally, after having got the project building and feeling somewhat proud of myself for having done so, further study of the process of using LFS revealed that someone made a web service that will do the cross compile for you, so you don't really need the tool at all!

You can just zip up your source, submit it to the web service, and get back the compiled image.

Also, it further appears that the NodeMCU team got a Cygwin build working, so you actually can build a native Windows app that way.  Oh, well.  One good thing did come of my exercise, however:  I found a rather nasty bug.  I submitted it to the team as bug #2632 -- we'll see if/when they fix it.

A Bug

The bug is a use-after-free:  memory being used after it has been freed.  I noticed that sometimes running the compiler would generate no output.  Moreover, when it generated no output, it seemed to take a little longer than usual.  It was veritably instantaneous in the positive case, but took a couple of seconds in the negative case.  And it wasn't deterministic.  It was a moody problem.  But, since I had the project building in MSVC, it was really easy for me to run it in the debugger.

Turns out the application was crashing, but since I was running a 'release' build, the crash simply terminated the program rather than doing something more 'interesting'.  When running the debug build, it was very clear that a chunk of memory continued to be accessed after it had been freed.  On a good day that memory would not yet have been re-used for something else, and the desired data would still be there for use.  On a bad day that data would be corrupt, and who knows what would happen.

In MSVC, if you build a 'debug' build, the memory allocator will fill blocks with a pattern to make it easier to see what state each block is in.  Freshly malloc'ed data will be filled with 0xCD ('clean' memory), and free'ed blocks will be filled with 0xDD ('dead' memory).  The guard bytes on either side of each allocation are 0xFD ('no man's land').  The block filling of 0xDD turned the non-deterministic behaviour into deterministic behaviour.

It was not obvious from the code at the crash site what was freeing the data, but rummaging through the source at all the malloc/free calls and the surrounding code revealed a case of 'pointer aliasing':  when the block was freed (through a different pointer), this other pointer was left dangling.

The solution was simple:  just move the free to a few lines later, after the final access to the memory block was finished.

Making LFS

Once I could make images for LFS, it was time to kick the tires.  There is a little bit of hand motion involved:  you copy the LFS image to the SPIFFS, and then tell the board to flash the file into its special spot.  Once it does this, it will reset the board, and henceforth the code and objects in the LFS will be automatically 'mounted' for use.  But first you need to have a special spot to put them in.

The fundamental way of doing this is to build the firmware with some configured options.  Fortunately, the web-based firmware builder has since been augmented with some options for including a LFS region, so you just need to specify that appropriately when you create your firmware.  In my case, I specified a 128 K LFS region (apparently much more space than I really need, but this is a 4 M flash part, so what else am I going to do with the space?).  Apparently it is also considered wise to specify a 1 M offset to the SPIFFS region.  This allows you to create new firmwares with reduced likelihood that the new image will damage an existing SPIFFS on your device.  If you don't specify the offset, then the SPIFFS will immediately follow the flash image, and thus move around build-to-build.

Once I built the firmware with the LFS support, I flashed it.  Nothing terribly exciting here except for an additional boot message indicating no LFS was found.

Next, I needed to make a LFS image of my code.  As per recommendation, I added a couple stock files '_init.lua' and 'dummy_strings.lua', and I added a third party module 'inspect.lua' and the start of my own code 'kosbo_lfs.lua'.  I compiled them into a LFS image with this incantation:

luac-cross.exe -f -m 0x20000 -o lfs.img *.lua

The '-f' option tells the compiler to create the flash image (instead of just a Lua binary chunk), and the '-m 0x20000' option tells it to sanity check that the image will not overflow my 128 K area.  It will be a long time before I'm in danger of that, but I wanted to put all that in a batch file for general use.  The '-o' option specifies the output filename, and the rest are the various Lua source files to process.  The 'stock' source modules of '_init.lua' and 'dummy_strings.lua' need a little explanation.

'_init.lua' and 'dummy_strings.lua'

'_init.lua' is notionally similar to the actual 'init.lua' on the SPIFFS filesystem, but there is no magic in the name.  You have to explicitly call it yourself; presumably from an 'init.lua'.  What it does is make the LFS easier to use.  It creates an LFS table that can reference the stuff in the LFS without having to go through the explicit node.flashindex() API call.  Lua supports a notion called 'metatables' which allow you to override operations such as indexing, and this code uses that to create a table 'LFS' (which takes some RAM) and override the index operation (which does /not/ take RAM) to find things in the flash store.
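The metatable trick can be sketched in a few lines.  This is not the stock '_init.lua', only an illustration of the idea; 'flash_lookup' and its contents are stand-ins for node.flashindex so that the snippet runs on a desktop Lua.

```lua
-- Stand-in for node.flashindex: maps a module name to a function held "in flash".
local fake_flash = {
  hello = function() return "hello from LFS" end,
}
local function flash_lookup(name)
  return fake_flash[name]
end

-- The LFS table itself stays empty; every read falls through to __index.
local LFS = setmetatable({}, {
  __index = function(_, name)
    return flash_lookup(name)
  end,
  __newindex = function(_, name)
    error("LFS is read-only, cannot set " .. tostring(name))
  end,
})

print(LFS.hello())
```

Because nothing is ever stored in the table, the only RAM cost is the empty table and its metatable; the lookups themselves are answered from the (here, simulated) flash store.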

This is handy, but the big benefit (in my opinion) is the manipulation of the 'searchers' and replacement of 'loadfile' and 'dofile' to first check in SPIFFS, and then check in LFS.  This means you can transparently use objects in the LFS and not really have to care about the special node.flashindex() mechanism.  Additionally, your SPIFFS hosted source overrides the LFS source, so it is handy for development.  Develop more rapidly in SPIFFS, and then incrementally move debugged source to LFS.
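The lookup order described here (filesystem first, flash second) might look something like the following.  Both lookup functions are hypothetical placeholders passed in as arguments; on the board they would be backed by loadfile and node.flashindex.

```lua
-- Build a resolver that prefers a copy on the filesystem over the one in flash.
local function make_resolver(load_from_spiffs, load_from_lfs)
  return function(name)
    local chunk = load_from_spiffs(name)
    if chunk then return chunk, "spiffs" end
    chunk = load_from_lfs(name)
    if chunk then return chunk, "lfs" end
    return nil, "not found: " .. name
  end
end

-- Toy stand-ins: 'clock' exists in both places, 'timezone' only in flash.
local spiffs = { clock = function() return "clock (development copy)" end }
local lfs = {
  clock = function() return "clock (burned copy)" end,
  timezone = function() return "timezone (burned copy)" end,
}

local resolve = make_resolver(
  function(name) return spiffs[name] end,
  function(name) return lfs[name] end)

print(resolve("clock")())
print(resolve("timezone")())
```

The development copy wins whenever both exist, which is what makes the develop-in-SPIFFS, then-move-to-LFS workflow possible.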

'dummy_strings.lua' is kind of cool in that the module does nothing (I guess hence 'dummy'), but it declares a boatload of strings.  In normal Lua, this would simply allocate a lot of strings, then they would be immediate candidates for garbage collection -- not really exciting.  However, since these strings are in the LFS, they are read-only strings that are /never/ garbage collected.  And because Lua 'interns' strings, any time your source specifies a string that matches, it will actually just create a reference to these ROM-based strings, rather than allocate RAM to store them.  Their mere presence in this file is sufficient to make this happen, so it's a kind of free magic.

You do have to figure out what strings to put there, but there is a handy code snippet in the comments that, when run, will dump all the strings that are currently stored in RAM, and handily formatted such that you can cut/paste the output into this file.  Handy!
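A sketch of what such a dump might look like follows.  The formatting helper is plain Lua; the call to debug.getstrings is guarded because it is assumed to be present only on NodeMCU firmware built with LFS support, and the sample strings in the fallback are invented.

```lua
-- Turn a list of strings into a line suitable for pasting into dummy_strings.lua.
local function format_preload(strings)
  local quoted = {}
  for i, s in ipairs(strings) do
    quoted[i] = string.format("%q", s)
  end
  return "local preload = " .. table.concat(quoted, ", ")
end

-- Assumption: debug.getstrings("RAM") exists on LFS-enabled NodeMCU builds.
if debug and debug.getstrings then
  print(format_preload(debug.getstrings("RAM")))
else
  print(format_preload({ "timezone", "setTime" }))  -- invented sample strings
end
```

In Lua, a local assigned a longer list of values simply discards the extras, so the generated line is valid code whose only purpose is to get the strings into the image.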

Burning LFS

OK, getting back to the LFS image.  To use it you simply copy it onto SPIFFS, and then utter these magic words:
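Assuming the standard NodeMCU API, the call in question is presumably node.flashreload, given the name of the image file ('lfs.img' is the name used in the compile step earlier).  The guard is only there to keep the snippet harmless on a desktop Lua.

```lua
-- Assumption: node.flashreload is the call meant here.
if node and node.flashreload then
  -- On success the board reboots and this call never returns;
  -- on failure it returns an error message.
  local err = node.flashreload("lfs.img")
  print("LFS reload failed: " .. tostring(err))
end
```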


(obviously specifying whatever filename in your case).  This will cause the image to be validated, burned to LFS, and then the system reboots, and the message about 'no LFS found' will /not/ be emitted, because it was found!  Once burned to the LFS region, the image file serves no further purpose, so you can delete it if you want, but I usually just leave it there because there is so much space on the SPIFFS as it is.

This flash operation is a manual one, and you only need to do it once for a new image.

Using LFS

Once your stuff is in LFS, you can access it directly via the node.flashindex() mechanism, but it is much nicer to use the features of '_init.lua', so typically your first action on bootup is to do:
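Assuming the stock '_init.lua' was compiled into the image, that would presumably be something like the following near the top of 'init.lua' (guarded here so it is harmless off-device).

```lua
-- Assumption: node.flashindex("_init") returns the _init module stored in LFS as a function.
if node and node.flashindex then
  local lfs_init = node.flashindex("_init")
  if type(lfs_init) == "function" then
    lfs_init()  -- sets up the LFS table and the loader hooks
  else
    print("no _init module in LFS yet")
  end
end
```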


Which will wire in all that stuff we talked about before.  Then you just make function calls as per usual.  As such, my development process is to develop in the RAM-intensive 'kosbo.lua' as before, but as functions become mature, I move them over to 'kosbo_lfs.lua' and burn them into LFS.  Currently, this is all the timezone code and the clock control code.  My kosbo.lua contains newly developed code.

I also kept my config file in SPIFFS, because I want that to be read/write, and it doesn't take RAM since its contents are discarded after having been processed, and it's processed to completion right after boot, before more significant code is run.

Now that detour is completed, it's time to get back to the code improvements.

