
The Crash of the Titans

A project log for NYETduinoPlusLua

Wherein a Netduino Plus 2 is repurposed with alternative firmware based on Lua

ziggurat29 01/06/2018 at 19:54

Summary:

Before getting down to the business of gluing eLua to the System Workbench STM32 HAL libraries, I find some sources of crashes, and fix them.

Deets:

After finally getting a build to complete, I let it run and forthwith wind up in the Hard Fault handler.  Again, this is no great surprise (I'd be more surprised if I /didn't/, since I haven't wired any platform code to do IO).  Stopping in the Hard Fault handler does not admit to a stack trace in the Eclipse tools, alas, so I incrementally zero in on the faulting line the old fashioned way by doing a coarse depth-first search with breakpoints.  It takes a while, but at length I found the fault to be stimulated by a call to getenv().  This call was being made while loading libraries -- naturally, with my luck, in the last library to be loaded:  'package', where it was trying to get the LUA_PATH and LUA_CPATH env vars.  It has logic to handle the 'variable not found' case, but getenv() itself was crashing prior to that.

getenv() doesn't make much sense in embedded, since there is no OS or shell, but the standard library implementation (newlib-nano in this case) exposes it and that does make porting simpler.  grepping shows that the (e)Lua code has plenty of calls to getenv() throughout, so I look deeper.  The System Workbench does not ship the libc source, alas, which is a real pity, so I look for documentation.

Fun fact:  System Workbench does install (some) library documentation, but it does not link it in the Start Menu, or make it particularly visible.  In my case I found it at:

C:\Ac6\SystemWorkbench\plugins\fr.ac6.mcu.externaltools.arm-none.win32_1.15.0.201708311556\tools\compiler\share\doc\gcc-arm-none-eabi\pdf\libc.pdf

Obviously, right?  Anyway, the documentation mentioned that getenv() requires a global variable 'environ' to work.  Since I successfully linked, I must have the variable somewhere, so that's not it.  I looked for the source via web search and found it:

https://github.com/eblot/newlib/blob/master/newlib/libc/stdlib/getenv.c

However that was not particularly interesting because it was just a wrapper around an internal function '_findenv_r': 

char* _DEFUN (getenv, (name), _CONST char* name) {
  int offset;
  return _findenv_r (_REENT, name, &offset);
}

I may later download the source package, but it will not be as useful as I might like (for interactive debugging), since it will not be the version used to make the shipped binary libs anyway.  But much better than nothing.  In the meantime, I was bored with this, and decided to interactively look at this 'environ' variable.  I declared it 'extern' and then was able to inspect it via the debugger and see that it exists, and that it points to a single entry of NULL.  I would not think that to be a problem, since it's just an empty environment, but I decided to make a new environment list consisting of two entries:  an empty string, and a NULL pointer (to terminate the list).  This worked fine: getenv() no longer crashed.
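For concreteness, here is a rough sketch of that workaround. It is not the project's actual code: the names 'install_replacement_environment' and 'lua_path_is_set' are made up for illustration, and the only things taken from the text above are the 'environ' global and the two-entry list (an empty string, then NULL).

```c
#include <stdlib.h>

/* The libc exposes the environment through this global; getenv() walks it. */
extern char **environ;

/* Replacement environment: one empty string, then the terminating NULL. */
static char empty_entry[] = "";
static char *replacement_env[] = { empty_entry, NULL };

/* Hypothetical helper: call once at startup, before Lua opens its libraries. */
void install_replacement_environment(void)
{
    environ = replacement_env;
}

/* Hypothetical helper mirroring the lookup the 'package' library makes.
 * Returns 1 if LUA_PATH is set, 0 if it is not. */
int lua_path_is_set(void)
{
    return getenv("LUA_PATH") != NULL;
}
```

With the replacement list installed, the lookup should simply report 'not found', which is the case the 'package' library already knows how to handle.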

I don't know what this means.  There's no similar code in eLua, but it uses a different libc, so maybe this is a bug in this one.  If so, it could easily have been missed, because who uses 'getenv()' in an embedded context except for ported desktop code (which Lua is)?

Happy that I had solved that crash, I let it fly again, and it crashed again.  This time in an fprintf ( stderr, ... ) call.  Again, I'm not too surprised, because no standard I/O objects have been created, though I would have expected it to simply direct output to the functional equivalent of /dev/null instead of crashing.

Well, after many hours of stepping through assembler (because I don't have libc sources that match the binary libs), I popped back out into user C code!  The eLua code had already overridden the 'bottom edge' to redirect IO to peripheral devices.  Of course, I haven't implemented anything in that area, so small wonder it crashed.  It's just a pity that so many hours were consumed tracking that down.
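To illustrate the '/dev/null' behaviour I had been expecting, here is a minimal sketch of a bottom-edge write routine that discards output until a real device is attached. This is not what eLua does, and every name in it ('stub_write', 'console_write_fn', 'console_write') is hypothetical; on a real target a routine like this would be called from whatever low-level write hook the libc port provides.

```c
#include <stddef.h>

/* Hypothetical device hook: stays NULL until a UART driver is wired up. */
typedef int (*console_write_fn)(const char *buf, int len);
static console_write_fn console_write = NULL;

/* Hypothetical bottom-edge write.  Returns the number of bytes consumed,
 * or -1 on bad arguments.  With no device attached it behaves like
 * /dev/null: the data is dropped and reported as fully written. */
int stub_write(int fd, const char *buf, int len)
{
    (void)fd;                      /* all streams share the one console here */
    if (buf == NULL || len < 0) {
        return -1;
    }
    if (console_write == NULL) {
        return len;                /* discard instead of dereferencing NULL */
    }
    return console_write(buf, len);
}
```

The point of the NULL check is only that a missing device becomes a silent no-op rather than a Hard Fault; it says nothing about how the peripherals will eventually be hooked up.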

Anyway, it's now clear there is some more eLua initialization that needed to be performed.  I added a bunch of that stuff, and now the system does not crash.  It also doesn't do anything useful, since there is no I/O, but at least it seems that the system is running in a consistent state.

So, it's probably time to start working on peripherals.  A good first choice is UART, so we can get some console I/O going...

Next:

Now it's probably time to work on peripherals for real.
