Shilly-shally with the Shell by the Seashore

A project log for NYETduinoPlusLua

Wherein a Netduino Plus 2 is repurposed with alternative firmware based on Lua

ziggurat29, 01/08/2018 at 03:13


I first try to get a UART up to provide some I/O for the shell.  Half seems to be working.


For my first step, I am going to simply get a serial port up with the System Workbench HAL drivers, and attempt to wire that into eLua.  This should give me a serial console, if it all works out.  This will be an interim implementation, though, because the assignment of those pins to that port will be hard-coded in the firmware.  I believe the ultimate goal is that the pin configurations will be assigned at runtime, under Lua control.  Nevertheless, the experience should help guide me in how to do that, and anyway I really need some I/O now.

As a simplification for now, I simply configure STM32CubeMX to designate the (user-visible) D0 and D1 pins to be UART -- specifically USART6 (because that's how the board is wired) and emit init code for that.  Then I will tweak the eLua board def to emit a header indicating that eLua's notion of 'UART 0' will be the serial console.

This is not how it will ultimately work.  I'm expecting that in the end, all the pins will come up in tristate or analog mode, and then the Lua application code will 'open' the various devices, which will cause the pins to be configured at runtime in the associated manner.  I.e., D0 and D1 would be useable either as UART or digital IO as per application; not hard-coded to be serial as I'm doing now.  But I've got so much more to learn and do before I can implement that fanciness.

Anyway, the STM32CubeMX part is trivial.  Now I have to wire it in.

I took several quick trips to Hard Fault land, and after spending many, many, hours (days?) with my 'binary search' approach to finding the offending code, I decided to put forth a little effort towards getting more info in those cases.  Hard faults on the ARM cause context to be dumped to the stack before vectoring to the handler, but Eclipse does not know how to present that information, so you don't see the usual stack trace if you set a breakpoint in the handler.  I found some code on FreeRTOS's site for some simple information gathering, but it didn't work as-is.  It involves inline assembler, and I guess my compiler (gcc) is slightly different from whatever they were using (which I would have thought would be gcc, but whatever).  My problem was extremely simple:  I couldn't branch to a subroutine via a register -- the address loaded was always incorrect.  I did manually change the register to the correct address, and verified the other stuff worked as expected.  After yanking-and-twisting for a while, I did get the handler working, so I post it here for posterity:

void prvGetRegistersFromStack( uint32_t *pulFaultStackAddress );

__attribute__( ( naked ) ) void HardFault_Handler(void)
{
    /* USER CODE BEGIN HardFault_IRQn 0 */
    //XXX there needs to be __attribute__( ( naked ) ) void HardFault_Handler(void)
    //XXX but the code generator will probably overwrite that.  Verify that has not
    //XXX been lopped-off before proceeding, lest your stack references be off.

    __asm volatile //'volatile' to prevent gcc from rearranging them
    (
        " tst lr, #4                        \n" //test EXC_RETURN number in LR b2
        " ite eq                            \n" //if zero then
        " mrseq r0, msp                     \n" //Main Stack, put MSP in R0
        " mrsne r0, psp                     \n" //Process Stack, put PSP in R0
        " ldr r1, [r0, #24]                 \n" //stacked PC (the fault address) into R1
        " ldr r2, =prvGetRegistersFromStack \n" //call through reg to avoid messing
        " bx r2                             \n" //with params already in regs
    );

    /* USER CODE END HardFault_IRQn 0 */
    while (1)
    {
    }
    /* USER CODE BEGIN HardFault_IRQn 1 */

    /* USER CODE END HardFault_IRQn 1 */
}

void prvGetRegistersFromStack( uint32_t *pulFaultStackAddress )
{
    //'volatile' to avoid the compiler/linker optimising them away
    volatile uint32_t r0;
    volatile uint32_t r1;
    volatile uint32_t r2;
    volatile uint32_t r3;
    volatile uint32_t r12;
    volatile uint32_t lr; // Link register.
    volatile uint32_t pc; // Program counter.
    volatile uint32_t psr; // Program status register.

    //The hardware stacks these eight words, in this order, on exception entry.
    r0 = pulFaultStackAddress[ 0 ];
    r1 = pulFaultStackAddress[ 1 ];
    r2 = pulFaultStackAddress[ 2 ];
    r3 = pulFaultStackAddress[ 3 ];

    r12 = pulFaultStackAddress[ 4 ];
    lr = pulFaultStackAddress[ 5 ];
    pc = pulFaultStackAddress[ 6 ];
    psr = pulFaultStackAddress[ 7 ];

    //When the following line is hit, the variables contain the register values.
    volatile int n = 0;
    for( ;; )
    {
    }
}

some related info:

It's still a little tedious to debug, but far less so when you know the fault address, because you can at least then lookup the function via the generated map file.

I still had a bunch of faults happening, seemingly non-deterministically, so I increased the stack size for starters.  This didn't seem to help, so I turned on some features in FreeRTOS that help to determine stack usage, which turned out to be quite low.  Then, the hard faults stopped happening altogether, which rather put a damper on debugging.  Non-deterministic bugs are so disheartening!
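For reference, stack-usage checks of this kind generally work by pre-filling a task's stack with a known byte and later counting how much of it is still untouched (FreeRTOS exposes the result through uxTaskGetStackHighWaterMark()).  Here is a minimal host-side sketch of that idea.  It is not FreeRTOS code: the function names and the fill value are made up for illustration, and the 'task' is simulated with a memset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL_BYTE 0xA5u  /* hypothetical fill pattern for this sketch */

/* Paint the whole stack region with the fill byte before the task runs. */
static void stack_paint(uint8_t *stack, size_t size)
{
    memset(stack, STACK_FILL_BYTE, size);
}

/* Count untouched bytes from the low end of the region.  On a Cortex-M the
   stack grows downward, so the low addresses are the last ones to be used. */
static size_t stack_unused_bytes(const uint8_t *stack, size_t size)
{
    size_t n = 0;
    while (n < size && stack[n] == STACK_FILL_BYTE)
        n++;
    return n;
}

/* Stand-in for a task that used 'depth' bytes from the top of its stack. */
static void simulate_usage(uint8_t *stack, size_t size, size_t depth)
{
    assert(depth <= size);
    memset(stack + (size - depth), 0x00, depth);
}
```

Painting a 256-byte region, simulating 100 bytes of use, and then calling stack_unused_bytes() reports 156 bytes of headroom.  The scheme can under-report usage if the task happens to write the fill value itself, so treat the number as an estimate.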

Without being in a position to debug anymore, I decided to motor on.  I found a spot where UART I/O ostensibly occurs, and simply implemented it using the 'polling' HAL calls.  These do I/O in the most trivial way:  polling on some sort of 'busy' flag for every byte sent or received.  I don't intend to do it this way in production, but this is just a quickie for sanity checking.
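To make 'polling' concrete, here is a rough host-side sketch of a polled transmit loop with a timeout.  The fake_uart_t type and its helpers are entirely hypothetical stand-ins for the hardware (they are not the STM32 register layout or the HAL API); on the board, the blocking HAL_UART_Transmit() and HAL_UART_Receive() calls play this role.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical simulated UART: the transmitter stays busy for a few polls
   after each byte, the way real hardware is busy while shifting bits out. */
typedef struct {
    int     busy_polls_left;  /* polls remaining until the transmitter is free */
    uint8_t wire[64];         /* bytes that made it out "onto the wire" */
    size_t  wire_len;
} fake_uart_t;

/* Each call is one poll of the 'ready' flag. */
static int fake_uart_tx_ready(fake_uart_t *u)
{
    if (u->busy_polls_left > 0) {
        u->busy_polls_left--;
        return 0;
    }
    return 1;
}

static void fake_uart_write(fake_uart_t *u, uint8_t b)
{
    if (u->wire_len < sizeof u->wire)
        u->wire[u->wire_len++] = b;
    u->busy_polls_left = 3;   /* pretend each byte takes 3 polls to send */
}

/* Send len bytes, spinning on the ready flag before every byte.
   Returns the number of bytes sent; fewer than len means a timeout. */
static size_t uart_send_polled(fake_uart_t *u, const uint8_t *buf,
                               size_t len, unsigned max_polls)
{
    size_t i;
    for (i = 0; i < len; i++) {
        unsigned polls = 0;
        while (!fake_uart_tx_ready(u)) {
            if (++polls >= max_polls)
                return i;     /* give up: transmitter never became ready */
        }
        fake_uart_write(u, buf[i]);
    }
    return i;
}
```

The CPU does nothing useful while it spins, which is why this is fine for a sanity check but not something to keep in production.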

After hooking up a trusty FTDI, I get my first results:

Ta-da!  But the joy was short-lived; it does not seem that I can type into the system.  I don't know why this is, but I'm guessing that I'm simply clueless as to how eLua expects platform adaptation to be done.  I did a bunch of the usual single-stepping of code, and it seems that the system is running correctly, but is expecting to push data directly out of the port (and that is obviously working), but pull data from a buffer, instead of directly from the port.  So I guess the write side is write-through, whereas the read side is buffered, with the expectation that there is some separate process (or interrupts) that produce data into that buffer.
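The usual shape for a buffered read side like that is a small ring buffer:  the receive interrupt puts bytes in, and the shell takes them out whenever it gets around to reading.  Below is a minimal single-producer/single-consumer sketch of that arrangement.  The names and the buffer size are my own for illustration; this is not eLua's actual buffer code.

```c
#include <assert.h>
#include <stdint.h>

#define RX_BUF_SIZE 16u   /* must be a power of two for the index mask below */

typedef struct {
    volatile uint8_t  data[RX_BUF_SIZE];
    volatile uint32_t head;   /* total bytes written; touched only by the producer */
    volatile uint32_t tail;   /* total bytes read; touched only by the consumer */
} rx_ring_t;

/* Producer side: would be called from the UART receive interrupt.
   Returns 1 on success, 0 if the buffer is full (the byte is dropped). */
static int rx_ring_put(rx_ring_t *r, uint8_t b)
{
    if (r->head - r->tail == RX_BUF_SIZE)
        return 0;
    r->data[r->head & (RX_BUF_SIZE - 1u)] = b;
    r->head++;
    return 1;
}

/* Consumer side: would be called by the shell's read path.
   Returns the next byte (0..255), or -1 if nothing has arrived yet. */
static int rx_ring_get(rx_ring_t *r)
{
    uint8_t b;
    if (r->head == r->tail)
        return -1;
    b = r->data[r->tail & (RX_BUF_SIZE - 1u)];
    r->tail++;
    return b;
}
```

Because each index is written by only one side, a single interrupt producer and a single foreground consumer can share this on a Cortex-M without disabling interrupts.  With more than one producer or consumer it would need locking.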

I still like the eLua, and I'm especially glad that they have produced modules for the various common peripherals, but the platform abstraction so far does not seem to afford quite as clear a separation as I would like.  But maybe I'm just ignorant -- that is certainly possible.  It has two components:  a 'cpu' component (under platform), and a 'board' component (that seems to be the specs for an automatically generated header) under 'boards'.  All that sounds great and conventional, but if I'm having to fiddle with aspects of the common stuff, then that means the separation has blurred a bit, and some carnal details of the platform-specific implementation are exposed higher up.

E.g. interrupts:  there's a few places where the system wants to disable interrupts (presumably globally).  In the few places I analyzed this, it was to create a critical section around a variable that was being modded.  That code presumes that you can globally disable interrupts, and that the disable method is able to tell you what the state was before making the change.  This is actually inconvenient for me, because FreeRTOS does /not/ provide a means to tell you if they were or were not disabled prior to making a call to taskDISABLE_INTERRUPTS, or taskENTER_CRITICAL, and even taskENTER_CRITICAL_FROM_ISR returns something non-portable and which I'm not yet convinced can be used here safely anyway.
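The pattern that code seems to assume looks roughly like the sketch below:  'disable' hands back the previous state, and 'restore' only re-enables if interrupts were enabled to begin with, so nested critical sections behave.  The interrupt mask here is simulated with a plain variable so the example runs on a host.  On the actual Cortex-M, I would expect to build it on the CMSIS intrinsics __get_PRIMASK(), __disable_irq() and __set_PRIMASK() rather than on the FreeRTOS macros.  The function names are hypothetical, not eLua's.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated interrupt mask: 0 = interrupts enabled, 1 = disabled.
   On target this would be the PRIMASK register. */
static uint32_t sim_primask = 0;

/* Disable interrupts and report whether they were enabled beforehand. */
static int int_disable_save(void)
{
    int was_enabled = (sim_primask == 0);
    sim_primask = 1;
    return was_enabled;
}

/* Re-enable interrupts only if they were enabled before the matching disable. */
static void int_restore(int was_enabled)
{
    if (was_enabled)
        sim_primask = 0;
}

static volatile uint32_t shared_counter = 0;

/* A critical section around a shared variable, safe to call whether or not
   the caller already has interrupts disabled. */
static void bump_counter(void)
{
    int prev = int_disable_save();
    shared_counter++;
    int_restore(prev);
}
```

Calling bump_counter() from inside another critical section leaves interrupts disabled on return, which is exactly the property that is awkward to get when the disable call does not report the previous state.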

Ultimately, I think I'm going to have to go a little higher into the eLua system.  It doesn't need to know about interrupts (for the shell part, at least), or buffering techniques, etc.  It should simply require a data stream that can read() and write() from whatever source, and leave those implementation details to the lower levels.  When I get to implementing the peripheral modules, I'll probably have some very different thoughts, but this shell I/O stuff should be abstracted a little bit higher up.
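As a sketch of what I mean by a data stream, something like the following would do:  the shell sees only a pair of function pointers, and whatever sits behind them (polling, interrupts, DMA, a buffer) is the lower level's business.  Every name here is hypothetical, and the in-memory loopback backend exists only so the example runs on a host.  None of this is eLua's existing interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* What the shell would see: read/write plus an opaque context. */
typedef struct console_stream {
    /* Both return the number of bytes actually transferred. */
    size_t (*read)(void *ctx, uint8_t *buf, size_t len);
    size_t (*write)(void *ctx, const uint8_t *buf, size_t len);
    void *ctx;
} console_stream_t;

/* Backend for this sketch: bytes written can be read back later. */
typedef struct {
    uint8_t bytes[64];
    size_t  count;
} loopback_t;

static size_t loopback_write(void *ctx, const uint8_t *buf, size_t len)
{
    loopback_t *lb = ctx;
    size_t room = sizeof lb->bytes - lb->count;
    if (len > room)
        len = room;               /* accept only what fits */
    memcpy(lb->bytes + lb->count, buf, len);
    lb->count += len;
    return len;
}

static size_t loopback_read(void *ctx, uint8_t *buf, size_t len)
{
    loopback_t *lb = ctx;
    if (len > lb->count)
        len = lb->count;          /* return only what is available */
    memcpy(buf, lb->bytes, len);
    memmove(lb->bytes, lb->bytes + len, lb->count - len);
    lb->count -= len;
    return len;
}

/* Shell-side code: knows nothing about the backend behind the stream. */
static size_t shell_echo(console_stream_t *s)
{
    uint8_t tmp[16];
    size_t n = s->read(s->ctx, tmp, sizeof tmp);
    return s->write(s->ctx, tmp, n);
}
```

Swapping the loopback for a UART backend would then mean supplying two different functions and a different context, with no change to shell_echo().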

But in the near-term, I really need to get my receive side of the uart for the shell working.  Then I'll be cooking with Crisco.


Need to get the receive side of the UART attached to stdin working.