
Streams and Serial and HAL Workarounds, 001

A project log for Careless WSPR

A desultorily executed Weak Signal Propagation Reporter beacon.

ziggurat29 08/06/2019 at 14:57

Summary

A stream IO abstraction is produced and mated to the serial ports of the system.  Some peculiarities of the STM HAL implementation are worked around.

Deets

I generally like to abstract serial ports and other sequence-of-bytes-in-and-out devices into a stream IO interface, rather than call the underlying APIs directly.  Doing so decouples the component that is producing/consuming the data from the implementation of its source, and so it is easy to redirect the processing implementation to any pipe that implements the conformant interface.

Abstraction

The abstraction I define here is:

#include <stddef.h>
#include <stdint.h>

#define TO_INFINITY 0xffffffff

//These interface objects will typically be in read-only memory

//IO stream abstraction; typically for serial ports
typedef struct IOStreamIF IOStreamIF;	//forward declaration so the members can refer to the type
struct IOStreamIF
{
	//transmit methods; non-blocking
	void (* _flushTransmit) ( const IOStreamIF* pthis );
	size_t (* _transmitFree) ( const IOStreamIF* pthis );
	size_t (* _transmit) ( const IOStreamIF* pthis, const void* pv, size_t nLen );

	//receive methods; non-blocking
	void (* _flushReceive) ( const IOStreamIF* pthis );
	size_t (* _receiveAvailable) ( const IOStreamIF* pthis );
	size_t (* _receive) ( const IOStreamIF* pthis, void* pv, const size_t nLen );

	//transmit/receive methods; blocking
	//0 on success, nRemaining on timeout (i.e. nLen - nProcessed)
	int (* _transmitCompletely) ( const IOStreamIF* pthis, const void* pv, size_t nLen, uint32_t to );
	int (* _receiveCompletely) ( const IOStreamIF* pthis, void* pv, const size_t nLen, uint32_t to );

	//implementation-specific context (assumed member: the UART1 instance
	//further down initializes a ninth slot with &huart1)
	void* huart;
};

This is in the style of C-as-a-better-C++, wherein I manhandle virtual functions and the 'this' pointer.  This project is principally C, but you could obviously redefine this in the C++ way for some added convenience, at the cost of limiting yourself to C++ callers.

The non-blocking functions are intended to attempt to push in or pull out as much data as possible, but immediately return indicating how much actually was pushed or pulled.  The blocking functions are intended to spin in a loop until all the data provided (or requested) has been processed, subject to a timeout.  A special timeout of TO_INFINITY is defined that means wait forever for it to happen.
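
One way a blocking variant could be layered on a non-blocking one is sketched below.  This is not the project's actual Serial_transmitCompletely(); the trimmed-down ByteSink struct, the tick_ms() clock, and the demo sink are stand-ins invented for illustration (on target the clock would be something like the RTOS tick count).

```c
#include <stddef.h>
#include <stdint.h>

#define TO_INFINITY 0xffffffff

//hypothetical, trimmed-down stand-in for the stream interface: transmit only
typedef struct ByteSink ByteSink;
struct ByteSink
{
	size_t (* _transmit) ( const ByteSink* pthis, const void* pv, size_t nLen );
};

//hypothetical millisecond clock; here a fake that advances 1 ms per call
static uint32_t g_nFakeTick = 0;
static uint32_t tick_ms ( void ) { return g_nFakeTick++; }

//keep calling the non-blocking _transmit until everything is accepted or
//the timeout lapses; 0 on success, number of bytes remaining on timeout
static int transmit_completely ( const ByteSink* pif, const void* pv, size_t nLen, uint32_t to )
{
	const uint8_t* pby = (const uint8_t*)pv;
	size_t nRemaining = nLen;
	uint32_t tsStart = tick_ms();
	while ( nRemaining > 0 )
	{
		size_t nSent = pif->_transmit ( pif, pby, nRemaining );
		pby += nSent;
		nRemaining -= nSent;
		if ( 0 == nRemaining )
			break;
		//unsigned subtraction tolerates tick counter wraparound
		if ( TO_INFINITY != to && ( tick_ms() - tsStart ) >= to )
			break;
		//on target, yield to other tasks here instead of spinning hot
	}
	return (int)nRemaining;
}

//demo sink: accepts at most 2 bytes per call, up to a total budget
static size_t g_nBudget = 0;
static size_t demo_transmit ( const ByteSink* pthis, const void* pv, size_t nLen )
{
	size_t n = nLen < 2 ? nLen : 2;
	(void)pthis; (void)pv;
	if ( n > g_nBudget )
		n = g_nBudget;
	g_nBudget -= n;
	return n;
}
static const ByteSink g_demoSink = { demo_transmit };
```

With a generous budget the call returns 0; if the sink stops accepting data, the call gives up after the timeout and reports how many bytes were left over.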

The _transmitFree() and _receiveAvailable() functions allow one to 'peek' to see if there is any room for sending or if there is anything to receive.
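
As an illustration of the 'peek' usage, here is a small sketch of a poller that drains whatever is waiting into a line buffer without ever blocking.  The poll_line() helper and the canned data source are hypothetical, made up for this example, and the interface is abridged to the two receive members used.

```c
#include <stddef.h>
#include <string.h>

//abridged copy of the interface: receive side only
typedef struct IOStreamIF IOStreamIF;
struct IOStreamIF
{
	size_t (* _receiveAvailable) ( const IOStreamIF* pthis );
	size_t (* _receive) ( const IOStreamIF* pthis, void* pv, const size_t nLen );
};

//hypothetical canned source standing in for a real device's receive queue
static const char* g_pszCanned = "";
static size_t Canned_receiveAvailable ( const IOStreamIF* pthis )
{
	(void)pthis;
	return strlen ( g_pszCanned );
}
static size_t Canned_receive ( const IOStreamIF* pthis, void* pv, const size_t nLen )
{
	size_t n = strlen ( g_pszCanned );
	(void)pthis;
	if ( n > nLen )
		n = nLen;
	memcpy ( pv, g_pszCanned, n );
	g_pszCanned += n;
	return n;
}
static const IOStreamIF g_pifCanned = { Canned_receiveAvailable, Canned_receive };

//poll once: pull bytes while any are waiting, never blocking.
//returns 1 when a '\n' terminated line is complete in achLine, else 0.
//*pnLen carries the partial line length between calls (start it at 0).
static int poll_line ( const IOStreamIF* pif, char* achLine, size_t nMax, size_t* pnLen )
{
	while ( pif->_receiveAvailable ( pif ) > 0 )
	{
		char ch;
		if ( 1 != pif->_receive ( pif, &ch, 1 ) )
			break;
		if ( '\n' == ch )
		{
			achLine[*pnLen] = '\0';
			*pnLen = 0;	//ready for the next line
			return 1;
		}
		if ( *pnLen < nMax - 1 )	//drop overflow rather than overrun
			achLine[(*pnLen)++] = ch;
	}
	return 0;
}
```

A task loop could call poll_line() on every pass and only act when it returns 1, doing other work in the meantime.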

Once a hardware resource is adapted to this interface, then anything that presumes this interface can be mixed-and-matched to any of those hardware resources.  In particular, the upcoming Monitor and GPS tasks will be stream oriented and bound to the USB CDC device and the USART1 device.  This can be extended to other concepts, like a network TCP/IP socket, and I have used it before for custom stuff like an Infrared serial link that demodulates the data stream in software.
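
To make the mix-and-match idea concrete, here is a sketch of a RAM-only loopback 'device' adapted to an abridged copy of the interface, plus a consumer that only knows about the interface.  The Loop_* functions, g_pifLoop, and stream_puts() are hypothetical names for illustration and are not part of the project.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

//abridged copy of the interface: just the members this sketch uses
typedef struct IOStreamIF IOStreamIF;
struct IOStreamIF
{
	size_t (* _transmit) ( const IOStreamIF* pthis, const void* pv, size_t nLen );
	size_t (* _receiveAvailable) ( const IOStreamIF* pthis );
	size_t (* _receive) ( const IOStreamIF* pthis, void* pv, const size_t nLen );
};

//hypothetical loopback 'device': whatever is transmitted can be received
static uint8_t g_abyLoop[64];
static size_t g_nLoopLen = 0;

static size_t Loop_transmit ( const IOStreamIF* pthis, const void* pv, size_t nLen )
{
	size_t nFree = sizeof(g_abyLoop) - g_nLoopLen;
	(void)pthis;
	if ( nLen > nFree )	//take only what fits
		nLen = nFree;
	memcpy ( &g_abyLoop[g_nLoopLen], pv, nLen );
	g_nLoopLen += nLen;
	return nLen;
}

static size_t Loop_receiveAvailable ( const IOStreamIF* pthis )
{
	(void)pthis;
	return g_nLoopLen;
}

static size_t Loop_receive ( const IOStreamIF* pthis, void* pv, const size_t nLen )
{
	size_t nPull = nLen < g_nLoopLen ? nLen : g_nLoopLen;
	(void)pthis;
	memcpy ( pv, g_abyLoop, nPull );
	g_nLoopLen -= nPull;
	memmove ( g_abyLoop, &g_abyLoop[nPull], g_nLoopLen );	//slide the rest down
	return nPull;
}

static const IOStreamIF g_pifLoop = { Loop_transmit, Loop_receiveAvailable, Loop_receive };

//stream-oriented consumer code: knows nothing about what is behind pif
static size_t stream_puts ( const IOStreamIF* pif, const char* psz )
{
	return pif->_transmit ( pif, psz, strlen ( psz ) );
}
```

The same stream_puts() would work unchanged against a UART or USB CDC binding, since it only ever touches the function pointers.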

Circular Buffers

It's not part of the interface definition, and it's not required, but it is a reasonable assumption that there is some sort of buffer behind the concrete implementations.  I typically use circular buffers for both the transmit and receive sides.  I have a few such implementations which have various trade-offs, but the one I use here looks like this:

//the base type consists of indices, size, and optional debug members
typedef struct circbuff_t circbuff_t;
struct circbuff_t
{
	volatile unsigned int _nIdxRead;
	volatile unsigned int _nLength;
	const unsigned int _nSize;
	const unsigned int _nTypeSize;
#ifdef DEBUG
	volatile unsigned int _nMaxLength;
#endif
};

//the derived type consists of the base type, with the buffer following
#define CIRCBUFTYPE(instance,type,size)	\
typedef struct instance##_circbuff_t instance##_circbuff_t;	\
struct instance##_circbuff_t	\
{	\
volatile circbuff_t _base;	\
volatile uint8_t _abyBuffer[(size)*sizeof(type)];	\
};

//the instance data is initialized with some critical size params
#define CIRCBUFINST(instance,type,size)	\
instance##_circbuff_t instance =	\
{	\
	{ 0, 0, size, sizeof(type) }	\
};

//you can declare the type and instance in one go, which is probably the usual case
#define CIRCBUF(instance,type,size)	\
CIRCBUFTYPE(instance,type,size)	\
CIRCBUFINST(instance,type,size)

void circbuff_init ( void* pvCirc );
unsigned int circbuff_capacity ( void* pvCirc );
unsigned int circbuff_count ( void* pvCirc );
int circbuff_empty ( void* pvCirc );
int circbuff_full ( void* pvCirc );
int circbuff_enqueue ( void* pvCirc, const void* val );
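void circbuff_dequeue ( void* pvCirc, void* val );
#ifdef DEBUG
unsigned int circbuff_max ( void* pvCirc );	//peak fill level; called by the UART1_*buff_max() debug methods
#endif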

This is again in the C-as-a-better-C++ vein (I'm really a C++ programmer).  Here we have some convenience macros to define the data structure in RAM of a certain object type and size of the circular queue, and a few methods to push stuff in and pull stuff out.  What I want to do now is to bind the HAL implementations to the stream abstraction, and use the circular buffers to hold a plurality of pending send or receive data, allowing the producer/consumer to process at their own leisurely pace.  In particular, I like to use interrupt-driven IO to handle those real-time aspects of the hardware into the circular buffers.  But this winds up being problematic with the out-of-box implementations in the STM HAL.  The UART implementation is a little awkward, and the CDC implementation is (in my opinion) deficient.
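
The bodies of the queue methods are not shown here, so the following is only a sketch of how enqueue and dequeue could work with a read index plus a length, as in circbuff_t.  The ring_t type and ring_* functions are simplified stand-ins (a plain pointer to storage instead of the buffer-follows-base layout), and the 0/1 return convention is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

//simplified stand-in for circbuff_t: read index, fill count, fixed geometry
typedef struct
{
	unsigned int nIdxRead;	//index of the oldest element
	unsigned int nLength;	//number of elements currently held
	unsigned int nSize;	//capacity, in elements
	unsigned int nTypeSize;	//bytes per element
	uint8_t* pbyBuffer;	//backing storage, nSize*nTypeSize bytes
} ring_t;

//returns 1 if stored, 0 if the ring was full (assumed convention)
static int ring_enqueue ( ring_t* pr, const void* val )
{
	unsigned int nIdxWrite;
	if ( pr->nLength >= pr->nSize )
		return 0;
	nIdxWrite = ( pr->nIdxRead + pr->nLength ) % pr->nSize;	//wrap around
	memcpy ( &pr->pbyBuffer[nIdxWrite * pr->nTypeSize], val, pr->nTypeSize );
	++pr->nLength;
	return 1;
}

//returns 1 if an element was copied to val, 0 if the ring was empty
static int ring_dequeue ( ring_t* pr, void* val )
{
	if ( 0 == pr->nLength )
		return 0;
	memcpy ( val, &pr->pbyBuffer[pr->nIdxRead * pr->nTypeSize], pr->nTypeSize );
	pr->nIdxRead = ( pr->nIdxRead + 1 ) % pr->nSize;	//wrap around
	--pr->nLength;
	return 1;
}

//a 4-element ring of bytes for demonstration
static uint8_t g_abyDemo[4];
static ring_t g_ringDemo = { 0, 0, 4, sizeof(uint8_t), g_abyDemo };
```

Keeping a length rather than a separate write index means 'full' and 'empty' are never ambiguous, at the price of both producer and consumer touching the same counter (hence the locking around queue access in the binding code).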

Binding to HAL UART

The STM HAL libraries generally provide three variations of interfaces to devices:
  1. blocking, where the function does not return until the action has occurred or a timeout has been exceeded
  2. interrupt-driven, where the function immediately returns and the action is performed asynchronously in an ISR
  3. DMA-driven, where the function is performed in specialized hardware.  This is somewhat like having a co-processor.

Using the blocking API is simple, but of course it blocks, and so the CPU is wasted relative to doing other things (even with preemptive multitasking if there is no 'yield' action while spinning).  I still use it for small transfers and for little experiments.

DMA driven IO is the most CPU-friendly, because there is no code needed to perform the transfers.  However, this uses specialized hardware resources, and I don't find it worth it for small or slow transfers (like with the serial ports), so I usually reserve this technique for more timing-critical bulk transfers.

Interrupt-driven IO is halfway in-between in that you ask for something to happen and are later notified that it has happened.  The notification is rather rude in a sort of "drop what you're doing and handle this right now" sort of way, and what you were otherwise doing might have been pretty important, too, so usually you will want to do as little as possible so you can motor on with your regularly scheduled program.

Serial IO is something that is slow enough that the CPU overhead of servicing interrupts on a byte-by-byte basis is still so cheap that I think it's worth the convenience relative to DMA.  The HAL library provides interrupt-driven calls that look like this:

HAL_StatusTypeDef HAL_UART_Transmit_IT(UART_HandleTypeDef *huart, uint8_t *pData, uint16_t Size);
HAL_StatusTypeDef HAL_UART_Receive_IT(UART_HandleTypeDef *huart, uint8_t *pData, uint16_t Size);

Calling either of these will immediately return, and when the buffer provided is either fully transmitted, or fully filled, the HAL invokes a completion callback from interrupt context.  It is possible to receive notification of such by implementing either of these callbacks:

void HAL_UART_TxCpltCallback(UART_HandleTypeDef *huart);
void HAL_UART_RxCpltCallback(UART_HandleTypeDef *huart);

GCC supports a concept called a 'weak' symbol which means that you can define a function as 'weak', and then it is OK to have another definition by the same name that will replace it at link time.  In this case, the two callbacks have a default implementation (that does nothing) that is declared 'weak', and you simply have to provide your own (non-weak) definition in your code and your implementation will effectively replace it.
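
A minimal sketch of the weak-symbol mechanism, using the GCC/Clang attribute syntax directly (the HAL wraps this in its own __weak macro).  Demo_DataAvailable(), Demo_notify(), and the call counter are hypothetical names for illustration only.

```c
static int g_nDefaultCalls = 0;

//default implementation, marked weak.  A non-weak definition of the same
//name in any other source file replaces this one at link time, with no
//change needed on the calling side.  (The HAL defaults do nothing; this
//one counts calls so the effect is observable.)
__attribute__((weak)) void Demo_DataAvailable ( void )
{
	++g_nDefaultCalls;
}

//library-side code simply calls the symbol; the linker decides which body runs
void Demo_notify ( void )
{
	Demo_DataAvailable();
}

//to override, some other .c file would contain an ordinary definition:
//	void Demo_DataAvailable ( void ) { /* your handling here */ }
```

Built on its own, Demo_notify() reaches the weak default; add the override file to the link and the same call lands in your function instead.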

But before you can do any of that, you must first use CubeMX and ensure that the 'global interrupt' is enabled for the USART1 peripheral (that setting is in the properties fly-out under 'NVIC settings').  Failure to do so will let you call those functions all day, but you won't get any interrupts!  Whoops!

This API is simple enough, but not quite all that we need.  Some missing pieces are:

  1. Once a transaction is started, you cannot start another transaction.  So if you had started a transmission of a buffer of data, you cannot queue another to follow it until the first one is done.  You have to pend it somehow yourself.
  2. In the case of receive, the full size of the receive must be known up-front.
  3. Also in the case of receive, you cannot receive an interrupt on reception until you have first requested reception (and of a fixed size amount, as per #2).

To adapt to our stream interface we need to be able to automatically keep feeding the transmission process until all queued data (in the circular buffer) is completed, and also be able to receive arbitrary amounts of data into our receive buffer whenever it comes in.  The HAL functions are just not quite what is needed for that, so one approach would be to just not use them at all, and instead implement my own functional equivalents using either the 'Low Level API', or going lower level and doing register level programming directly.  And I probably would do that for a commercial project (and doubtlessly save a bunch of Flash and RAM in the process), but here I'm going for convenience of implementation, so what I did was this:

  1. Send/receive characters on a byte-by-byte basis.  I define an internal 1-byte buffer for send and receive, and set up the transactions with the HAL functions on those buffers.  This might sound unpalatable but ultimately the interrupts come in on each byte, anyway, so it's not that different than what's already happening.  There are just the suboptimal wasted 1-byte buffers.
  2. Create some 'kickstarter' functions that are used to keep a transaction with the HAL open at (mostly) all times.
  3. Use circular buffers to take up the slack between the ISRs and the user code.
  4. Implement the functions required of the Stream interface that will use the above to make it happen.
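
The transmit half of that plan can be simulated on a desktop machine to see the flow.  In this sketch the fake_* functions stand in for the HAL and the interrupt (they are not real HAL calls), and a tiny byte queue stands in for the circular buffer; all names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

//--- fake 'hardware': a one-byte transmitter standing in for the HAL
static int g_bTxBusy = 0;	//transaction in flight?
static const uint8_t* g_pbyTxPending;	//byte handed to the 'hardware'
static uint8_t g_abyWire[32];	//what went out on the 'wire'
static size_t g_nWire = 0;

static int fake_transmit_it ( const uint8_t* pby )	//like a 1-byte transmit request
{
	if ( g_bTxBusy )
		return -1;	//only one transaction at a time
	g_pbyTxPending = pby;
	g_bTxBusy = 1;
	return 0;
}

//--- tiny byte queue standing in for the circular buffer
static uint8_t g_abyQ[8];
static size_t g_nQRead = 0, g_nQLen = 0;

static uint8_t g_byTxNow;	//the 1-byte transaction buffer

//pull one byte off the queue and open a transaction; queue must be non-empty
static void kickstart_transmit ( void )
{
	g_byTxNow = g_abyQ[g_nQRead];
	g_nQRead = ( g_nQRead + 1 ) % sizeof(g_abyQ);
	--g_nQLen;
	fake_transmit_it ( &g_byTxNow );
}

//'transmit complete' callback: keep feeding while anything is queued
static void on_tx_complete ( void )
{
	if ( g_nQLen > 0 )
		kickstart_transmit();
}

//simulate the interrupt: the pending byte leaves, then the callback runs
static void fake_tx_isr ( void )
{
	if ( ! g_bTxBusy )
		return;
	if ( g_nWire < sizeof(g_abyWire) )
		g_abyWire[g_nWire++] = *g_pbyTxPending;
	g_bTxBusy = 0;
	on_tx_complete();
}

//user-side push: queue what fits, kickstart only if the transmitter is idle
static size_t demo_transmit ( const void* pv, size_t nLen )
{
	size_t nPushed = 0;
	while ( nPushed < nLen && g_nQLen < sizeof(g_abyQ) )
	{
		g_abyQ[( g_nQRead + g_nQLen ) % sizeof(g_abyQ)] = ((const uint8_t*)pv)[nPushed++];
		++g_nQLen;
	}
	if ( ! g_bTxBusy && g_nQLen > 0 )
		kickstart_transmit();
	return nPushed;
}
```

Pushing "abc" and then "de" before any interrupt fires starts exactly one transaction; each simulated interrupt then pulls the next byte until the queue runs dry.  On real hardware the queue manipulation would additionally need the interrupt locking discussed in this log.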

So, there is a "serial_devices.h" that currently contains:

#include "system_interfaces.h"

//the stream interface objects we expose.
extern const IOStreamIF g_pifUART1;

//these init methods are intended to be called once; they initialize internal
//structures (e.g. queues).  Because of the nature of STM32CubeMX, there is
//also some other init that is done in main.c that is generated code.
void UART1_Init ( void );

//these are optional callbacks that you can implement to catch these events.
//Note, these are generally called at ISR time.
void UART1_DataAvailable ( void );
void UART1_TransmitEmpty ( void );

//these are debug methods for tuning buffer sizes
#ifdef DEBUG
unsigned int UART1_txbuff_max ( void );
unsigned int UART1_rxbuff_max ( void );
#endif

You are expected to call UART1_Init() once to get things wired up correctly.  I am using the start of the Default task to do that sort of thing prior to entering the infinite loop.  Afterwards, you simply call the methods on the stream interface implementation 'g_pifUART1' to send and receive data.

There are the UART1_txbuff_max()/UART1_rxbuff_max() methods that -- much like the stack/heap diagnostic functions -- provide data about the queue usage that can be useful during final product tuning.  At the moment I have both the transmit and receive queues defined as having 128 bytes.  The transmit side will probably fill completely quite often because we can produce data at a high rate.  This is no big deal because the stream interface reports how much could be shoved-in/pulled-out, and the caller can either spin retrying or whatever, or there are convenience methods _transmitCompletely() and _receiveCompletely() that essentially do this for you.

The receive side is the opposite story, though -- it is likely that we will be able to process data nearly as fast as it comes in, so 128 bytes is probably far too big.  But how big is too big?  Well, that's what the diagnostic functions are there to help prove.  I spot check those things along development, but save final tuning to the end of the project.
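
The diagnostic behind those *_max() methods is not shown in this log; the sketch below illustrates the general 'high-water mark' technique it presumably relies on.  The fill_t type and fill_* functions are hypothetical and track only the counters, not the data.

```c
//hypothetical, minimal fill-level bookkeeping for one queue
typedef struct
{
	unsigned int nLength;	//elements currently queued
	unsigned int nSize;	//capacity
	unsigned int nMaxLength;	//deepest fill level seen so far (the 'high-water mark')
} fill_t;

//call when an element is enqueued; returns 0 if the queue was already full
static int fill_on_enqueue ( fill_t* pf )
{
	if ( pf->nLength >= pf->nSize )
		return 0;
	++pf->nLength;
	if ( pf->nLength > pf->nMaxLength )	//new record depth
		pf->nMaxLength = pf->nLength;
	return 1;
}

//call when an element is dequeued; the high-water mark is never lowered
static void fill_on_dequeue ( fill_t* pf )
{
	if ( pf->nLength > 0 )
		--pf->nLength;
}

//a 128-entry queue, matching the sizes currently configured
static fill_t g_fillDemo = { 0, 128, 0 };
```

If, after a long soak test, the mark sits well below the capacity, the buffer can likely be shrunk; if it equals the capacity, the queue filled at least once and data may have been refused.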

The implementation of the methods is in "serial_devices.c" and looks a bit like this:

#include "serial_devices.h"
#include "main.h"
#include "stm32f1xx_hal.h"
#include "cmsis_os.h"
#include "util_circbuff2.h"

extern UART_HandleTypeDef huart1;

//UART transmit/receive circular buffers
CIRCBUF(UART1_txbuff,uint8_t,128);
CIRCBUF(UART1_rxbuff,uint8_t,128);

#ifdef DEBUG
unsigned int UART1_txbuff_max ( void )
{
	return circbuff_max ( &UART1_txbuff );
}
unsigned int UART1_rxbuff_max ( void )
{
	return circbuff_max ( &UART1_rxbuff );
}
#endif

static void UART1_flushTtransmit ( const IOStreamIF* pthis );
static size_t UART1_transmitFree ( const IOStreamIF* pthis );
static size_t UART1_transmit ( const IOStreamIF* pthis, const void* pv, size_t nLen );
static void UART1_flushReceive ( const IOStreamIF* pthis );
static size_t UART1_receiveAvailable ( const IOStreamIF* pthis );
static size_t UART1_receive ( const IOStreamIF* pthis, void* pv, const size_t nLen );

static int Serial_transmitCompletely ( const IOStreamIF* pcom, const void* pv, size_t nLen, uint32_t to );
static int Serial_receiveCompletely ( const IOStreamIF* pcom, void* pv, const size_t nLen, uint32_t to );

const IOStreamIF g_pifUART1 = {
	UART1_flushTtransmit,
	UART1_transmitFree,
	UART1_transmit,
	UART1_flushReceive,
	UART1_receiveAvailable,
	UART1_receive,
	Serial_transmitCompletely,
	Serial_receiveCompletely,
	&huart1
};

static volatile uint8_t _byTxNow;	//knumbskull TX buffer for UART1
static void __kickstartTransmitUART1()
{
	//XXX_byTxNow = UART1_txbuff_dequeue();	//
	circbuff_dequeue(&UART1_txbuff,(void*)&_byTxNow);	//
	if(HAL_UART_Transmit_IT(&huart1, (uint8_t*)&_byTxNow, sizeof(_byTxNow)) != HAL_OK)
	{
		//XXX horror; maybe light a lamp
	}
}

volatile uint8_t _byRxNow;	//knumbskull RX buffer for UART1
static void __kickstartReceiveUART1()
{
	//set up to receive more
	//if ( HAL_UART_STATE_BUSY_RX == huart1.State || HAL_UART_STATE_BUSY_TX_RX == huart1.State )	//must grope for RX only state
	if(HAL_UART_Receive_IT(&huart1, (uint8_t*)&_byRxNow, sizeof(_byRxNow)) != HAL_OK)
	{
		//XXX horror; maybe light a lamp
	}
}

__weak void UART1_DataAvailable ( void ){}
__weak void UART1_TransmitEmpty ( void ){}

//A UART has completed transmission.  Push more if we've got it.
void HAL_UART_TxCpltCallback(UART_HandleTypeDef *huart)
{
	if ( USART1 == huart->Instance )
	{
		int bEmpty;
		UBaseType_t uxSavedInterruptStatus = taskENTER_CRITICAL_FROM_ISR();	//lock queue
		//if there is more in the queue, pluck and transmit
		if ( ! circbuff_empty(&UART1_txbuff) )
		{
			__kickstartTransmitUART1();
			bEmpty = 0;
		}
		else
		{
			bEmpty = 1;
		}
		taskEXIT_CRITICAL_FROM_ISR(uxSavedInterruptStatus);	//unlock queue
		if ( bEmpty )
			UART1_TransmitEmpty();	//notify anyone interested
	}
}

//A UART has completed reception.  Stick it in our queue if we can.
void HAL_UART_RxCpltCallback(UART_HandleTypeDef *huart)
{
	if ( USART1 == huart->Instance )
	{
		UBaseType_t uxSavedInterruptStatus = taskENTER_CRITICAL_FROM_ISR();	//lock queue
		if ( ! circbuff_full(&UART1_rxbuff) )
		{
			circbuff_enqueue ( &UART1_rxbuff, (void*)&_byRxNow );
		}
		else
		{
			//XXX horror; buffer overrun
//			LightLamp ( 2000, &g_lltOr, _ledOnWh );
		}
		//either way, set up to receive more
		__kickstartReceiveUART1();
		taskEXIT_CRITICAL_FROM_ISR(uxSavedInterruptStatus);	//unlock queue
		UART1_DataAvailable();	//notify anyone interested
	}
}

//UART error
void HAL_UART_ErrorCallback(UART_HandleTypeDef *huart)
{
	if ( USART1 == huart->Instance )
	{
		//XXX III; maybe light a lamp
	}
	else if ( USART2 == huart->Instance )
	{
		//XXX III; maybe light a lamp
	}
}


static void UART1_flushTtransmit ( const IOStreamIF* pthis )
{
	UBaseType_t uxSavedInterruptStatus = taskENTER_CRITICAL_FROM_ISR();	//lock queue
	circbuff_init(&UART1_txbuff);
	taskEXIT_CRITICAL_FROM_ISR(uxSavedInterruptStatus);	//unlock queue
}


static void UART1_flushReceive ( const IOStreamIF* pthis )
{
	UBaseType_t uxSavedInterruptStatus = taskENTER_CRITICAL_FROM_ISR();	//lock queue
	circbuff_init(&UART1_rxbuff);
	taskEXIT_CRITICAL_FROM_ISR(uxSavedInterruptStatus);	//unlock queue
}


static size_t UART1_transmit ( const IOStreamIF* pthis, const void* pv, size_t nLen )
{
	size_t nPushed;
	UBaseType_t uxSavedInterruptStatus = taskENTER_CRITICAL_FROM_ISR();	//lock queue
	size_t nToPush = circbuff_capacity(&UART1_txbuff) - circbuff_count(&UART1_txbuff);	//max you could push
	if ( nLen < nToPush )	//no buffer overruns, please
		nToPush = nLen;
	for ( nPushed = 0; nPushed < nToPush; ++nPushed )
	{
		circbuff_enqueue ( &UART1_txbuff, &((uint8_t*)pv)[nPushed] );
	}
	//if the transmitter is idle, we will need to kickstart it
	//old HAL lib had one state var
	//if ( HAL_UART_STATE_READY == huart1.State ||
	//		HAL_UART_STATE_BUSY_RX == huart1.State
	//	)	//must grope for TX only ready state
	//new HAL lib split state into two vars
	if ( HAL_UART_STATE_READY == huart1.gState && ! circbuff_empty(&UART1_txbuff) )	//must grope for TX only ready state; never kickstart from an empty queue (e.g. nLen of 0)
	{
		__kickstartTransmitUART1();
	}
	else
	{
		//dummy = 0;
	}
	taskEXIT_CRITICAL_FROM_ISR(uxSavedInterruptStatus);	//unlock queue
	return nPushed;
}


static size_t UART1_receive ( const IOStreamIF* pthis, void* pv, const size_t nLen )
{
	size_t nPulled;
	UBaseType_t uxSavedInterruptStatus = taskENTER_CRITICAL_FROM_ISR();	//lock queue
	size_t nToPull = circbuff_count(&UART1_rxbuff);	//max you could pull
	if ( nLen < nToPull )	//no buffer overruns, please
		nToPull = nLen;
	for ( nPulled = 0; nPulled < nToPull; ++nPulled )
	{
		circbuff_dequeue(&UART1_rxbuff, &((uint8_t*)pv)[nPulled]);
	}
	taskEXIT_CRITICAL_FROM_ISR(uxSavedInterruptStatus);	//unlock queue
	return nPulled;
}


//what are the number of bytes available to be read now
static size_t UART1_receiveAvailable ( const IOStreamIF* pthis )
{
	size_t n;
	UBaseType_t uxSavedInterruptStatus = taskENTER_CRITICAL_FROM_ISR();	//lock queue
	n = circbuff_count(&UART1_rxbuff);
	taskEXIT_CRITICAL_FROM_ISR(uxSavedInterruptStatus);	//unlock queue
	return n;
}


//how much can be pushed into the transmitter buffers now
static size_t UART1_transmitFree ( const IOStreamIF* pthis )
{
	size_t n;
	UBaseType_t uxSavedInterruptStatus = taskENTER_CRITICAL_FROM_ISR();	//lock queue
	n = circbuff_capacity(&UART1_txbuff) - circbuff_count(&UART1_txbuff);
	taskEXIT_CRITICAL_FROM_ISR(uxSavedInterruptStatus);	//unlock queue
	return n;
}

The gist is that pushing data into the stream will start a transmit transaction if needed (via the __kickstartTransmitUART1() function), and then the transmit-complete callbacks (HAL_UART_TxCpltCallback()) will keep any queued data moving out until it's done.  It's perfectly fine to push more data into the transmit queue while stuff is simultaneously moving out.  Similarly, but in reverse, there is always a requested receive transaction kept open (via the __kickstartReceiveUART1() function), and when an interrupt finally arrives indicating that a byte has been received, it gets pushed into the queue.  As a convenience, there are notification callbacks UART1_DataAvailable() and UART1_TransmitEmpty() if you really need them, but I generally do not use them at all.  The stream interface is all I typically need.

Next

A similar activity for the USB CDC which requires a bit more work.
