DMA on the STM32H7

DMA on the STM32H7 is a beast, with each incremental improvement in the hardware exposed through a different interface.  There's the BDMA, the regular DMA, & finally the MDMA.  The main one used for accessing GPIOs is the regular DMA.

The main use of DMA here is making a parallel bus by firing data at all 16 lines of a GPIO register & using a timer as the clock.  Despite the core running at 400MHz, bit banging a GPIO only goes at 16MHz, so you need some kind of hardware support.  There are a few limitations.  The only timers which can drive DMA transfers over GPIOs are TIM1 & TIM8.  Only DMA2 can access the GPIOs.  The most useful information came from:

a complete listing which actually works, once you move the address pointer to AXI RAM & fix all the mistakes he discovered.  The STM32F7 code is interchangeable with the STM32H7.

note about the address pointer.

The TIM_HandleTypeDef has an array of DMA_HandleTypeDefs which cause various timer events to trigger DMA transfers.

FIFOMode must be DMA_FIFOMODE_ENABLE & FIFOThreshold is key to maximizing the bandwidth.  DMA_FIFO_THRESHOLD_1QUARTERFULL gave the best results.

MemBurst only worked with DMA_MBURST_SINGLE.

HAL_DMA_Start is the command which provides the src & dst addresses.  You have to call SCB_CleanInvalidateDCache(); before & after this, since DMA doesn't touch the cache.  The address for a GPIO input is (uint32_t)&(GPIOC->IDR) & for the output is (uint32_t)&(GPIOC->ODR)

__HAL_TIM_ENABLE_DMA is the command which starts the actual data transfer, when using timer triggers.
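The calls described so far can be strung together roughly as follows.  This is only a sketch under assumptions, not the listing referenced above: the stream (DMA2_Stream1), the DMAMUX request (DMA_REQUEST_TIM1_UP), the halfword alignment, the priority & the function name gpio_dma_start are all picked for illustration.  It also assumes the timer is already configured & running, & that the sample buffer lives in DMA-accessible RAM.

```c
#include "stm32h7xx_hal.h"

/* Handle for the single writer stream (memory -> GPIOC->ODR). */
static DMA_HandleTypeDef gpio_dma;

/* Hypothetical helper: stream `count` 16-bit samples to GPIOC, one per
 * update event of `tim`. `tim` is assumed to be TIM1, already started. */
HAL_StatusTypeDef gpio_dma_start(TIM_HandleTypeDef *tim,
                                 const uint16_t *samples, uint32_t count)
{
    HAL_StatusTypeDef status;

    __HAL_RCC_DMA2_CLK_ENABLE();

    gpio_dma.Instance = DMA2_Stream1;              /* assumption: any free DMA2 stream */
    gpio_dma.Init.Request = DMA_REQUEST_TIM1_UP;   /* assumption: TIM1 update event */
    gpio_dma.Init.Direction = DMA_MEMORY_TO_PERIPH;
    gpio_dma.Init.PeriphInc = DMA_PINC_DISABLE;    /* always hit the same register */
    gpio_dma.Init.MemInc = DMA_MINC_ENABLE;        /* walk through the buffer */
    gpio_dma.Init.PeriphDataAlignment = DMA_PDATAALIGN_HALFWORD;
    gpio_dma.Init.MemDataAlignment = DMA_MDATAALIGN_HALFWORD;
    gpio_dma.Init.Mode = DMA_NORMAL;
    gpio_dma.Init.Priority = DMA_PRIORITY_VERY_HIGH;
    gpio_dma.Init.FIFOMode = DMA_FIFOMODE_ENABLE;
    gpio_dma.Init.FIFOThreshold = DMA_FIFO_THRESHOLD_1QUARTERFULL;
    gpio_dma.Init.MemBurst = DMA_MBURST_SINGLE;
    gpio_dma.Init.PeriphBurst = DMA_PBURST_SINGLE;

    status = HAL_DMA_Init(&gpio_dma);
    if (status != HAL_OK) {
        return status;
    }

    /* Attach the stream to the timer's update slot in its hdma[] array. */
    __HAL_LINKDMA(tim, hdma[TIM_DMA_ID_UPDATE], gpio_dma);

    /* Cache maintenance before & after handing over the addresses. */
    SCB_CleanInvalidateDCache();
    status = HAL_DMA_Start(&gpio_dma, (uint32_t)samples,
                           (uint32_t)&(GPIOC->ODR), count);
    SCB_CleanInvalidateDCache();
    if (status != HAL_OK) {
        return status;
    }

    /* Nothing moves until the timer is allowed to raise DMA requests. */
    __HAL_TIM_ENABLE_DMA(tim, TIM_DMA_UPDATE);
    return HAL_OK;
}
```

For a reader stream, the direction would become DMA_PERIPH_TO_MEMORY & the addresses passed to HAL_DMA_Start would be the GPIO's IDR as the source & the buffer as the destination.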

When using multiple timers to drive clock pins & DMA streams, you have to synchronize the timers.  This is most easily done by setting all the timer_handle.Instance->CNT registers to starting values based on probing with a scope.  All the CNT registers have to be set inside a __disable_irq(); __enable_irq(); block.  Similarly, all the __HAL_TIM_ENABLE_DMA calls need to be made with the IRQs disabled.
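A minimal sketch of that critical section, assuming 1 timer drives the clock pin & another triggers the DMA stream.  The function name & the 2 starting counts are hypothetical; the real counts have to come from the scope.

```c
#include "stm32h7xx_hal.h"

/* Hypothetical helper: align two running timers and arm the DMA request
 * with interrupts masked, so nothing can slip in between the writes. */
void sync_timers_and_arm(TIM_HandleTypeDef *clock_tim,
                         TIM_HandleTypeDef *dma_tim,
                         uint32_t clock_start_cnt,
                         uint32_t dma_start_cnt)
{
    __disable_irq();

    /* Starting values found by probing the pins with a scope. */
    clock_tim->Instance->CNT = clock_start_cnt;
    dma_tim->Instance->CNT = dma_start_cnt;

    /* Let the DMA timer's update event start requesting transfers. */
    __HAL_TIM_ENABLE_DMA(dma_tim, TIM_DMA_UPDATE);

    __enable_irq();
}
```

Keeping the CNT writes back to back matters because the offset between the timers is only as repeatable as the number of cycles between those writes.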

You must call HAL_DMA_Abort, HAL_DMA_DeInit, & HAL_DMA_Init to restart a DMA transfer.

In the STM32H7, the buffers for GPIO DMA operations now have to be in the AXI RAM (0x24000000) or the SRAM1, SRAM2, SRAM3 domains, but not the DTCM-RAM (0x20000000).
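Since a buffer in the wrong RAM is an easy mistake to make, a small address check can catch it before a transfer is started.  This is a sketch which assumes the STM32H743 memory map (512kB of AXI RAM, 288kB of contiguous SRAM1 to SRAM3 starting at 0x30000000); the sizes & the function name are not from the original notes & other H7 parts differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t base;  /* first address of the region */
    uint32_t size;  /* region length in bytes */
} mem_region_t;

/* Assumed STM32H743 layout; check the reference manual for other parts. */
static const mem_region_t dma_ok_regions[] = {
    { 0x24000000u, 512u * 1024u },  /* AXI RAM */
    { 0x30000000u, 288u * 1024u },  /* SRAM1 + SRAM2 + SRAM3, contiguous */
};

/* Returns true if [addr, addr + len) lies entirely inside one region the
 * regular DMA can reach. DTCM-RAM at 0x20000000 is deliberately absent. */
bool dma_buffer_reachable(uint32_t addr, uint32_t len)
{
    if (len == 0u) {
        return false;
    }
    for (size_t i = 0; i < sizeof dma_ok_regions / sizeof dma_ok_regions[0]; i++) {
        const mem_region_t *r = &dma_ok_regions[i];
        /* Written to avoid overflow in addr + len. */
        if (addr >= r->base && len <= r->size &&
            addr - r->base <= r->size - len) {
            return true;
        }
    }
    return false;
}
```

On the target it could be called as dma_buffer_reachable((uint32_t)buffer, sizeof(buffer)) just before HAL_DMA_Start.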

Speed limitations 

The main problem is a single DMA stream writing a GPIO from AXI-RAM maxes out at 28.5MHz.  Any higher & the GPIO stalls every 8 samples.  The DMA doesn't really directly access memory, but uses a FIFO.  The FIFO appears to get starved if the timer fires too fast.  The network analyzer project needs 1 writer DMA stream & 2 reader DMA streams to move 10 bits out & 20 bits in.

Using 3 DMA streams to move 30 GPIO lines, the speed drops to 11.7MHz & the streams just lock up if they go any faster.  It's disappointing a 400MHz core has such slow I/O.  The good news is you can copy data to DTCM-RAM (0x20000000) with the CPU & perform calculations without interfering with the DMA transfers.

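It should be noted 11.7MHz is a lot higher than 28.5MHz / 3 = 9.5MHz, so you can get slightly higher aggregate speeds by having more DMA streams in parallel: 3 streams at 11.7MHz move 35.1 million samples per second in total, versus 28.5 million for a single stream.  There was more speed to be had.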
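Because the timer is the clock, the sample rate can only take values of the timer clock divided by a whole number.  The sketch below assumes a 200MHz timer clock & a prescaler of 0, neither of which is stated in these notes; under that assumption, dividers of 7 & 17 land on about 28.57MHz & 11.76MHz, close to the limits measured above.  The function names are made up for illustration.

```c
#include <stdint.h>

/* Update (DMA request) rate in Hz for a timer with PSC = 0 and the given
 * auto-reload value, rounded down to a whole Hz. */
uint32_t timer_update_hz(uint32_t timer_clk_hz, uint32_t arr)
{
    return timer_clk_hz / (arr + 1u);
}

/* Smallest ARR whose update rate does not exceed max_rate_hz, i.e. the
 * fastest setting that stays at or below a measured limit.
 * Returns 0 if either argument is 0. */
uint32_t arr_for_max_rate(uint32_t timer_clk_hz, uint32_t max_rate_hz)
{
    if (timer_clk_hz == 0u || max_rate_hz == 0u) {
        return 0u;
    }
    /* Divider = ceil(timer_clk_hz / max_rate_hz), ARR = divider - 1. */
    uint32_t quotient = timer_clk_hz / max_rate_hz;
    uint32_t divider = quotient + ((timer_clk_hz % max_rate_hz) ? 1u : 0u);
    if (divider == 0u) {
        divider = 1u;
    }
    return divider - 1u;
}
```

With these assumptions, arr_for_max_rate(200000000, 12000000) gives an ARR of 16, which is a divider of 17, so the nearest step below 12MHz is the 11.76MHz one.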

Overclocking the STM32H7

In the 3 DMA stream case of 11.7MHz, it would be nice to get an even 12MHz.  You can get a few percent more clock cycles through overclocking....
