DMA on the STM32H7
DMA on the STM32H7 is a beast: each incremental improvement in ST's hardware shows up as a separate interface. There's the BDMA, the regular DMA, & finally the MDMA. The main one used for accessing GPIOs is the regular DMA.
The main use of DMA is making a parallel bus by firing data at all 16 lines of a GPIO register & using a timer as the clock. Despite the core running at 400MHz, bit banging a GPIO only goes at 16MHz, so you need some kind of hardware support. There are a few limitations. The only timers which can drive DMA transfers over GPIOs are TIM1 & TIM8. Only DMA2 can access the GPIOs. The most useful information came from:
a complete listing which actually works, once you move the address pointer to AXI RAM & fix all the mistakes he discovered. The STM32F7 code is interchangeable with the STM32H7.
note about the address pointer.
The TIM_HandleTypeDef has an array of DMA_HandleTypeDefs which cause various timer events to trigger DMA transfers.
FIFOMode must be DMA_FIFOMODE_ENABLE & FIFOThreshold is key to maximizing the bandwidth. DMA_FIFO_THRESHOLD_1QUARTERFULL gave the best results.
MemBurst only worked with DMA_MBURST_SINGLE.
HAL_DMA_Start is the command which provides the src & dst addresses. You have to call SCB_CleanInvalidateDCache(); before & after this, since DMA bypasses the cache. The address for a GPIO input is (uint32_t)&(GPIOC->IDR) & for the output is (uint32_t)&(GPIOC->ODR)
__HAL_TIM_ENABLE_DMA is the command which starts the actual data transfer, when using timer triggers.
When using multiple timers to drive clock pins & DMA streams, you have to synchronize the timers. This is most easily done by setting all the timer_handle.Instance->CNT registers to starting values found by probing with a scope. All the CNT registers have to be set inside a __disable_irq(); __enable_irq(); block. Similarly, all the __HAL_TIM_ENABLE_DMA calls need to be made with the IRQs disabled.
You must call HAL_DMA_Abort, HAL_DMA_DeInit, & HAL_DMA_Init to restart a DMA transfer.
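The notes above can be pulled together into a rough sketch of one output stream. This is not the original listing, just one way the described calls might fit together. The stream number (DMA2_Stream1), the buffer length, the ".axi_ram" linker section name & the zero CNT starting value are all assumptions, & the timer is assumed to be already configured & running. It only builds against the STM32H7 HAL on the target.

```c
#include "stm32h7xx_hal.h"

#define N_SAMPLES 1024u  /* hypothetical buffer length */

/* Hypothetical section name: the linker script has to map it to
   AXI RAM (0x24000000), not DTCM-RAM. */
static uint16_t tx_buf[N_SAMPLES]
    __attribute__((section(".axi_ram"), aligned(32)));

static TIM_HandleTypeDef htim1;   /* assumed initialised & started elsewhere */
static DMA_HandleTypeDef hdma_tx;

static void dma_tx_init(void)
{
    __HAL_RCC_DMA2_CLK_ENABLE();

    hdma_tx.Instance                 = DMA2_Stream1;  /* assumed stream */
    hdma_tx.Init.Request             = DMA_REQUEST_TIM1_UP;
    hdma_tx.Init.Direction           = DMA_MEMORY_TO_PERIPH;
    hdma_tx.Init.PeriphInc           = DMA_PINC_DISABLE;
    hdma_tx.Init.MemInc              = DMA_MINC_ENABLE;
    hdma_tx.Init.PeriphDataAlignment = DMA_PDATAALIGN_HALFWORD;
    hdma_tx.Init.MemDataAlignment    = DMA_MDATAALIGN_HALFWORD;
    hdma_tx.Init.Mode                = DMA_NORMAL;
    hdma_tx.Init.Priority            = DMA_PRIORITY_VERY_HIGH;
    hdma_tx.Init.FIFOMode            = DMA_FIFOMODE_ENABLE;
    hdma_tx.Init.FIFOThreshold       = DMA_FIFO_THRESHOLD_1QUARTERFULL;
    hdma_tx.Init.MemBurst            = DMA_MBURST_SINGLE;
    hdma_tx.Init.PeriphBurst         = DMA_PBURST_SINGLE;
    HAL_DMA_Init(&hdma_tx);

    /* Hook the stream into the timer's DMA handle array so the
       update event triggers it. */
    __HAL_LINKDMA(&htim1, hdma[TIM_DMA_ID_UPDATE], hdma_tx);
}

static void dma_tx_start(void)
{
    /* DMA bypasses the cache, so flush before & after arming. */
    SCB_CleanInvalidateDCache();
    HAL_DMA_Start(&hdma_tx, (uint32_t)tx_buf,
                  (uint32_t)&(GPIOC->ODR), N_SAMPLES);
    SCB_CleanInvalidateDCache();

    __disable_irq();
    htim1.Instance->CNT = 0;  /* placeholder: real value comes from the scope */
    __HAL_TIM_ENABLE_DMA(&htim1, TIM_DMA_UPDATE);  /* transfer begins here */
    __enable_irq();
}

static void dma_tx_restart(void)
{
    __HAL_TIM_DISABLE_DMA(&htim1, TIM_DMA_UPDATE);
    HAL_DMA_Abort(&hdma_tx);
    HAL_DMA_DeInit(&hdma_tx);
    dma_tx_init();
    dma_tx_start();
}
```

With more than one timer, all the CNT writes & all the __HAL_TIM_ENABLE_DMA calls would go inside the same __disable_irq(); __enable_irq(); block.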
In the STM32H7, GPIO to DMA operations now have to be done in the AXI RAM (0x24000000) or SRAM1, SRAM2, SRAM3 domains, but not the DTCM-RAM (0x20000000).
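Putting a buffer in the wrong RAM is an easy mistake to make, so a debug check on the buffer address can help. The following is a minimal sketch, assuming the STM32H743 memory map (512KB of AXI RAM at 0x24000000 & 288KB of contiguous SRAM1 to SRAM3 at 0x30000000). Other H7 parts have different sizes, & the function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t base;
    uint32_t size;
} mem_region_t;

/* Assumed STM32H743 layout; check the reference manual for other parts. */
static const mem_region_t dma_ok_regions[] = {
    { 0x24000000u, 512u * 1024u },  /* AXI RAM */
    { 0x30000000u, 288u * 1024u },  /* SRAM1 + SRAM2 + SRAM3, contiguous */
};

/* True if [addr, addr + len) lies entirely inside one usable region.
   DTCM-RAM at 0x20000000 is deliberately not in the table. */
static bool dma_buffer_ok(uint32_t addr, uint32_t len)
{
    size_t n = sizeof dma_ok_regions / sizeof dma_ok_regions[0];
    for (size_t i = 0; i < n; i++) {
        const mem_region_t *r = &dma_ok_regions[i];
        if (addr >= r->base && len <= r->size &&
            addr - r->base <= r->size - len) {
            return true;
        }
    }
    return false;
}

/* Debug guard to call before arming a stream. */
static void require_dma_buffer(uint32_t addr, uint32_t len)
{
    assert(dma_buffer_ok(addr, len));
}
```

On the target you would call something like require_dma_buffer((uint32_t)buf, sizeof(buf)) before HAL_DMA_Start.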
The main problem is a single DMA stream writing a GPIO from AXI-RAM maxes out at 28.5MHz. Any higher & the GPIO stalls every 8 samples. The DMA doesn't really directly access memory, but uses a FIFO. The FIFO appears to get starved if the timer fires too fast. The network analyzer project needs 1 writer DMA stream & 2 reader DMA streams to move 10 bits out & 20 bits in.
Using 3 DMA streams to move 30 GPIO lines, the speed drops to 11.7MHz & the streams just lock up if they go any faster. It's disappointing a 400MHz core has such slow I/O. The good news is you can copy data to DTCM-RAM (0x20000000) with the CPU & perform calculations without interfering with the DMA transfers.
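It should be noted 11.7MHz is higher than 28.5MHz / 3 = 9.5MHz, so the 3 streams together move about 35.1 million samples per second versus 28.5 million for a single stream. Running more DMA streams in parallel buys some extra total throughput. There was more speed to be had.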
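The arithmetic behind that comparison, as a small sketch. The rates are the measured figures quoted above; the function names are made up for illustration.

```c
#include <assert.h>

/* Total samples per second across all streams, in MHz. */
static double aggregate_rate_mhz(double per_stream_mhz, unsigned n_streams)
{
    assert(n_streams > 0);
    return per_stream_mhz * n_streams;
}

/* Rate each stream would get if one stream's limit were split evenly. */
static double split_rate_mhz(double single_stream_mhz, unsigned n_streams)
{
    assert(n_streams > 0);
    return single_stream_mhz / n_streams;
}

/* Ratio of the parallel aggregate to the single stream limit. */
static double parallel_gain(double per_stream_mhz, unsigned n_streams,
                            double single_stream_mhz)
{
    return aggregate_rate_mhz(per_stream_mhz, n_streams) / single_stream_mhz;
}
```

With the numbers here, split_rate_mhz(28.5, 3) is 9.5, aggregate_rate_mhz(11.7, 3) is about 35.1 & parallel_gain(11.7, 3, 28.5) is about 1.23.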
Overclocking the STM32H7
In the 3 DMA stream case of 11.7MHz, it would be nice to get an even 12MHz. You can get a few percent more clock cycles through overclocking....
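A timer can only divide its input clock by whole numbers, which limits the sample rates you can hit. The sketch below assumes a 200MHz timer clock with the prescaler at 0. That clock is not stated anywhere above; it is just a common setup for a 400MHz core, & the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Update event rate in Hz for a timer with prescaler 0: the counter
   rolls over every (ARR + 1) timer clocks. */
static double update_rate_hz(double tim_clk_hz, uint32_t arr)
{
    return tim_clk_hz / ((double)arr + 1.0);
}

/* Timer clock needed to hit target_hz exactly with a given ARR. */
static double required_tim_clk_hz(double target_hz, uint32_t arr)
{
    assert(target_hz > 0.0);
    return target_hz * ((double)arr + 1.0);
}
```

Under that assumption, ARR = 16 gives 200MHz / 17, about 11.76MHz, & ARR = 15 gives 12.5MHz, so nothing lands on 12MHz. required_tim_clk_hz(12e6, 16) comes out to 204MHz, which would be a 2% overclock of the timer clock.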