The PolarFire way

A project log for PEAC Pisano with End-Around Carry algorithm

Add X to Y and Y to X, says the song. And carry on.

Yann Guidon / YGDES 07/22/2021 at 14:51

Digging further into the MPF300, I get a better taste of its potential as a number cruncher. Summary: it tastes good, with an estimated run time for w26 of about 5 hours. That's a direct speedup of about 200× and I'm not even pushing it much.

The key to this is the 924 "Math Blocks" that can run at about 500 MHz, providing more than 400 billion additions per second (924 × 500 MHz ≈ 460 billion). Not bad for a $500 board: that's about 1 Gadd/s/$. I found an eBay auction that was even sweeter. Getting to that level with a cluster of RPi would take much more work, and only once the whole HW and SW are operational AND the GPUs are harnessed.
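The back-of-the-envelope figure above can be checked in a few lines. This is only a sketch of the arithmetic: the constant names are made up for the illustration, and the 500 MHz clock is the approximate value quoted above, not a measured one.

```python
# Rough throughput estimate from the figures quoted in this log.
MATH_BLOCKS = 924            # Math Blocks in the MPF300
CLOCK_HZ = 500_000_000       # "about 500 MHz" (approximate)
BOARD_PRICE_USD = 500        # approximate board price

# One addition per Math Block per clock cycle (assumed).
adds_per_second = MATH_BLOCKS * CLOCK_HZ
adds_per_second_per_dollar = adds_per_second / BOARD_PRICE_USD

print(f"{adds_per_second / 1e9:.0f} Gadd/s")
print(f"{adds_per_second_per_dollar / 1e9:.2f} Gadd/s/$")
```

This assumes every Math Block performs one useful addition on every cycle, which is an upper bound: blocks used as cycle counters do not advance the orbit.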

The 924 "Math Blocks" are described in some detail in PolarFire_FPGA_Fabric_UG0680_V7.pdf (PDF), and several features make them attractive: they include the necessary DFF buffers and the carry signals, and the ADD is 48 bits wide. This means that these blocks are appropriate for both the Add/accumulator datapath AND the cycle incrementer. What is missing is the zero detection, which normally requires external logic, probably going through the carry logic embedded in the LUTs. This part might decrease the speed.
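To make the three roles concrete (adder/accumulator, cycle incrementer, zero detection), here is a minimal software model of one unit. It is a sketch under assumptions: the iteration is written in one common form of the add-with-end-around-carry recurrence, the start state (X=1, Y=0, C=0) and the closure test are my choice for the illustration, and the name `peac_orbit_length` is hypothetical. It is not the FPGA design.

```python
def peac_orbit_length(w, max_steps=None):
    """Count steps until the state (X, Y, C) returns to (1, 0, 0).

    Assumed iteration: t = X + Y + C; Y = X; X = t mod 2^w; C = carry out.
    Returns None if the start state is not seen again within max_steps.
    """
    mask = (1 << w) - 1
    if max_steps is None:
        # 2^(2w+1) is the count of all (X, Y, C) combinations: a safe bound.
        max_steps = 1 << (2 * w + 1)
    x, y, c = 1, 0, 0
    steps = 0
    while steps < max_steps:
        t = x + y + c      # the addition (Math Block adder in hardware)
        y = x              # previous X becomes Y
        x = t & mask       # keep the low w bits
        c = t >> w         # carry out, re-injected at the next step
        steps += 1         # the cycle incrementer
        # Zero detection on Y comes first; the rest confirms the start state.
        if y == 0 and x == 1 and c == 0:
            return steps
    return None


print(peac_orbit_length(2))  # 9
print(peac_orbit_length(3))  # 35
```

In hardware the `y == 0` comparison is the part that the Math Block does not provide, which is why it would have to go through LUT logic in this model.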

Speed matters, and the delay of the Math Block certainly differs from that of the native LUT logic, which is still plentiful. With the Math Blocks, the DFF and Add logic are already implemented. Some glue is still required for read/write, but this does not fill the 300K LUT4s, and the rest can implement a slightly slower, discrete version of the same unit. That could bring the total number of parallel units into the 1500 range, possibly 2000 for small w. This provides a significant speedup with reasonable effort...

The big roadblock for now is how to control, write and read all the registers. LUT4s are not practical for implementing wide MUXes with low latency, and it seems that the RAM blocks of the PolarFire family don't have a FIFO control hard block...