To reach 10TOPS, CUDA is required.

A project log for PEAC Pisano with End-Around Carry algorithm

Add X to Y and Y to X, says the song. And carry on.

Yann Guidon / YGDES, 08/20/2021 at 00:28

I broke the bank when I found a second-hand laptop with an integrated RTX 3070. The expected speedup could be around 2000! That brings the w32 computation into the 6-month ballpark.

Already, going from a 2nd-gen 2-core/4-thread i7-2620M to a 10th-gen i7-10750H with 6 cores that can run even faster is a huge improvement. Having 12 threads at my disposal directly triples my usual throughput.

Unfortunately the RAM didn't scale so well: I'm still stuck at 16GB, but it is claimed to be upgradeable well beyond that. When I have money. And DDR4 should be a bit faster than DDR3.

The dual SSD in RAID-0 is a nice and welcome touch. This will help a LOT for FPGA synthesis. The 12MB L3 cache will also help on that application.

The king of the show, though, is the onboard RTX 3070 with 8GB of dedicated GDDR6. Its power totally dwarfs everything else in my proximity, in a sleek laptop format...

The 5888 CUDA cores run at 1.7GHz, letting me expect a peak of 10 tera adds per second (5888 × 1.7G ≈ 10^13). That's about 2^43/s. For comparison, the bidirectional scan can compute 630M loops per second, on the order of 2^29. That's a jump of 2^14. I have no working code at the moment, but it gives a good idea of the insane speedup.
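To double-check the orders of magnitude above, here is a minimal back-of-the-envelope sketch in Python. It only uses the figures quoted in this log; the assumption of one add per CUDA core per clock cycle, and the names `peak_adds_per_second`, `gpu_peak` and `speedup`, are mine and purely illustrative. Real code will land somewhere below this peak.

```python
import math

# Figures quoted in the log
CUDA_CORES = 5888          # RTX 3070 CUDA cores
CLOCK_HZ = 1.7e9           # 1.7 GHz
CPU_LOOPS_PER_S = 630e6    # bidirectional scan, loops per second

def peak_adds_per_second(cores, clock_hz, adds_per_core_per_cycle=1):
    """Theoretical peak, ASSUMING one add per core per cycle (hypothetical)."""
    return cores * clock_hz * adds_per_core_per_cycle

gpu_peak = peak_adds_per_second(CUDA_CORES, CLOCK_HZ)
speedup = gpu_peak / CPU_LOOPS_PER_S

# Express everything as powers of two, like in the text
print(f"GPU peak : {gpu_peak:.3e} adds/s, about 2^{math.log2(gpu_peak):.1f}")
print(f"CPU scan : {CPU_LOOPS_PER_S:.3e} loops/s, about 2^{math.log2(CPU_LOOPS_PER_S):.1f}")
print(f"Ratio    : {speedup:.0f}x, about 2^{math.log2(speedup):.1f}")
```

The exponents come out slightly above 43 and 29, and the ratio rounds to 2^14, which matches the rough figures in the paragraph. This is a peak ratio only: it says nothing about memory traffic, occupancy or how well the kernel will actually be written.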

From there, I don't see how I'll be able to get even better performance. To even double that, I'd have to buy another similar system. That's not interesting. Maybe rent a bitcoin mining rig or something like that. But for now, this totally blows away the other approaches and I'm going to shelve the FPGA and RPi projects. The PolarFire FPGA is at least 10 times slower and could swallow w26 in maybe 6 hours, while the RTX will be finished in 20 to 30 minutes.

The most significant factor for performance will be the efficiency of the code. I'll have to dig into the CUDA specs and tools. That will make the difference between 2 and 3 months of computation.

The problem will be the operating system. I don't know if/how I'll be able to dual-boot it. I need Windows 10 to run the FPGA/EDA tools but I also need a bare Linux for the rest.