
On the difficulty of microbenchmarking on modern processors

A project log for PEAC Pisano with End-Around Carry algorithm

Add X to Y and Y to X, says the song. And carry on.

Yann Guidon / YGDES, 12/29/2021 at 14:43

I might have to reconsider some of my previous measurements, as I find the following results when I run the exact same program several times:

[yg@localhost pscan]$ /usr/bin/time ./pscan 
Scanning w18 from X=0 (included) to 262144 (excluded)
58.04user 0.01system 0:14.93elapsed 388%CPU (0avgtext+0avgdata 4464maxresident)k

[yg@localhost pscan]$ /usr/bin/time ./pscan 
Scanning w18 from X=0 (included) to 262144 (excluded)
61.71user 0.01system 0:15.83elapsed 389%CPU (0avgtext+0avgdata 4548maxresident)k

[yg@localhost pscan]$ /usr/bin/time ./pscan 
Scanning w18 from X=0 (included) to 262144 (excluded)
67.48user 0.01system 0:17.24elapsed 391%CPU (0avgtext+0avgdata 4568maxresident)k

[yg@localhost pscan]$ /usr/bin/time ./pscan 
Scanning w18 from X=0 (included) to 262144 (excluded)
66.19user 0.02system 0:19.22elapsed 344%CPU (0avgtext+0avgdata 4464maxresident)k

The fan starts to spin, faster and faster. Then I wait a few minutes and run it again:

[yg@localhost pscan]$ /usr/bin/time ./pscan 
Scanning w18 from X=0 (included) to 262144 (excluded)
57.84user 0.01system 0:14.63elapsed 395%CPU (0avgtext+0avgdata 4436maxresident)k
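To put a number on that drift, here is a minimal Python sketch that parses the five summary lines above and reports the relative spread, (max - min) / min, of the user and elapsed times. The helper names (parse_time_line, relative_spread) are made up for this illustration, and the regular expression assumes the "M:SS.ss" elapsed format shown here (GNU time switches to "H:MM:SS" for runs longer than an hour, which this sketch does not handle).

```python
import re

# Summary lines copied from the transcripts above: runs 1-4 were back to
# back, run 5 came after a few minutes of cooling.
RUNS = """\
58.04user 0.01system 0:14.93elapsed 388%CPU
61.71user 0.01system 0:15.83elapsed 389%CPU
67.48user 0.01system 0:17.24elapsed 391%CPU
66.19user 0.02system 0:19.22elapsed 344%CPU
57.84user 0.01system 0:14.63elapsed 395%CPU
"""

# Assumes the "M:SS.ss" elapsed format seen in these transcripts.
PATTERN = re.compile(r"([\d.]+)user\s+([\d.]+)system\s+(\d+):([\d.]+)elapsed")


def parse_time_line(line):
    """Return (user_seconds, elapsed_seconds) from one summary line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognised line: {line!r}")
    user = float(match.group(1))
    elapsed = int(match.group(3)) * 60 + float(match.group(4))
    return user, elapsed


def relative_spread(values):
    """Spread of the samples as a fraction of the fastest one."""
    return (max(values) - min(values)) / min(values)


samples = [parse_time_line(line) for line in RUNS.splitlines() if line.strip()]
user_times = [user for user, _ in samples]
elapsed_times = [elapsed for _, elapsed in samples]

print(f"user:    {relative_spread(user_times):.1%} spread")
print(f"elapsed: {relative_spread(elapsed_times):.1%} spread")
```

A spread of this size is far larger than the gain expected from most micro-optimisations, which is why the earlier comparisons become suspect.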

I agree that the running conditions were not perfect, but the kernel logs show throttling events... This might explain why some of my previous benchmarks and micro-optimisations had "some variance", even though those used w16 on a single thread and only ran for about 3 s.
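One way to check for throttling without digging through the logs is to snapshot the kernel's per-CPU throttle counters before and after a run. This is only a sketch under assumptions: it relies on the /sys/devices/system/cpu/cpuN/thermal_throttle/*_throttle_count files that Linux exposes on x86 machines with Intel thermal monitoring, which may not exist on other hardware, and the function names are hypothetical.

```python
from pathlib import Path


def read_throttle_counts(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_name: {counter_name: count}} for CPUs that expose counters.

    Assumption: the counters live under cpuN/thermal_throttle/. An empty
    dict means this machine or kernel does not provide them.
    """
    counts = {}
    pattern = "cpu[0-9]*/thermal_throttle/*_throttle_count"
    for counter in sorted(Path(cpu_root).glob(pattern)):
        try:
            value = int(counter.read_text().strip())
        except (OSError, ValueError):
            continue  # unreadable or malformed counter: skip it
        cpu_name = counter.parent.parent.name
        counts.setdefault(cpu_name, {})[counter.name] = value
    return counts


def throttle_delta(before, after):
    """Sum of counter increases between two snapshots.

    Package-level counters are repeated for every core of the package, so
    treat a nonzero result as "throttling happened", not as an exact count.
    """
    total = 0
    for cpu_name, counters in after.items():
        for name, value in counters.items():
            total += value - before.get(cpu_name, {}).get(name, 0)
    return total
```

Taking one snapshot before the benchmark and one after, a nonzero throttle_delta flags the measurement as suspect.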

Developing for modern machines is so complicated... I'll have to decide whether to benchmark on "hot" or "cold" processors, for example.
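Whichever is chosen, the important part is to make the choice explicit and repeatable. The following Python sketch runs a command several times, optionally sleeping before each run: with no cooldown the later runs measure a "hot" processor, with a long cooldown every run starts "cold". The function names and the cooldown parameter are illustrative assumptions, not part of pscan.

```python
import statistics
import subprocess
import time


def time_runs(cmd, runs=5, cooldown_s=0.0):
    """Run cmd `runs` times and return the wall-clock seconds of each run.

    cooldown_s is slept before every run: 0 keeps the processor hot, a few
    minutes (a guess, to be tuned per machine) lets it cool down first.
    """
    timings = []
    for _ in range(runs):
        if cooldown_s > 0:
            time.sleep(cooldown_s)
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Report min, median and max so that outliers stay visible."""
    return {
        "min": min(timings),
        "median": statistics.median(timings),
        "max": max(timings),
    }
```

For example, summarize(time_runs(["./pscan"], runs=4)) would give the hot figures, and adding cooldown_s=180 the cold ones. The minimum is usually the least disturbed measurement, while the gap between minimum and maximum shows how much the machine's state is interfering.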
