I added some statistics collection to the RV32I[MA] emulator (originally created by Fabrice Bellard, then modified and shared on Hackaday by @Frank Buss) and collected stats from some of the RISC-V benchmark tests (see https://github.com/riscv/riscv-tests/tree/master/benchmarks). With the DEBUG_EXTRA option, for example, it collects this info from the Dhrystone benchmark:
Instructions Stat:
LUI = 892
AUIPC = 7716
JAL = 11212
JALR = 12850
BEQ = 33399
BNE = 11298
BLT = 1721
BGE = 3480
BLTU = 7017
BGEU = 2248
LW = 31050
LBU = 27712
LHU = 502
SB = 4968
SH = 502
SW = 33037
ADDI = 87830
SLTIU = 1500
XORI = 1
ORI = 1
ANDI = 6151
SLLI = 10647
SRLI = 9534
SRAI = 95
ADD = 11486
SUB = 2813
SLL = 402
SLTU = 1844
SRL = 353
OR = 2459
CSRRW = 1
CSRRS = 8
LI* = 20602
Five Most Frequent:
1) ADDI = 87830 (27.05%)
2) BEQ = 33399 (10.29%)
3) SW = 33037 (10.17%)
4) LW = 31050 (9.56%)
5) LBU = 27712 (8.53%)
Memory Reading Area 80000000...80007ae2
Memory Writing Area 80001000...80007b3f
>>> Execution time: 1425296449 ns
>>> Instruction count: 324730 (IPS=227833)
>>> Jumps: 50209 (15.46%) - 18074 forwards, 32135 backwards
>>> Branching T=26147 (44.19%) F=33016 (55.81%)
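The "Five Most Frequent" list above can be reproduced from the per-opcode counters. This is only a minimal sketch of the idea, not the emulator's actual code: the names `histogram`, `TOTAL_INSTRUCTIONS` and `most_frequent` are made up for illustration, and only the ten largest counters from the output are copied in.

```python
from collections import Counter

# Counts copied from the Dhrystone output above (ten largest entries only).
histogram = Counter({
    "ADDI": 87830, "BEQ": 33399, "SW": 33037, "LW": 31050, "LBU": 27712,
    "JALR": 12850, "ADD": 11486, "BNE": 11298, "JAL": 11212, "SLLI": 10647,
})
TOTAL_INSTRUCTIONS = 324730  # "Instruction count" reported for the same run


def most_frequent(hist, total, n=5):
    """Return the n most frequent opcodes as (name, count, percent) tuples."""
    return [(name, count, round(100.0 * count / total, 2))
            for name, count in hist.most_common(n)]


for rank, (name, count, pct) in enumerate(
        most_frequent(histogram, TOTAL_INSTRUCTIONS), 1):
    print(f"{rank}) {name} = {count} ({pct:.2f}%)")
```

The percentages are taken against the reported instruction count rather than against the sum of the listed counters.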
Without the DEBUG_EXTRA option (no instruction stats and no memory usage stats) and with the -O3 option (optimized for speed), the emulator can execute almost 13 million instructions per second on my relatively modern AMD64 computer running Debian Linux:
>>> Execution time: 25084843 ns
>>> Instruction count: 324730 (IPS=12945267)
>>> Jumps: 50209 (15.46%) - 18074 forwards, 32135 backwards
>>> Branching T=26147 (44.19%) F=33016 (55.81%)
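The derived figures in these summaries follow from the raw counters. Below is a small sketch that recomputes them for the two Dhrystone runs. The helper names `ips` and `percent` are hypothetical. I use integer division for IPS because it reproduces the printed values for both runs; whether the emulator truncates or rounds is an assumption.

```python
def ips(instructions, elapsed_ns):
    """Instructions per second from a count and a duration in nanoseconds."""
    return instructions * 1_000_000_000 // elapsed_ns


def percent(part, whole):
    """Share of `part` in `whole`, as a percentage with two decimals."""
    return round(100.0 * part / whole, 2)


INSTRUCTIONS = 324730
JUMPS = 50209
TAKEN, NOT_TAKEN = 26147, 33016

print("IPS with DEBUG_EXTRA:   ", ips(INSTRUCTIONS, 1425296449))
print("IPS without DEBUG_EXTRA:", ips(INSTRUCTIONS, 25084843))
print("Jumps, % of instructions:", percent(JUMPS, INSTRUCTIONS))
print("Branches taken, %:       ", percent(TAKEN, TAKEN + NOT_TAKEN))
```

For this run the jump count also equals the taken branches plus the JAL and JALR counts from the instruction stats (26147 + 11212 + 12850 = 50209).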
Here you can see that about 15% of executed instructions are jumps (the PC changes to something other than the usual PC+4), and most jumps were backwards. Also, 44% of branches were taken (with a jump) and 56% were not taken (no jump). Below are similar stats for some other benchmarks:
median:
>>> Execution time: 1391119 ns
>>> Instruction count: 16244 (IPS=11676930)
>>> Jumps: 3552 (21.87%) - 1254 forwards, 2298 backwards
>>> Branching T=2613 (53.36%) F=2284 (46.64%)
multiply:
>>> Execution time: 4743276 ns
>>> Instruction count: 49670 (IPS=10471665)
>>> Jumps: 13808 (27.80%) - 6310 forwards, 7498 backwards
>>> Branching T=12915 (86.46%) F=2022 (13.54%)
qsort:
>>> Execution time: 19821720 ns
>>> Instruction count: 236219 (IPS=11917179)
>>> Jumps: 45487 (19.26%) - 8141 forwards, 37346 backwards
>>> Branching T=37792 (59.71%) F=25503 (40.29%)
rsort:
>>> Execution time: 31545464 ns
>>> Instruction count: 374291 (IPS=11865129)
>>> Jumps: 15239 (4.07%) - 797 forwards, 14442 backwards
>>> Branching T=14653 (73.66%) F=5239 (26.34%)
towers:
>>> Execution time: 1474786 ns
>>> Instruction count: 18656 (IPS=12649970)
>>> Jumps: 2027 (10.87%) - 762 forwards, 1265 backwards
>>> Branching T=1037 (57.20%) F=776 (42.80%)
vvadd:
>>> Execution time: 1004666 ns
>>> Instruction count: 11974 (IPS=11918388)
>>> Jumps: 1830 (15.28%) - 492 forwards, 1338 backwards
>>> Branching T=1417 (62.18%) F=862 (37.82%)
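The jump and branch counters in the outputs above can be sketched roughly like this. It is an illustration of the definitions used here (a jump is any instruction after which the PC is not PC+4), not the emulator's code. `FlowStats` and its fields are made-up names, and counting a jump to the same address as "backwards" is my assumption.

```python
from dataclasses import dataclass

INSTRUCTION_BYTES = 4  # RV32I[MA] without compressed instructions


@dataclass
class FlowStats:
    instructions: int = 0
    forward_jumps: int = 0
    backward_jumps: int = 0
    branches_taken: int = 0
    branches_not_taken: int = 0

    def record(self, pc, next_pc, is_conditional_branch):
        """Update the counters after one executed instruction."""
        self.instructions += 1
        jumped = next_pc != pc + INSTRUCTION_BYTES
        if jumped:
            if next_pc > pc:
                self.forward_jumps += 1
            else:
                self.backward_jumps += 1  # includes a jump to itself
        if is_conditional_branch:
            if jumped:
                self.branches_taken += 1
            else:
                self.branches_not_taken += 1


# Tiny hand-made trace: (pc, next_pc, is_conditional_branch)
demo = FlowStats()
for step in [
    (0x80000000, 0x80000004, False),  # plain instruction
    (0x80000004, 0x80000000, True),   # branch taken, backwards
    (0x80000000, 0x80000004, False),  # plain instruction
    (0x80000004, 0x80000008, True),   # branch not taken
    (0x80000008, 0x80000010, False),  # unconditional jump, forwards
]:
    demo.record(*step)
print(demo)
```

Note that with this definition a conditional branch whose target happens to be PC+4 would be counted as not taken.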
As you can see, it is very important to pipeline jumps properly. Rather than simply wasting cycles on every wrongly fetched branch path (the branch penalty), as is usually done in simple RISC hardware designs, there should be branch prediction, or even speculative execution of both paths with the wrong one discarded once the condition becomes known.
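To get a feel for the numbers, here is a rough back-of-the-envelope sketch that compares two trivial static predictors on the branch counts above. The 2-cycle penalty is purely hypothetical, the names are made up, and the model ignores JAL/JALR and everything else a real pipeline does, so treat it as an illustration only.

```python
# (taken, not_taken) copied from the "Branching" lines above
BRANCHING = {
    "dhrystone": (26147, 33016),
    "median": (2613, 2284),
    "multiply": (12915, 2022),
    "qsort": (37792, 25503),
    "rsort": (14653, 5239),
    "towers": (1037, 776),
    "vvadd": (1417, 862),
}
PENALTY_CYCLES = 2  # hypothetical cost of one wrongly predicted branch


def wasted_cycles(taken, not_taken, predict_taken, penalty=PENALTY_CYCLES):
    """Cycles lost by a static predictor that always guesses the same way."""
    mispredicted = not_taken if predict_taken else taken
    return mispredicted * penalty


for name, (taken, not_taken) in BRANCHING.items():
    never = wasted_cycles(taken, not_taken, predict_taken=False)
    always = wasted_cycles(taken, not_taken, predict_taken=True)
    print(f"{name:10s} predict-not-taken: {never:6d}  predict-taken: {always:6d}")
```

Even this crude model shows how much the best fixed guess depends on the workload: multiply favors "always taken", while Dhrystone slightly favors "never taken".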
GPL is a virus, yes, but not a good one: if something is GPL-tagged and your kids are starving, GPL makes sure you can't use that GPL-infected code to pay for the food. If you license something under GPL (like the HW RTL) and it is really good, it would be stolen anyway, but by those who can afford lawyers. If a small guy does it, he gets into trouble; big corporations do not. GPL is nonsense for RTL designs.