
Super-V

Research possibilities for superscalar 32-bit RISC-V under GPL v3

Placeholder for my future most ambitious project...

The idea of Super-V came to me while I was working on Retro-V, a lightweight RISC-V RV32I implementation with an 8-bit external bus. I thought: what if the external bus were 64 bits wide? In that case we could read TWO 32-bit instructions at once (or even FOUR 16-bit "compressed" instructions). To make use of such throughput we would need a superscalar approach...
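
To make the 64-bit fetch idea concrete, here is a minimal Python sketch of how one fetched word could be cut into instructions. It relies only on the standard RISC-V length rule (a 16-bit parcel whose two lowest bits are not `11` is a compressed instruction, otherwise it starts a 32-bit one). The function name `split_fetch_word` and the way a straddling instruction is reported are assumptions made for illustration, not part of any Super-V design.

```python
def split_fetch_word(word64):
    """Split one 64-bit little-endian fetch word into instructions.

    Returns (instructions, pending_half):
      instructions - list of (byte_offset, size_in_bytes, encoding)
      pending_half - low half of a 32-bit instruction that continues in
                     the next fetch word, or None if nothing straddles
    Hypothetical helper, for illustration only.
    """
    instructions = []
    offset = 0
    while offset < 8:
        half = (word64 >> (8 * offset)) & 0xFFFF
        if half & 0b11 != 0b11:
            # 16-bit "compressed" instruction
            instructions.append((offset, 2, half))
            offset += 2
        elif offset + 4 <= 8:
            # full 32-bit instruction inside this fetch word
            insn = (word64 >> (8 * offset)) & 0xFFFFFFFF
            instructions.append((offset, 4, insn))
            offset += 4
        else:
            # 32-bit instruction starts in the last 16 bits of the word
            return instructions, half
    return instructions, None


NOP = 0x00000013    # addi x0, x0, 0
C_NOP = 0x0001      # c.nop

two_wide = (NOP << 32) | NOP
four_wide = (C_NOP << 48) | (C_NOP << 32) | (C_NOP << 16) | C_NOP
print(split_fetch_word(two_wide))
print(split_fetch_word(four_wide))
```

With only 32-bit instructions aligned to the fetch word, the result is always two instructions per fetch. Once compressed instructions are mixed in, a 32-bit instruction can start in the last 16 bits of the word, which is the case the second return value is meant to expose.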

LICENSE

GPL v3

license - 34.33 kB - 12/18/2018 at 04:23

  • More statistics - top 5 instructions

    SHAOS 12/25/2018 at 04:01 0 comments

    I extended the emulator to report the top 5 most frequent instructions. These are the results for some RISC-V benchmarks:

    dhrystone:
    
    Five Most Frequent:
    1) ADDI   = 87830 (27.05%)
    2) BEQ   = 33399 (10.29%)
    3) SW   = 33037 (10.17%)
    4) LW   = 31050 (9.56%)
    5) LBU   = 27712 (8.53%)
    
    median:
    
    Five Most Frequent:
    1) ADDI    = 3758 (23.13%)
    2) LW    = 3519 (21.66%)
    3) BNE    = 1825 (11.23%)
    4) BGE    = 1240 (7.63%)
    5) SW    = 1141 (7.02%)
    
    multiply:
    
    Five Most Frequent:
    1) ADDI    = 9581 (19.29%)
    2) BNE    = 7309 (14.72%)
    3) SLLI    = 7052 (14.20%)
    4) BEQ    = 6691 (13.47%)
    5) ANDI    = 6540 (13.17%)
    
    qsort:
    
    Five Most Frequent:
    1) ADDI    = 77881 (32.97%)
    2) LW    = 56308 (23.84%)
    3) BLT    = 37593 (15.91%)
    4) SW    = 17257 (7.31%)
    5) BLTU    = 7834 (3.32%)
    
    rsort:
    
    Five Most Frequent:
    1) LW    = 76238 (20.37%)
    2) ADDI    = 54419 (14.54%)
    3) SW    = 53704 (14.35%)
    4) ADD    = 51461 (13.75%)
    5) SLLI    = 50106 (13.39%)
    
    towers:
    
    Five Most Frequent:
    1) ADDI    = 6397 (34.29%)
    2) SW    = 3716 (19.92%)
    3) LW    = 3682 (19.74%)
    4) LI*    = 944 (5.06%) <=== this is part of ADDI
    5) BEQ    = 615 (3.30%)
    
    vvadd:
    
    Five Most Frequent:
    1) ADDI    = 3809 (31.81%)
    2) LW    = 2135 (17.83%)
    3) BNE    = 1443 (12.05%)
    4) SW    = 945 (7.89%)
    5) ADD    = 745 (6.22%)

    As you can see, the most frequent RISC-V instruction is ADDI (which also implements the LI "load immediate" assembler pseudo-instruction and some others, such as NOP and MV). The only exception is the rsort benchmark, where ADDI is 2nd and LW (load word) is 1st. Note that I count LI separately (this count is also included in the ADDI count) just to have visibility into its usage.
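
The relationship between ADDI and its assembler aliases can be sketched in a few lines of Python. The field positions are the standard RV32I I-type layout; the function names and the exact rule used to tell LI, MV and NOP apart are assumptions for illustration and are not taken from the emulator source.

```python
def encode_addi(rd, rs1, imm):
    """Build the 32-bit encoding of 'addi rd, rs1, imm' (12-bit signed imm)."""
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (rd << 7) | 0x13


def classify_addi(insn):
    """Return 'NOP', 'LI', 'MV' or 'ADDI' for an ADDI encoding, else None.

    Hypothetical rule: rs1 == x0 is treated as LI (so 'mv rd, x0' is
    counted as 'li rd, 0'), and imm == 0 with rs1 != x0 is treated as MV.
    """
    if insn & 0x7F != 0x13 or (insn >> 12) & 0x7 != 0:
        return None                     # not OP-IMM with funct3 = 000
    rd = (insn >> 7) & 0x1F
    rs1 = (insn >> 15) & 0x1F
    imm = (insn >> 20) & 0xFFF
    if imm >= 0x800:
        imm -= 0x1000                   # sign-extend the 12-bit immediate
    if rd == 0 and rs1 == 0 and imm == 0:
        return "NOP"                    # addi x0, x0, 0
    if rs1 == 0:
        return "LI"                     # addi rd, x0, imm
    if imm == 0:
        return "MV"                     # addi rd, rs1, 0
    return "ADDI"


print(classify_addi(encode_addi(10, 0, 5)))    # li a0, 5
print(classify_addi(encode_addi(10, 11, 0)))   # mv a0, a1
print(classify_addi(encode_addi(2, 2, -16)))   # addi sp, sp, -16
```

A counter built this way would increment both the ADDI total and the alias total for the same instruction, which matches the note above that the LI count is included in the ADDI count.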

  • Some statistics from emulator

    SHAOS 12/23/2018 at 04:15 3 comments

    I added some statistics calculations to the RV32I[MA] emulator (originally created by Fabrice Bellard, then modified and shared on Hackaday by @Frank Buss) and collected stats from some RISC-V benchmark tests (see https://github.com/riscv/riscv-tests/tree/master/benchmarks). With the DEBUG_EXTRA option it collects this info, shown here for the Dhrystone benchmark as an example:

    Instructions Stat:
    LUI   = 892
    AUIPC   = 7716
    JAL   = 11212
    JALR   = 12850
    BEQ   = 33399
    BNE   = 11298
    BLT   = 1721
    BGE   = 3480
    BLTU   = 7017
    BGEU   = 2248
    LW   = 31050
    LBU   = 27712
    LHU   = 502
    SB   = 4968
    SH   = 502
    SW   = 33037
    ADDI   = 87830
    SLTIU   = 1500
    XORI   = 1
    ORI   = 1
    ANDI   = 6151
    SLLI   = 10647
    SRLI   = 9534
    SRAI   = 95
    ADD   = 11486
    SUB   = 2813
    SLL   = 402
    SLTU   = 1844
    SRL   = 353
    OR   = 2459
    CSRRW   = 1
    CSRRS   = 8
    LI*   = 20602
    
    Five Most Frequent:
    1) ADDI   = 87830 (27.05%)
    2) BEQ   = 33399 (10.29%)
    3) SW   = 33037 (10.17%)
    4) LW   = 31050 (9.56%)
    5) LBU   = 27712 (8.53%)
    
    Memory Reading Area 80000000...80007ae2
    Memory Writing Area 80001000...80007b3f
    
    >>> Execution time: 1425296449 ns
    >>> Instruction count: 324730 (IPS=227833)
    >>> Jumps: 50209 (15.46%) - 18074 forwards, 32135 backwards
    >>> Branching T=26147 (44.19%) F=33016 (55.81%)
    

    Without the DEBUG_EXTRA option (no instruction stats and no memory usage stats) and with the -O3 option (fastest optimization), the emulator is capable of executing almost 13 million instructions per second on my relatively modern AMD64 computer running Debian Linux:

    >>> Execution time: 25084843 ns
    >>> Instruction count: 324730 (IPS=12945267)
    >>> Jumps: 50209 (15.46%) - 18074 forwards, 32135 backwards
    >>> Branching T=26147 (44.19%) F=33016 (55.81%)
    

    Here you can see that 15% of executed instructions are jumps (where the PC changes to something other than the usual PC+4) and that most jumps were backwards. Also, branches were 44% true (jump taken) and 56% false (no jump). Below you can see similar stats for some other benchmarks:

    median: 
    
    >>> Execution time: 1391119 ns
    >>> Instruction count: 16244 (IPS=11676930)
    >>> Jumps: 3552 (21.87%) - 1254 forwards, 2298 backwards
    >>> Branching T=2613 (53.36%) F=2284 (46.64%)
    
    multiply:
    
    >>> Execution time: 4743276 ns
    >>> Instruction count: 49670 (IPS=10471665)
    >>> Jumps: 13808 (27.80%) - 6310 forwards, 7498 backwards
    >>> Branching T=12915 (86.46%) F=2022 (13.54%)
    
    qsort: 
    
    >>> Execution time: 19821720 ns
    >>> Instruction count: 236219 (IPS=11917179)
    >>> Jumps: 45487 (19.26%) - 8141 forwards, 37346 backwards
    >>> Branching T=37792 (59.71%) F=25503 (40.29%)
    
    rsort: 
    
    >>> Execution time: 31545464 ns
    >>> Instruction count: 374291 (IPS=11865129)
    >>> Jumps: 15239 (4.07%) - 797 forwards, 14442 backwards
    >>> Branching T=14653 (73.66%) F=5239 (26.34%)
    
    towers: 
    
    >>> Execution time: 1474786 ns
    >>> Instruction count: 18656 (IPS=12649970)
    >>> Jumps: 2027 (10.87%) - 762 forwards, 1265 backwards
    >>> Branching T=1037 (57.20%) F=776 (42.80%)
    
    vvadd: 
    
    >>> Execution time: 1004666 ns
    >>> Instruction count: 11974 (IPS=11918388)
    >>> Jumps: 1830 (15.28%) - 492 forwards, 1338 backwards
    >>> Branching T=1417 (62.18%) F=862 (37.82%)

    As you can see, it is very important to pipeline jumps properly, rather than simply wasting cycles on wrongly fetched instructions (the branch penalty), as simple RISC hardware designs usually do. This calls for branch prediction, or even speculative execution of both paths, discarding the wrong one once the condition becomes known.
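
As one illustration of what branch prediction could look like, here is a minimal Python model of a classic 2-bit saturating-counter predictor. This is a textbook scheme swapped in for illustration, not a description of what Super-V will use; the class name, table size and indexing by PC are all assumptions.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters indexed by branch address.

    Counter values 0-1 predict "not taken", 2-3 predict "taken".
    Hypothetical model, for illustration only.
    """

    def __init__(self, entries=64):
        self.entries = entries
        self.table = [1] * entries      # start as "weakly not taken"

    def _index(self, pc):
        return (pc >> 2) % self.entries  # drop the two always-zero PC bits

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def mispredictions(predictor, trace):
    """Count wrong predictions over a list of (pc, taken) pairs."""
    misses = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# A loop-closing branch: taken 9 times, then falls through once
loop = [(0x80000010, True)] * 9 + [(0x80000010, False)]
print(mispredictions(TwoBitPredictor(), loop))
```

Feeding such a model with a real (pc, taken) trace from the emulator would give a rough estimate of how many of the taken and not-taken branches counted above a simple predictor could cover.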

Discussions

Antti Lukats wrote 2 days ago point

Please change from GPL to BSD or Apache. GPL is not a good idea; for most people it is a NO-GO if some RTL code is GPL... really!

SHAOS wrote 2 days ago point

I know :)

it's some kind of trolling ;)

This project is NOT for big corporations...

Antti Lukats wrote a day ago point

It does not matter; the small guy also likes the Apache license much, much more...

GPL is just a no-go for HW in general, no matter what your target is, really

SHAOS wrote 17 hours ago point

GPL is a very good virus - it makes everything around truly free :)

So nobody adds something proprietary and makes a closed chip out of it

SHAOS wrote 17 hours ago point

BTW there is at least 1 GPL RISC-V core already ( see https://riscv.org/risc-v-cores/ )

SHAOS wrote 17 hours ago point

Also, for small guys who are for some reason afraid of true freedom, I have the Retro-V core under the Apache license ;)

Yann Guidon / YGDES wrote a day ago point

Oh, a license thread :-)

SHAOS wrote 12/20/2018 at 03:11 point

Some notes: I'm well aware of BOOM's existence. I also know that a company named "Esperanto" recently announced their own superscalar RISC-V implementation...
