How to divide the register set's power consumption by about 5

A project log for YGREC8

A byte-wide stripped-down version of the YGREC16 architecture

Yann Guidon / YGDES 04/21/2019 at 17:50

The latest source code archive contains the enhanced decoder for the register set, including 3 strategies:

I provide a pseudo-randomised test to compare these strategies and the outcome is great:

[yg@Host-001 R7]$ ./ 
Testing R7:
  straight decoder:R7_tb_dec.vhdl:165:5:(report note): 100000 iterations, 702273 toggles
  latching decoder:R7_tb_dec.vhdl:165:5:(report note): 100000 iterations, 301068 toggles
  Instr-sensitive :R7_tb_dec.vhdl:165:5:(report note): 100000 iterations, 160231 toggles
R7: OK
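As a quick cross-check, the three toggle counts in the log above can be turned into per-iteration figures and reduction factors. This is only arithmetic on the reported numbers; the helper name `summarize` is made up for this illustration and is not part of the project's sources.

```python
# Toggle counts copied from the test log above (100000 iterations each).
ITERATIONS = 100_000
toggles = {
    "straight decoder": 702_273,
    "latching decoder": 301_068,
    "Instr-sensitive": 160_231,
}

baseline = toggles["straight decoder"]


def summarize(counts, base, iterations):
    """Return (name, toggles per iteration, reduction factor vs. base) rows."""
    return [(name, n / iterations, base / n) for name, n in counts.items()]


for name, per_iter, factor in summarize(toggles, baseline, ITERATIONS):
    print(f"{name:18s} {per_iter:5.2f} toggles/iteration, "
          f"reduction factor {factor:4.2f}")
```

The reduction factor is the straight decoder's count divided by each strategy's count, so the straight decoder itself comes out at exactly 1.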

The third result is roughly 4.4 times lower than the first (160231 vs. 702273 toggles), so a ratio in the neighbourhood of 1/5, which I explain below:

Of course, these numbers are NOT representative of real use cases. I used pretty uncorrelated bits as sources, while real workloads exhibit patterns of some sort. The numbers will certainly increase or decrease, depending on each program.
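To make the idea of "counting toggles on uncorrelated inputs" concrete, here is a minimal behavioural sketch in Python. It is not the R7 decoder or its VHDL testbench: the 3-bit address field, the one-hot output, the `used` enable flag and the `p_used` probability are all assumptions chosen for illustration, and the third (instruction-sensitive) strategy is not modelled. The counts it prints will not match the log.

```python
import random


def one_hot(addr):
    """Decode a 3-bit register address into 8 one-hot select lines."""
    return 1 << addr


def hamming(a, b):
    """Number of bit positions that differ between two integers."""
    return bin(a ^ b).count("1")


def count_toggles(stream, latching):
    """Count select-line toggles over a stream of (addr, used) pairs.

    latching=False: the outputs follow the address field on every cycle.
    latching=True : the outputs are updated only when `used` is true
                    (hypothetical enable), otherwise they hold their value.
    """
    outputs = one_hot(0)  # assumed reset state, not taken from the project
    toggles = 0
    for addr, used in stream:
        if used or not latching:
            new_outputs = one_hot(addr)
            toggles += hamming(outputs, new_outputs)
            outputs = new_outputs
    return toggles


def random_stream(n, p_used, seed=1):
    """Uncorrelated pseudo-random addresses and enable flags (assumed model)."""
    rng = random.Random(seed)
    return [(rng.randrange(8), rng.random() < p_used) for _ in range(n)]


stream = random_stream(100_000, p_used=0.5)
print("straight:", count_toggles(stream, latching=False))
print("latching:", count_toggles(stream, latching=True))
```

In this toy model the latching variant can never toggle more than the straight one, because it only ever sees a subset of the address changes. Feeding it a correlated stream (for example, long runs of the same address) instead of `random_stream` is one way to see how strongly the counts depend on the workload.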

There is a compromise for each situation and the 3 methods are provided in the source code, so you can choose the best trade-off between latency and consumption. The numbers are pretty good and I think I have reached the point of diminishing returns. Any further "enhancement" would increase the logic complexity for insignificant gains...