The latest source code archive contains the enhanced decoder for the register set, including 3 strategies:

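- Straight (fast): all the control lines follow the instruction bits
- Latching: update only the meaningful control lines
- Instruction-sensitive: update only the meaningful control lines, and only when the related field is used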

I provide a pseudo-randomised test to compare these strategies and the outcome is great:

    [yg@Host-001 R7]$ ./test.sh
    Testing R7:
    straight decoder:R7_tb_dec.vhdl:165:5:(report note): 100000 iterations, 702273 toggles
    latching decoder:R7_tb_dec.vhdl:165:5:(report note): 100000 iterations, 301068 toggles
    Instr-sensitive :R7_tb_dec.vhdl:165:5:(report note): 100000 iterations, 160231 toggles
    R7: OK
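To turn that log into toggles per instruction, here is a small Python sketch. It is only a convenience script and not part of the archive: the `LOG` string is copied from the output above, while the regular expression and the `toggle_rates` helper are my own hypothetical names.

```python
import re

# Report lines copied from the test run above.
LOG = """\
straight decoder:R7_tb_dec.vhdl:165:5:(report note): 100000 iterations, 702273 toggles
latching decoder:R7_tb_dec.vhdl:165:5:(report note): 100000 iterations, 301068 toggles
Instr-sensitive :R7_tb_dec.vhdl:165:5:(report note): 100000 iterations, 160231 toggles
"""

# Assumed line layout: "<name>:<file>:<line>:<col>:(report note): N iterations, M toggles"
PATTERN = re.compile(
    r"^(?P<name>[^:]+?)\s*:.*?(?P<iters>\d+) iterations, (?P<toggles>\d+) toggles$",
    re.MULTILINE,
)


def toggle_rates(log):
    """Return {strategy name: average toggles per instruction}."""
    return {
        m.group("name"): int(m.group("toggles")) / int(m.group("iters"))
        for m in PATTERN.finditer(log)
    }


if __name__ == "__main__":
    rates = toggle_rates(LOG)
    baseline = rates["straight decoder"]
    for name, rate in rates.items():
        print(f"{name:16s} {rate:.2f} toggles/instruction "
              f"({baseline / rate:.2f}x fewer than straight)")
```

The script prints one line per strategy with its rate and its reduction factor relative to the straight decoder.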

The third result is about 4.4 times lower than the first (a ratio between 1/4 and 1/5), which I explain below:

- Given that the probability of one bit being set is pretty close to 1/2, it makes sense that the first "straight" decoder toggles each output bit every other time on average. There are 14 control lines to drive, so with a 1/2 probability, about 7 lines change per instruction.

- The next method gives a better result, which you can understand using similar logic: we get 3 toggles per instruction, which makes total sense. Each of the 2 decoders updates only 3 of its 7 control lines, because the other 4 give results that will not be used. Each updated line has a 1/2 chance of changing, so the 2 decoders together toggle as much as one decoder whose 3 lines all change: 2 × 3 × 1/2 = 3. So far, so good, no surprise at all.

- The last method gives an average toggle rate of 1.6 per instruction. This is about one half of the previous result, and though it should be taken with a lot of caution, the benefit is clear. Some instructions (about 1/4) don't use the SND field, and the SRI field is not used when the Imm8 or Imm4 fields are used, giving a further significant reduction of toggles.
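The reasoning above can be checked with a quick Monte Carlo model. This is a rough Python sketch, not the VHDL from the archive, and it rests on several assumptions of mine: each decoder is modelled as an 8:1 mux tree with 4 + 2 + 1 = 7 select lines driven by a 3-bit field, the fields are uniformly random, SND is used 3/4 of the time (as stated above), and `p_sri_used` is a placeholder because the share of Imm8/Imm4 instructions is not given here. All class, function and parameter names are hypothetical.

```python
import random


class MuxTreeDecoder:
    """Hypothetical 8:1 mux tree: 4 + 2 + 1 = 7 select lines for a 3-bit field."""

    def __init__(self):
        self.lines = [[0] * 4, [0] * 2, [0]]  # one row of select lines per level
        self.toggles = 0

    def _drive(self, level, node, value):
        # Count a toggle only when the line really changes state.
        if self.lines[level][node] != value:
            self.lines[level][node] = value
            self.toggles += 1

    def update_straight(self, field):
        # Every select line of a level follows the corresponding field bit.
        for level, row in enumerate(self.lines):
            bit = (field >> level) & 1
            for node in range(len(row)):
                self._drive(level, node, bit)

    def update_latching(self, field):
        # Only the 3 lines on the selected path are updated; the other 4 hold.
        for level in range(3):
            self._drive(level, field >> (level + 1), (field >> level) & 1)


def simulate(strategy, iterations=100_000, seed=1,
             p_snd_used=0.75, p_sri_used=1 / 3):
    """Average toggles per instruction for 'straight', 'latching' or 'instr'.

    p_sri_used is an assumed placeholder, not a figure from the project.
    """
    rng = random.Random(seed)
    snd, sri = MuxTreeDecoder(), MuxTreeDecoder()
    for _ in range(iterations):
        snd_field, sri_field = rng.randrange(8), rng.randrange(8)
        snd_used = rng.random() < p_snd_used
        sri_used = rng.random() < p_sri_used
        if strategy == "straight":
            snd.update_straight(snd_field)
            sri.update_straight(sri_field)
        elif strategy == "latching":
            snd.update_latching(snd_field)
            sri.update_latching(sri_field)
        elif strategy == "instr":
            # Same as latching, but a decoder is frozen when its field is unused.
            if snd_used:
                snd.update_latching(snd_field)
            if sri_used:
                sri.update_latching(sri_field)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return (snd.toggles + sri.toggles) / iterations


if __name__ == "__main__":
    for name in ("straight", "latching", "instr"):
        print(f"{name:9s} {simulate(name):.2f} toggles/instruction")
```

Under these assumptions the expected rates are 14 × 1/2 = 7, 2 × 3 × 1/2 = 3 and 1.5 × (3/4 + 1/3) ≈ 1.6 toggles per instruction. Keep in mind that 1/3 was picked because it lands near the measured value, so the third figure is an illustration, not a confirmation.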

Of course, these numbers are NOT representative of real use cases. I used pretty uncorrelated bits as sources, while real workloads follow some sort of pattern. The numbers will certainly increase or decrease, depending on each program.

There is a compromise for each situation and the 3 methods are provided in the source code, so you can choose the best trade-off between latency and power consumption. The numbers are pretty good and I think I have reached the point of diminishing returns. Any "enhancement" will increase the logic complexity for insignificant gains...
