The phase-in code (or truncated binary code) is significantly better than the plain binary code because it can remove one bit from a number in certain cases.

http://ygdes.com/3r/ has shown that a modified version, the phase-out code, is (in average) slightly better than phase-in code. It is not always the best because it depends on the provided data. However, we don't care much about reducing the size of numbers that are already small : the large numbers are those that must be reduced.

The early tests used a simple transform (*n = lim - n*) to reuse the existing phase-in code. The optimised version must use a leaner, more direct and less serial algorithm, which can then be implemented in logic gates. I had to come up with a suitable algorithm, which is:

Parameters: k bits maximum, mask = (1<<k) - 1; mask>>1 ≤ lim ≤ mask; n ≤ lim

if ( n > (((lim<<1) &mask) |1 )) { k--; n = n+(mask>>1) -lim; } send_bits (n, k);

This is quite similar to the code for phase-in presented by Wikipedia (edited for coherence):

u = (1 << k) - lim; if (n < u) send_bits(n, k-1); else send_bits(n+u, k );

There are some very interesting nuances however:

- The expression
**(((lim<<1) &mask) |1 )**amounts to a rotation of**lim**, it is very easy to design/wire as a digital circuit. - The expression
**n+(mask>>1) -lim**has various expressions that are easier to wire as digital circuits, such as**n- (lim & mask>>1) +1**, which amounts to a simple subtraction with one masked operand (and the mask's LSB is dropped). The +1 can be wired as a carry-in set to 1, which can be also optimised out in logic. - Both expressions above can be computed in parallel because they are independent. As a digital circuit, the results
**n**and**k**are selected with a MUX controlled by the comparison's result. As a program, in a superscalar processor, the various values are computed speculatively and affected with a predicate.

I am more concerned by the computational cost and latency in hardware than software, and by the decoding than the coding.

Speaking of decoding, there is the requirement to play nicely with the bitstream extraction circuit: call the function **receive_bits()** only once, but how do we know how many bits to get, since it's a variable length code ?

If the function is called with the shortest size in argument, it must be called later to complete the missing bit. This takes more time, but there is no risk to overflow the bitstream when it reaches the end.

So the fast solution is to over-request bits: if the received number uses fewer bits than requested, the bit pointer can be re-adjusted before the next cycle. This solves the question of speed but leaves us with a new probem: what happens when the stream ends on a word boundary with a short code ?

It is easy to know when the bitstream ends and almost as easy to not read one more word beyond the end. The stream reader could return any value because it will not be used. It can be zero, all ones, or the last value.

However the register **must** be shifted because the shifter can't shift by -1.

There's a few details that I had to iron out but now it works: exercise-phase-out.html

- Actually, decoding past the last word should return 0 : returning 1 creates all kinds of problems...
- the decoding is almost like encoding, but I forgot to integrate the fact that one LSB should be removed:

`val=receive_bits(k); if (val > (((lim<<1) &mask) |1 )) { val = ((val>>1)+lim)-(mask>>1); rcvReg.offset++; // re-increment the bit counter }`

Apart from that, it works like a charm.

*20170704*

Some code optimisations give the simplified result for encoding:

```
uint32_t k=0, l=lim, mask;
if (lim) {
// generate mask from lim
if (l & ~255) { k =8; l=lim>>8; }
if (l & ~ 15) { k+=4; l>>=4; }
if (l & ~ 3) { k+=2; l>>=2; }
if (l & ~ 1) { k+=1; l>>=1; }
mask=(1<<k)-1;
if ( (val>>1) > (lim & mask) )
val+=mask-lim;
else
k++;
```

OTOH decoding is a bit more hairy...```
uint32_t
val=0, // default value if direct return
val1, // val >> 1
k, // number of bits
l=lim, // temporary limit
mask,
mask1; // mask >> 1
if (lim) {
k=1;
if (l & ~255) { k =9; l=lim>>8; }
if (l & ~ 15) { k+=4; l>>=4; }
if (l & ~ 3) { k+=2; l>>=2; }
if (l & ~ 1) { k+=1; l>>=1; }
Reg_Offset -= k;
mask=(1<<k)-1;
mask1 = mask>>1;
val = (Reg_buffer >> Reg_Offset) & mask;
val1 = val>>1;
....
// adjust (phase-out)
if (val1 > (lim & mask1)) {
val = (val1+lim)-mask1;
Reg_Offset++;
}
```

A new version is available as phaseout.c along with some testbench code.
## Discussions

## Become a Hackaday.io Member

Create an account to leave a comment. Already have an account? Log In.