Basic algorithm for one block

Greyscale encoding is less interesting than RGB because there is less potential for compression. So far, one 8×8 RGB block is encoded with these steps:

Get the min/max of each component:

// the input data for one 8×8 block:
var R=Array(64),
    G=Array(64),
    B=Array(64);
// scan:
var
  minR=R[O],
  minG=G[O],
  minB=B[O],
  maxR=minR,
  maxG=minG,
  maxB=minB;
for (var i=1; i<64; i++) {
  var t=R[i],
      u=G[i],
      v=B[i];
  if (minR > t) minR = t;
  if (maxR < t) maxR = t;
  if (minG > u) minG = t;
  if (maxG < u) maxG = t;
  if (minB > v) minB = t;
  if (maxB < v) maxB = t;
}

Send min/max|R/G/B/ right away, or store them somewhere else for further processing. Here it's the case of direct transmission: min is encoded right away as a byte, max is encoded with phase-out code because it is in the range [min..255]
```
send_byte(minR);
send_byte(minG);
send_byte(minB);
maxR -= minR,
maxG -= minG;
maxB -= minB;
send_phaseout(maxR, 255-minR);
send_phaseout(maxG, 255-minG);
send_phaseout(maxB, 255-minB);
	
```
In a more elaborate version, the minX and maxX are collected in another 8×8 array for further processing and compaction.
This step is just for the formal definition but it will be merged with other steps: remove the minX offset from the respective blocks
```
for (var i=0; i<64; i++) {
  R[i] -= minR,
  G[i] -= minG,
  B[i] -= minB;
}
// Prediction filter can be applied here too
	
```
Now, minX is out out of the way. We'll bother only about the amplitudes after this point.

Sort the component in order of max. amplitude

var Cmax, Cmed, Cmin;
var maxMax, maxMed, maxMin;

  Cmed =    G; // speculative assignation
maxMed = maxG; // (used by two cases)
// possible cases: BGR, BRG, GBR, GRB, RBG, RGB
if (maxR > maxG) {
// possible cases: BRG, RBG, RGB
    Cmin =    G; // speculative assignation
  maxMin = maxG; // (used by two cases)
  if (maxR > maxB) {
// possible cases: RBG, RGB
      Cmax =    R;
    maxMax = maxR;
    if (maxG > maxB) {
// actual case: RGB
        Cmin =    B;
      maxMin = maxB;
    } else {
// actual case:  RBG
//  (can goto to case GBR)
        Cmed =    B;
      maxMed = maxB;
    }
  } else {
// actual case: BRG
      Cmax =    B;
    maxMax = maxB;
      Cmed =     R;
    maxMed =  maxR;
  }
} else {
// possible cases: BGR, GBR, GRB
    Cmin =    R; // speculative assignation
  maxMin = maxR; // (used by two cases)
  if (maxG > maxB) {
    // possible cases: GRB, GBR
      Cmax =    G;
    maxMax = maxG;
    if (maxR > maxB) {
// actual case: GRB
        Cmin =    B;
      maxMin = maxB;
        Cmed =    R;
      maxMed = maxR;
    } else {
// actual case:  GBR
        Cmed =    B;
      maxMed = maxB;
    }
  } else {
// actual case: BGR
      Cmax =    B;
    maxMax = maxB;
  }
}

This block of code might look large and scary but it's optimised for fast execution: at most 5 assignations and 3 tests.
(note that Cmin, Cmed and Cmax are just "pointers", or can be implemented as different types of circuits)

Implement the nested color transformations σα( σα(Cmax, Cmed), Cmin)
1. σα(Cmax, Cmed) generates
  - the sum S1=Cmax+Cmed, with limit L1=maxMax+maxMed,
  - the reversed data A1 (depending on Cmed), with limit L2≤maxMed
  A1 and L2 can be used (almost) immediately or they can be stored in two new arrays for further compaction with 3R.
2. σα(S1, Cmin) generates
  - the sum S2=S1+Cmin, with limit L3=L1+maxMin,
  - the reversed data A2 (depending on Cmin), with limit L4≤maxMin
  Again, A2 and L4 can be used (almost) immediately or they can be stored in two new arrays for further compaction with 3R.
For the direct transmission, send the computed values in reverse order of the previous part (to allow decompression)
```
for (var i=0; i<64; i++) {
    // (Compute step 5 here)
  send_phaseout(S2, L3); // Sum
  send_phaseout(A2, L4); // Cmin
  send_phaseout(A1, L2); // Cmed
}
	
```
Notice that certain values do not change within a block (L1, L3 and some parameters of σα), so they can be precomputed before the loop's body. The others require additional temporary storage (S1, A1, L2, A2, L4).
Steps 5 and 6 can also be computed in parallel with a similar version, which uses predictions from the previous neighbours.
- The sizes of the resulting 2 bistreams (with and without prediction) are compared and the largest block is discarded.
- If prediction is used, a single-bit flag must also be set in the prefix of the block. You can't decode the whole block if there is a mix of raw and predicted values.
- The prediction can be computed in step 3 for example.
- Linear prediction is computed modulo the given component's range, such that negative predictions wrap around (but still give a bit-accurate result when added again, modulo the range)
.
Both versions (raw and predicted) can also be tentatively encoded with 3R. The order of encoding follows the order in step 6, which allows accurate reconstruction:
1. First, the block of the sums S2 is processed, with total MAX = L3<<6, and individual MAX=L3 (global for the whole tree, implicit from the data provided in step 2, which also allows to de-shuffle Cmin, Cmed and Cmax)
2. Then, Cmin is encoded with the individual A2s. The individual L4 are used for the limits, and the total sum is limited by the sum of all L4 (that depends on maxMin and S2).
3. Finally, Cmed is encoded with the individual A1s and L2s (which can be recovered from the previous steps 8.a and 8.b as well as maxMed)
Cmax can be recovered later by the subtraction of the previous data.
Of course, another flag must be set to indicate that 3R is used instead of direct encoding.

OK, it looked pretty easy and simple in the beginning and now we're doing pretty complex stuff but in theory, none of these steps should (significantly) expand input data (except for a few control bytes). For both cases of prediction (with/without) the shortest result is chosen.

Another version/approach encodes S2, A2 and A1 in separate sub-blocks with phasing-out, while at the same time computing the tree sum of the corresponding 3R. This way, each of the block can be selected for the shortest version, in case 3R inflates. The 3R flag is then stored in the first bits of the sub-block.

The number of cases is further increased for the situation of delta-images, with the encoding of the difference with the previous frame. This is possible for pre-computed videos, where a fast computer can compute all 3 cases and choose the best result, while the overhead for the decoder is marginal (only one block to decode, with just a few parameters to change).

Note: for delta-blocks, the parameters minX might be dropped because differences are centered around 0. Samples then have to be "folded" to keep them positive.

Colorspace decomposition (episode 2: the big swap)

Design of the phase-out code

Discussions

Become a Hackaday.io Member