-
Chunking
05/26/2017 at 04:20

A main motivation for this project is the streaming of video over Ethernet to the #WizYasep screen controllers. The data is transmitted over UDP and received by a WZ5300 controller. For reasons that remain unknown, the MTU is not as large as documented, and I have managed reliable transmission with 1200 bytes per packet. It could be a little more, but that's what I work with...
I want to keep packet framing as simple and easy as possible. The trick of "chunking" in 3RGB is to group small data blocks so they fit in 1200 bytes and send them together. Error detection occurs at the chunk level, so a whole chunk can be discarded if it arrives corrupted. 3R-encoded blocks are "self-validating", which adds another level of resilience, but the interesting part is using the UDP protocol to carry the size information implicitly.
With TCP, which is a stream protocol, the receiver must first read the number of bytes in a block or chunk, then ensure that enough bytes are available in the reception queue. With UDP, the chunks arrive already correctly sized: the Wiznet Ethernet transceiver announces that one packet with XXX bytes has been received, and there is little or no parsing to do. A whole number of blocks is available for direct use.
The desired chunk size is in the 1024-1200 byte range, and this influences the granularity of the data blocks. Ideally, tiles would be 8×8, a good compromise between compression ratio and effort, and the smaller size keeps the tiles easy on the CPU caches or the small FPGA. However, an 8×8 tile is only 64 bytes, which is pretty small compared to the MTU.
Tiles of 16×16 pixels would contain 256 values, and 4 or 5 uncompressed blocks of 256 bytes fit in the MTU: 4×256 = 1024, while 5×256 = 1280 (without block type headers) is a bit too large. These larger blocks thus create a big variability in the chunk size.
So chunks of roughly 1200 bytes are stored in groups on disk, and each chunk is preceded by a fixed-size word that contains information for the player, such as:
- type of the chunk (start of frame, key/delta frame, end of stream...)
- size (11 bits)
- destination (optional, like the 8 LSB of the IP address?)
Inside the chunk is an ordered collection of individual blocks of various types, from a few bytes to 1200 bytes.
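As a sketch of the framing described above, the header word could be packed like this. The exact field widths and type codes are my assumptions (the log only fixes the 11-bit size), so treat this as an illustration, not the project's actual layout:

```python
# Hypothetical chunk header: a 16-bit little-endian word.
# bits 0-10  : payload size in bytes (11 bits, enough for 1200)
# bits 11-13 : chunk type (values below are assumed, not from the log)
# bits 14-15 : reserved
TYPE_START_OF_FRAME = 0
TYPE_KEY_FRAME      = 1
TYPE_DELTA_FRAME    = 2
TYPE_END_OF_STREAM  = 3

def pack_header(chunk_type: int, size: int) -> bytes:
    assert 0 <= size < 2048        # must fit the 11-bit size field
    assert 0 <= chunk_type < 8     # must fit the 3-bit type field
    word = (chunk_type << 11) | size
    return word.to_bytes(2, "little")

def unpack_header(raw: bytes) -> tuple[int, int]:
    word = int.from_bytes(raw[:2], "little")
    return (word >> 11) & 0x7, word & 0x7FF

# A 1200-byte key frame chunk round-trips through the header word:
assert unpack_header(pack_header(TYPE_KEY_FRAME, 1200)) == (TYPE_KEY_FRAME, 1200)
```

Because the size field covers [0..2047], any chunk up to the observed 1200-byte limit is representable with room to spare.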
-
Colorspace
05/23/2017 at 23:40

The subject of color encoding was touched on in the log Color spaces, but no satisfying method was found: those colorspaces were not really designed for lossless compression, or were too heavily influenced by lossy TV methods.
The early explorations just considered plain R/G/B coding (it's simple, easy and resilient), but there is the feeling that a lot of redundancy remains, particularly when colors are saturated.
Here, I design a colorspace that better exploits the 3R block compaction algorithms, where the lowest amplitudes are favored.
Most colorspaces use a luminance/chrominance approach, with one signal that is the sum of all the components and two signals that are differences from that sum. These transforms are designed to maximise lossy compression, where chroma is compressed more aggressively than luma, which explains why the components are weighted (typically G > R > B).
The 3RGB colorspace is subtly different because it tries to use encoding tricks borrowed from 3R, instead of psychovisual tricks: it encodes the raw sum, without weighting, as well as two components (like R and B) almost directly.
G + B => Cgb
Cgb + R => Luma
Send Luma, R, B
Decoding is as easy as this:
Get Luma
Get R
Luma - R => Cgb
Get B
Cgb - B => G
From this point of view, this is probably the dumbest colorspace ever, so (apart from its simplicity) why care?
The point is to reduce the number of bits used for saturated colors and to exploit ancillary data, in particular the maximal value that each component can have, and to mount a 3R-like algorithm that only sends the signals with the least amplitude. Most colorspaces decode the two other components directly from the Luma, whereas here we use a longer chain to reduce the ranges as much as possible. This is slower but provides better compaction.
Let's take 8-bit components: each pixel uses one byte for R, G and B, so their sum lies in the range [0..765]. Look at the range [511..765], for example: it reduces the possible range of each component, because they are all limited to 255. This is also where saturated colors reside (usually one component is off while the two others are fully on).
There are two particular cases of interest:
- Luma = 0: the zerotree doesn't grow, all other components are off. End of story.
- Luma = 765: all the components are on, no more information is needed. End of story.
This case can be extended to Luma = 1:
- If R=1 then Cgb=0, no need to read G or B
- If R=0 then Cgb=1, read B (G=Cgb-B)
The same goes for Luma = 764: there is no need to encode R in the full [0..255] range, because the only possible triples have one of R, G or B equal to 254 and the other two equal to 255. R can thus be encoded with a single bit:
- If the bit is set, R=254 and Cgb=510: no need to read G or B (both are implied to be 255)
- If the bit is clear, R=255 and Cgb=509: read one bit to select whether G or B is 254 (the other is 255)
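To double-check the Luma = 764 case, a quick enumeration (my addition, not part of the original log) lists the admissible 8-bit triples:

```python
# All (R, G, B) triples with components in [0..255] summing to 764:
triples = [(r, g, 764 - r - g)
           for r in range(256) for g in range(256)
           if 0 <= 764 - r - g <= 255]

# Exactly one component is 254 and the other two are 255:
assert sorted(triples) == [(254, 255, 255), (255, 254, 255), (255, 255, 254)]
```

Three possibilities fit in at most 2 bits, matching the scheme above: one bit for R, plus a second bit only when R turns out to be 255.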
The cases are obviously mirrored between Luma=0 and Luma=765, with complementary values (something that the classic 3R algorithm doesn't exploit).
Because each component has a limit, the coding space is reduced, so there is hope of using fewer than 24 bits per pixel in many favorable cases. The worst case can't add more than 2 bits, and it is not easy to construct, because as the sum approaches 765, the ranges of the other components shrink. However, the pigeonhole principle excludes that all combinations use fewer than 24 bits. We'll examine this whole situation later.
The same ideas can be applied to other parts as well, such as encoding the minimum and maximum values of the pixels in a block: while the minimum value uses a plain byte, the maximum value (coded relative to the minimum) lies in a reduced range and can be encoded with fewer bits, using the knowledge of the preceding number, because their sum can't exceed 255.
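Applied to that min/max example: once the block minimum is known, the relative maximum only needs enough bits to cover the remaining headroom. A small sketch (the helper names are mine):

```python
def bits_for_range(n: int) -> int:
    """Bits needed to encode a value in [0..n]."""
    return n.bit_length()

def max_field_bits(minimum: int) -> int:
    # The maximum is sent relative to the minimum, so it lives
    # in [0 .. 255 - minimum] and may need fewer than 8 bits.
    return bits_for_range(255 - minimum)

assert max_field_bits(0)   == 8   # no gain when the minimum is 0
assert max_field_bits(128) == 7   # headroom 127 -> 7 bits
assert max_field_bits(254) == 1   # headroom 1   -> 1 bit
```

The gain is small per field, but it is free: the decoder already knows the minimum by the time it reads the maximum.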
So far we have considered only the case where R, G and B span the maximum possible range, but very often this is not the case. Each component can have its own range within a block, so the formula must be extended a bit, using the ancillary range data:
- MinR, MinG, MinB,
- MaxR, MaxG, MaxB (relative to MinR, MinG, MinB, respectively)
Then Luma is in the range [0..MaxR+MaxG+MaxB] and all the pixel values are offset by their respective MinX.
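The per-block ranges can be sketched like this (a minimal illustration; function names are mine, the field names follow the list above):

```python
def block_ranges(pixels):
    """Per-component minima and relative maxima for one block.
    pixels: iterable of (r, g, b) triples."""
    rs, gs, bs = zip(*pixels)
    mins = (min(rs), min(gs), min(bs))
    maxs = tuple(max(c) - m for c, m in zip((rs, gs, bs), mins))
    return mins, maxs

def luma_range(maxs):
    # After offsetting each pixel by its MinX, Luma of the offset
    # values lies in [0 .. MaxR + MaxG + MaxB].
    return sum(maxs)

pix = [(10, 100, 3), (20, 120, 3), (15, 110, 7)]
mins, maxs = block_ranges(pix)
assert mins == (10, 100, 3)
assert maxs == (10, 20, 4)
assert luma_range(maxs) == 34     # far below the worst-case 765
```

In a low-contrast block the Luma range collapses dramatically, which is exactly the situation the 3R compaction favors.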
The σα transform (as it is now called) is explained in log 13. Breakthrough, for the special case of a pair of identical ranges, which applies directly to the (implicit) sum Cgb and the coded Blue component.
It gets more complicated when encoding R, because R and Cgb don't have the same ranges. This will be covered in the new dedicated project #σα code (in the log "Extension to unequal boundaries").
Continued in the log Colorspace decomposition (episode 2: the big swap)