
First Crack: LZW

A project log for 1kb 6502 Image Compression

Attempt to compress a 1kb image such that the decompression algorithm, display logic, and the compressed image itself all fit in 1kb.

Peter Noyes 12/13/2016 at 06:52 (5 Comments)

Based on the initial comments on this project I decided that LZW is a good first approach. I implemented an encoder and the results are in: 853 bytes, or 16.6% smaller than the original 1024, for the image above. The big question is, is this good enough? Can I implement a decoder and image display routine in the remaining 171 bytes?

To test that things are really working, I also compressed the following image:

As expected, this one compresses far better and came in at 163 bytes, or 84% smaller.

The LZW algorithm I implemented compresses run lengths rather than raw bytes. My intuition says this should give better results, but I might as well measure the difference. The rough pipeline: convert the bitmap into a stream of run lengths, then feed that stream through a standard LZW encoder.
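A minimal sketch of that pipeline in Python (not the actual encoder used here: the real decoder has to run on a 6502, the bit-packing of codes is omitted, and the function names are illustrative):

```python
def runs(pixels):
    """Turn a flat list of 0/1 pixels into a list of (color, length) runs."""
    out, i = [], 0
    while i < len(pixels):
        j = i
        while j < len(pixels) and pixels[j] == pixels[i]:
            j += 1
        out.append((pixels[i], j - i))
        i = j
    return out

def lzw_encode(symbols):
    """Plain LZW over any hashable symbol stream; returns a list of codes.
    The dictionary starts with every distinct symbol and grows by one entry
    per emitted code."""
    table = {(s,): i for i, s in enumerate(sorted(set(symbols)))}
    codes, seq = [], ()
    for s in symbols:
        cand = seq + (s,)
        if cand in table:
            seq = cand
        else:
            codes.append(table[seq])
            table[cand] = len(table)
            seq = (s,)
    if seq:
        codes.append(table[seq])
    return codes

# codes = lzw_encode(runs(image_pixels))
```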

I also took an initial stab at the PNG Up predictor, which basically subtracts the previous line from the current one, but it didn't make much of a difference. I will see if I can eke out any better performance along the current trajectory, but I need to balance the complexity of the decode algorithm against the compression ratio. Maybe arithmetic coding is next...
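For reference, the Up filter from the PNG spec is just a byte-wise difference of each row against the row above (for 1-bit-per-pixel data the subtraction effectively degenerates to an XOR). A tiny sketch, assuming each row is a bytes object:

```python
def up_filter(rows):
    """PNG 'Up' filter: replace each row with its byte-wise difference
    (mod 256) from the previous row; the first row is differenced against
    zeros. Rows that repeat the line above become all zeros."""
    prev = bytes(len(rows[0]))
    out = []
    for row in rows:
        out.append(bytes((a - b) & 0xFF for a, b in zip(row, prev)))
        prev = row
    return out
```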

Discussions

Peter Noyes wrote 12/14/2016 at 04:08

I have made some more progress with LZW. Taking Mark Sherman's feedback, I am now encoding run lengths without storing the black/white color in each code, since the color can be inferred by simply flipping back and forth. I am still compressing the run lengths with LZW, and I am down to 787 bytes, which is 23% smaller and leaves me with 237 bytes for code.
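In other words, the symbol stream fed to LZW becomes just the run lengths, with the colour implied by strict alternation. A rough sketch of that extraction (illustrative only; the starting-colour convention is an assumption):

```python
def alternating_runs(pixels, start_color=0):
    """Emit only run lengths; the decoder flips colour after every run,
    starting from start_color. A leading 0 handles images that actually
    begin with the other colour."""
    lengths, color, i = [], start_color, 0
    while i < len(pixels):
        j = i
        while j < len(pixels) and pixels[j] == color:
            j += 1
        lengths.append(j - i)
        color ^= 1
        i = j
    return lengths
```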

The image I am using has 1376 total runs. Even though LZW can't combine existing sequences, it is still finding lots of fairly long sequences that share a common prefix. 

I think I will keep going and experiment with predictors some more.


Yann Guidon / YGDES wrote 12/14/2016 at 05:25

You could study the histograms of the run lengths and see the effect of each predictor...

For lengths >= 32, use an "escape code" (code 0?) to extend the length and supply the necessary MSBs of the run length?
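One way to read that suggestion (purely a guess at the scheme, with a 5-bit length field as an example):

```python
FIELD_BITS = 5                       # run lengths 1..31 fit in one symbol

def encode_run(n):
    """Short runs are a single symbol; longer runs emit the escape symbol 0,
    then the high-order bits, then the remaining low-order bits. This toy
    form handles runs up to 2**(2*FIELD_BITS) - 1."""
    if n < (1 << FIELD_BITS):
        return [n]
    return [0, n >> FIELD_BITS, n & ((1 << FIELD_BITS) - 1)]

def decode_run(symbols):
    """Inverse of encode_run; returns (run_length, symbols_consumed)."""
    if symbols[0] != 0:
        return symbols[0], 1
    return (symbols[1] << FIELD_BITS) | symbols[2], 3
```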


Yann Guidon / YGDES wrote 12/14/2016 at 05:30

Tuning the maximum code size might also influence the compression: code reuse might be lower or higher if you allow 6, 7, or 8 bits per code. The "replacement" algorithm (used when the code space is exhausted) also plays its role... a balance between "most recently used" and "most used" must be found.


Yann Guidon / YGDES wrote 12/14/2016 at 02:55

There are other predictors such as Paeth :-) It amounts to some sort of XOR, no?
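For reference, the Paeth predictor from the PNG spec picks whichever of the left, up, or upper-left neighbours is closest to left + up - upper-left:

```python
def paeth_predict(a, b, c):
    """PNG Paeth predictor: a = left, b = up, c = upper-left.
    Returns the neighbour closest to the estimate a + b - c."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c

# The filtered byte is then (x - paeth_predict(a, b, c)) & 0xFF.
```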

Furthermore, stock LZW is not the most efficient because it can't merge two existing sequences, only extend a sequence by a single symbol.


Mars wrote 12/13/2016 at 19:07

Forget the dictionary and try pure RLE. It's pure black and white, so you can just encode the first color (does it start on black or on white?) and then encode only the lengths: the image starts with X black pixels, then Y white pixels, then black again, and so on. If you need a run longer than your maximum encoding (say you need 80 black pixels but your maximum is 63), you can allow '0' as a valid length: 80 black becomes 63 black, 0 white, 17 black. You just need to find the optimal word size. If you bring back the dictionary idea, you could use a Huffman tree to get a variable word size.
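A quick sketch of that scheme (illustrative only; 6-bit length field to match the 63/80 example, names made up):

```python
def rle_encode(pixels, field_bits=6):
    """Pure RLE: record the starting colour once, then only run lengths with
    the colour alternating implicitly. A run longer than the field allows is
    split with a zero-length run of the opposite colour, e.g. 80 black
    becomes 63 black, 0 white, 17 black."""
    max_run = (1 << field_bits) - 1            # 63 with a 6-bit field
    start_color = pixels[0]
    lengths, color, i = [], start_color, 0
    while i < len(pixels):
        j = i
        while j < len(pixels) and pixels[j] == color:
            j += 1
        n = j - i
        while n > max_run:
            lengths.append(max_run)
            lengths.append(0)                  # zero-length run of the other colour
            n -= max_run
        lengths.append(n)
        color ^= 1
        i = j
    return start_color, lengths
```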
