
Squeezing Textures to Fit

A project log for QuickSilver Neo: Open Source GPU

A 3D Graphics Accelerator for FPGAs

Ruud Schellekens, 06/17/2016 at 17:44

Ordinary textures were just a bit too big to work effectively. Anything we can do to make them smaller in memory would be a great boon. We're already working at 8 bits per pixel, though, so we can't reduce that any further if we want to stay above EGA quality. Instead, we'll apply compression to the textures.

You may be familiar with image compression as used on the web or the desktop, typically in either JPEG or PNG format. These are great at reducing the size of an image by a factor of 10 to 100 or more, while still keeping great image quality. Unfortunately, these algorithms are not a great fit for texture compression. Texture compression needs two important properties:

Those two are very important to get right. On the other hand, we don't really care about the following:

To make random access times lower, texture compression is usually done in blocks of a few pixels, with each block always the same size so you know exactly where to find it. We'll use the most common size: 4x4 texels. Let's draw the block so we know where we start:

Block-based compression algorithms basically assume that pixels in an image look a lot like the pixels that surround them. S3 Texture Compression 1, also known as DXT1, is one such algorithm. Instead of storing all 16 colours, it stores just 2 colours, Ca and Cb, per 4x4 pixel block. Then, for each pixel in the block, it has an index value that chooses one of those two colours. To make gradients (which are pretty common in many types of images) look a bit better, each index can also choose a blend of the two colours, giving 4 options: Ca, Cb, ⅓ Ca + ⅔ Cb, and ⅓ Cb + ⅔ Ca. This gives the following encoding (note that DXT1 uses 16 bits per colour): two 16-bit colours followed by sixteen 2-bit indices, 64 bits per block in total.
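To make the decode step concrete, here is a minimal Python sketch of unpacking one DXT1 block. It assumes the standard little-endian DXT1 layout (two RGB565 colours, then a 32-bit word of 2-bit indices with texel 0 in the lowest bits) and handles only the opaque four-colour mode described above. The function names are my own, and the rounding of the blended colours is a simplification; real decoders differ slightly here.

```python
import struct

def rgb565_to_rgb888(c):
    """Expand a 16-bit RGB565 colour to an (r, g, b) tuple of 8-bit values."""
    r = (c >> 11) & 0x1F
    g = (c >> 5) & 0x3F
    b = c & 0x1F
    # Replicate the top bits into the low bits so full scale maps to 255.
    return ((r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2))

def decode_dxt1_block(block):
    """Decode one 8-byte DXT1 block into 4 rows of 4 (r, g, b) tuples.

    Only the opaque four-colour mode is handled (an assumption of this sketch).
    """
    ca16, cb16, indices = struct.unpack("<HHI", block)
    ca = rgb565_to_rgb888(ca16)
    cb = rgb565_to_rgb888(cb16)
    palette = [
        ca,
        cb,
        tuple((2 * a + b) // 3 for a, b in zip(ca, cb)),  # 2/3 Ca + 1/3 Cb
        tuple((a + 2 * b) // 3 for a, b in zip(ca, cb)),  # 1/3 Ca + 2/3 Cb
    ]
    rows = []
    for y in range(4):
        row = []
        for x in range(4):
            shift = 2 * (4 * y + x)  # 2 bits per texel, row-major order
            row.append(palette[(indices >> shift) & 0b11])
        rows.append(row)
    return rows
```

Every texel costs one 2-bit lookup into a four-entry palette, which is what makes this kind of scheme cheap to decode in hardware.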

Because QuickSilver outputs at just 8 bits per pixel, and because it will probably use pretty small textures where every bit of detail counts, I've chosen a slightly different trade-off. Instead of using 2 16-bit colours and adding two intermediate values, basic QuickSilver Texture Compression uses 4 8-bit colours. This trades colour precision for colour variety, something I think would be more valuable for very small textures.
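As a rough sketch of how such a block could be read back, the Python below decodes a block of 4 8-bit colours plus sixteen 2-bit indices, and looks up a single texel in a texture made of these blocks. The byte layout (palette in bytes 0 to 3, indices as a little-endian 32-bit word in bytes 4 to 7), the row-major block order, and the names `decode_block` and `fetch_texel` are all assumptions for illustration, not the actual QuickSilver format.

```python
import struct

BLOCK_DIM = 4      # 4x4 texels per block
BLOCK_BYTES = 8    # 4 palette bytes + 4 index bytes = 64 bits (assumed layout)

def decode_block(block):
    """Return 4 rows of 4 8-bit colour values for one 8-byte block."""
    palette = block[0:4]                      # four 8-bit colours
    (indices,) = struct.unpack("<I", block[4:8])
    return [
        [palette[(indices >> (2 * (BLOCK_DIM * y + x))) & 0b11]
         for x in range(BLOCK_DIM)]
        for y in range(BLOCK_DIM)
    ]

def fetch_texel(data, width, x, y):
    """Look up one texel; width is in texels and a multiple of 4."""
    blocks_per_row = width // BLOCK_DIM
    block_index = (y // BLOCK_DIM) * blocks_per_row + (x // BLOCK_DIM)
    # Fixed-size blocks: the address follows directly from the block index.
    offset = block_index * BLOCK_BYTES
    block = data[offset:offset + BLOCK_BYTES]
    return decode_block(block)[y % BLOCK_DIM][x % BLOCK_DIM]
```

Because every block is exactly 8 bytes, finding the block for any texel is one multiply and one add, with no need to scan earlier data.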

We now use 64 bits for every 16 pixels, or just 4 bits per pixel, halving our original bandwidth and storage requirements. An improvement on all fronts? Almost. While the average amount of data we need per pixel has halved, the minimum amount has increased by a factor of 8, as we now need to fetch an entire 64-bit block to read even a single 8-bit pixel from the image. Fortunately, the properties of the on-board RAM ease the pain a little. The RAM actually uses 16-bit data words, and the time between sequential accesses is much lower than the random access penalty. A random word read requires 70 ns, but 4 sequential words can be read in 120 ns, less than twice as long.

We do get a lot more pixels for those 120 ns. In a future post I'll talk about using a cache to make use of those extra pixels.
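The numbers above can be checked with a few lines of Python. The per-pixel timings are a best case of my own derivation: they assume every pixel in a fetched word or block actually gets used, which is exactly what a cache would have to arrange.

```python
WORD_BITS = 16          # RAM data word size
RANDOM_READ_NS = 70     # one random word read
BURST4_READ_NS = 120    # four sequential words
UNCOMPRESSED_BPP = 8    # bits per pixel without compression
BLOCK_BITS = 64         # one compressed 4x4 block
BLOCK_PIXELS = 16

compressed_bpp = BLOCK_BITS / BLOCK_PIXELS          # average bits per pixel
min_fetch_ratio = BLOCK_BITS // UNCOMPRESSED_BPP    # minimum fetch vs. one pixel
words_per_block = BLOCK_BITS // WORD_BITS           # words in one burst
pixels_per_word = WORD_BITS // UNCOMPRESSED_BPP     # uncompressed pixels per word

# Best-case cost per pixel if all fetched pixels are used (assumption).
ns_per_pixel_uncompressed = RANDOM_READ_NS / pixels_per_word
ns_per_pixel_compressed = BURST4_READ_NS / BLOCK_PIXELS

print(compressed_bpp, min_fetch_ratio, words_per_block)
print(ns_per_pixel_uncompressed, ns_per_pixel_compressed)
```

Under those assumptions, one random word read delivers 2 uncompressed pixels in 70 ns, while one block read delivers 16 compressed pixels in 120 ns.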
