A deep dive into BC6H and BC7 texture decompression.

Some notes on the latest texture compression formats and how to decode them.

Introduction

It has been over 20 years since the advent of hardware texture compression known as S3TC. Microsoft licensed it and renamed it to DXT. ATI leveraged the silicon and created two new formats by combining channels in different ways. Microsoft licensed these and added them to DirectX 10 as BC4 and BC5.

The quality of images stored in these formats left something to be desired, so DirectX 11 added the BC7 and BC6H formats (known as BPTC in OpenGL). BC7 improves quality at the expense of encode time, and BC6H is for HDR images. In addition, the color block in the legacy formats is R5G6B5, which means pure gray scales cannot be represented; the textures come out either slightly green or slightly magenta.

As part of the GPEG streaming protocol, we have a proprietary texture container that supports tiling and does additional processing to improve compression ratios. It is also directly loadable into Paint.NET and is set up to preview in Explorer using SharpShell, which means there needs to be a C# decoder. I could use DirectXTexNet to wrap existing decoding support, but where's the fun in that? Writing my own also means I fully comprehend what is going on.

Links

Microsoft Texture Block Compression Documentation
DirectXTex Reference Encoder & Decoder for BC6H & BC7
Alternate iOrange Decoder (header only)
Blog on Texture compressors

Things I learned after working with the formats.

What is convoluted in software is quite often simple in silicon. To this end, there are some non-obvious 'features'.

  • There are 8 modes in BC7, stored in 1 to 8 bits. The position of the first set bit defines the mode, so 1 << 5 represents mode 5.
  • There are 14 modes in BC6H. The first 2 are stored in 2 bits, and the next 12 use an additional 3 bits (5 bits in total).
  • The bits appear in reverse order, least significant bit first. I'm not sure if endianness is the correct term, but (for example) 32 is stored as 000001 in the file. However, BC6H modes 13 and 14 have some fields stored in the reverse order to this (look for 'why ?? just why ???' in the iOrange decoder linked above).
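
The mode rules above can be sketched in a few lines. This is a minimal Python sketch rather than the C# decoder itself; BitReader, bc7_mode, and bc6h_mode_bits are hypothetical names, and the BC6H helper only returns the raw mode field and its width, not a mode number.

```python
class BitReader:
    """Reads fields from a 16-byte block, least significant bit first."""

    def __init__(self, block: bytes):
        # Treat the whole block as one little-endian integer.
        self.value = int.from_bytes(block, "little")
        self.pos = 0

    def read(self, count: int) -> int:
        result = (self.value >> self.pos) & ((1 << count) - 1)
        self.pos += count
        return result


def bc7_mode(block: bytes):
    """Return the BC7 mode (0 to 7), or None if no mode bit is set."""
    first = block[0]
    if first == 0:
        return None
    # Index of the lowest set bit: 0b00100000 gives 5.
    return (first & -first).bit_length() - 1


def bc6h_mode_bits(block: bytes):
    """Return (raw mode field, width of the field in bits) for BC6H."""
    low2 = block[0] & 0b11
    if low2 < 2:
        return low2, 2           # the first two modes fit in 2 bits
    return block[0] & 0b11111, 5  # the other twelve use 5 bits
```

For example, bc7_mode(bytes([0b00100000]) + bytes(15)) returns 5, matching the 1 << 5 case above.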

    Overview

    At first glance, the decoding seems much more complex than the older BC formats, but the concepts remain the same; there are end points and interpolation bits.

    End Points

    There is a pair of end points per subset. If the mode has two subsets, it will have two pairs of end points. BC formats 1 through 5 have a single subset per half block.

    Interpolation Bits

    There can be 2, 3, or 4 interpolation bits per pixel. These index into the aWeight2, aWeight3, and aWeight4 tables (iOrange) or g_aWeights2 etc. (DirectXTex). The weights act as 6 bit fixed point multipliers. To save bits, the textures are encoded such that certain pixels are guaranteed to be in the lower half of the index range, so their top bit is always zero and they are stored with one bit fewer than the precision defined by the mode. For comparison, the color block in BC1 & BC3 uses two interpolation bits; the alpha block in BC3, BC4, and BC5 uses three interpolation bits.
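
As an illustration of how the weights behave as 6 bit fixed point multipliers, here is a minimal Python sketch of the BC7 interpolation step. The WEIGHTS dictionary and the interpolate name are my own for this example; the values are the standard 2, 3, and 4 bit weight tables.

```python
# Weight tables keyed by the number of interpolation bits.
WEIGHTS = {
    2: [0, 21, 43, 64],
    3: [0, 9, 18, 27, 37, 46, 55, 64],
    4: [0, 4, 9, 13, 17, 21, 26, 30, 34, 38, 43, 47, 51, 55, 60, 64],
}


def interpolate(e0: int, e1: int, index: int, index_bits: int) -> int:
    """Blend two end point components using a 6 bit fixed point weight."""
    w = WEIGHTS[index_bits][index]
    # 64 represents 1.0; adding 32 rounds to nearest before the shift.
    return ((64 - w) * e0 + w * e1 + 32) >> 6
```

With end points 0 and 255 and two interpolation bits, index 0 gives 0, index 1 gives 84, and index 3 gives 255.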

    Which pair of end points is used?

    This is defined by the shape, which is called the partition in the BC7 documentation and the shape in the BC6H documentation, and is stored in 0, 4, 5, or 6 bits depending on the mode. The subset for each pixel in the 4x4 block comes from the partition_sets table (iOrange) or g_aPartitionTable (DirectXTex); look for the variables partitionSet (iOrange) or uRegion (DirectXTex). Shape seems a much better name than partition for this data.
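
The lookup itself is small. The sketch below is illustrative Python, not code from either decoder: TWO_SUBSET_SHAPE_0 holds only the first entry of the two subset table (left half of the block is subset 0, right half is subset 1), and endpoints_for_pixel is a hypothetical helper.

```python
# Shape 0 of the two subset table: one subset number per pixel,
# in row order from top left to bottom right.
TWO_SUBSET_SHAPE_0 = [
    0, 0, 1, 1,
    0, 0, 1, 1,
    0, 0, 1, 1,
    0, 0, 1, 1,
]


def endpoints_for_pixel(pixel: int, subset_map, endpoint_pairs):
    """Pick the end point pair for a pixel (0 to 15) from its subset."""
    return endpoint_pairs[subset_map[pixel]]
```

A real decoder would index a full table by subset count and shape number first, then by pixel.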

    Reading the correct number of interpolation bits

    The g_aFixUp table (DirectXTex) shows which pixels use fewer bits. For example, for a mode with three subsets and using shape 0, pixels 0, 3, and 15 each use one fewer interpolation bit (those pixel numbers refer to the top left, top right, and bottom right pixels of the 4x4 compressed block). This fix up data is merged into the partition_sets table in iOrange by setting the high bit.
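
To make the fix up rule concrete, here is a minimal Python sketch that reads the per pixel indices for the example above (three subsets, shape 0, fix up pixels 0, 3, and 15). It assumes a mode with 3 bit indices and that the index bits have already been extracted into one integer, least significant bit first; both function names are hypothetical.

```python
def index_bit_counts(index_bits: int, anchors) -> list:
    """Bits stored for each of the 16 pixels; fix up pixels lose one bit."""
    return [index_bits - 1 if p in anchors else index_bits for p in range(16)]


def read_indices(packed: int, index_bits: int, anchors) -> list:
    """Unpack 16 indices from an integer holding the index bits."""
    indices = []
    pos = 0
    for count in index_bit_counts(index_bits, anchors):
        indices.append((packed >> pos) & ((1 << count) - 1))
        pos += count
    return indices
```

With 3 bit indices and those three fix up pixels, the indices occupy 16 * 3 - 3 = 45 bits in total.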

    Modes 4 & 5 have a second set of interpolation bits so that color and alpha are indexed separately. For mode 4, the two sets have different depths (2 and 3 bits) and can be swapped based on the index mode bit.
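
A small Python sketch of the mode 4 swap follows. It assumes, from my reading of the format documentation, that an index mode bit of 0 gives color the 2 bit indices and alpha the 3 bit indices; mode4_index_sets is a hypothetical helper name.

```python
def mode4_index_sets(indices_2bit, indices_3bit, index_mode: int):
    """Return (color_indices, color_bits, alpha_indices, alpha_bits)."""
    if index_mode == 0:
        # Assumed default: color is coarse, alpha is fine.
        return indices_2bit, 2, indices_3bit, 3
    # Index mode bit set: the two index sets trade places.
    return indices_3bit, 3, indices_2bit, 2
```

The bit depth is returned alongside each set because it selects which weight table to interpolate with.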

    Parity bits (BC7 only)

    These seem odd, but each one supplies the low bit of the end point components: a single extra bit common to every component. For mode 1, each bit is shared between both end points of a pair. This hoop jumping seems to be there to ensure a perfect gray scale can be represented.
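
As a sketch of how the extra bit is applied, the Python below appends it as the new low bit and then expands the result to 8 bits by replicating the high bits. The expand_endpoint name and its parameters are my own, and the bit replication step is the usual expansion as I understand it rather than code lifted from either decoder.

```python
def expand_endpoint(value: int, value_bits: int, p_bit=None) -> int:
    """Expand a stored end point component to 8 bits."""
    if p_bit is not None:
        # The shared bit becomes the least significant bit.
        value = (value << 1) | p_bit
        value_bits += 1
    # Left align to 8 bits, then copy the top bits into the gap.
    value <<= 8 - value_bits
    return value | (value >> value_bits)
```

Because every component gets the same precision and the same extra bit, equal stored values always expand to equal R, G, and B values, which is what makes an exact gray possible.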

    Rotation bits (BC7 only)

    These allow swapping of a color component with the alpha component for improved quality.
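
A minimal Python sketch of undoing the rotation after interpolation, assuming the usual encoding of the 2 bit field (0 is no swap, 1 swaps alpha with red, 2 with green, 3 with blue); apply_rotation is a hypothetical name.

```python
def apply_rotation(r: int, g: int, b: int, a: int, rotation: int):
    """Swap alpha with one color channel according to the rotation field."""
    if rotation == 1:
        return a, g, b, r  # alpha <-> red
    if rotation == 2:
        return r, a, b, g  # alpha <-> green
    if rotation == 3:
        return r, g, a, b  # alpha <-> blue
    return r, g, b, a      # rotation 0: nothing to do
```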

    BC6H Notes

  • BC6H does integer math and bit twiddling to get the final value. The resulting integer is the bit pattern of a half float, stored as a short; BitConverter.Int16BitsToHalf() is the C# conversion required. This makes some of the operations performed on the data less than intuitive.
  • The end points are stored as a base value plus offsets for each component. So, 10.555 means the base end point has 10 bits per component and the other end points are stored as 5 bit offsets from it, one per color component. This is all done on the integer representation of a half float value.
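
The two notes above can be sketched in Python as follows. The helper names are hypothetical; apply_delta assumes the unsigned format, where the sum wraps to the base precision, and half_bits_to_float stands in for the C# BitConverter call by reinterpreting 16 bits as a half float.

```python
import struct


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def apply_delta(base: int, delta: int, base_bits: int, delta_bits: int) -> int:
    """Add a signed offset to the base end point, wrapping to base_bits."""
    return (base + sign_extend(delta, delta_bits)) & ((1 << base_bits) - 1)


def half_bits_to_float(bits: int) -> float:
    """Reinterpret a 16 bit integer as an IEEE half float."""
    return struct.unpack("<e", struct.pack("<H", bits & 0xFFFF))[0]
```

For a 10.555 mode, apply_delta(512, 0b11111, 10, 5) gives 511, because the 5 bit offset 0b11111 is -1. This only covers the offset step; the unquantize and final scaling steps of the format are not shown.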

    Encoding speed

    Encoding BC6H and BC7 is slow. There are many combinations to try and this takes time. The reference DirectXTex takes about an hour to compress a 4k texture, and this is simply not practical. The Intel ISPC Texture Compressor is much quicker and recommended for real world usage.

    One method to increase encoding speed is to presume the mode and shape; however, this means the additional quality is lost and the format loses a lot of its value.

    Mode Usage

    Some modes are more popular than others. From my testing of a limited set of opaque images, BC7 mode 3 is used about half the time and mode 1 about a quarter of the time. For images with alpha, mode 6 is used about two thirds of the time and mode 5 about a quarter of the time. For BC6H, there seems to be a more even distribution of modes, but 1 and 3 make a strong showing, with modes 2, 6, and 13 coming in second. With such a limited set of images, and an algorithm that is highly image dependent, a proper analysis is impossible. As the encoding is so slow, I am not eager to pursue this any further. However, it does show the encoding is working and that my decoding code is hit for every mode.