]> Dirac: Wavelet Coefficient Coding
Dirac Home
Navigation item arrowHome
Navigation item arrowDocumentation
Navigation item arrowDirac Algorithm
Navigation item arrowContents
Navigation item arrowIntroduction
Navigation item arrowArchitecture
Navigation item arrowRDO
Navigation item arrowTransform coding
Navigation item arrowMotion estimation
Navigation item arrowMacroblocks
Navigation item arrowMotion vector coding

SourceForge.net Logo
Valid XHTML 1.1!
Wavelet Coefficient Coding

Previous: RDO selection of subband quantisers
Next: Motion estimation contents

Dirac codes subbands from low-frequency to high frequency in a zig-zag order. Within each subband, the coefficients are further partitioned spatially divided into code blocks. Coefficients are scanned by coding the code blocks in normal raster order and coding the coefficients within each code block in raster order within that block. However, each code block can be skipped so a skip flag is included in the stream of symbols to be coded. The skip flag is interpreted in the decoder as setting all coefficients within the block to be zero.

The number of code blocks in a subband depends on the frame type and on the subband. Intra frames have a single code block for 7 the lowest-frequency 7 subbands (ie the bottom 2 levels of the wavelet transform) and a 4x3 array of code blocks for the remaining bands. Predicted frames have more code blocks, as coefficients are more likely to be zero: 1 block only for the 4 lowest-level subbands; 8x6 for the next 3 subbands; 12x8 for the remaining subbands. These values are somewhat ad hoc, and probably will ultimately be configurable.

There are two possible quantisation modes made possible by the code block structure. We may either have a single quantiser for the whole subband or have different quantisers for different blocks. Currently the software only supports the first option, but ultimately both options will be implemented, which will allow Region Of Interest coding.

Entropy coding

The entropy coding used by Dirac in wavelet subband coefficient coding is based on three stages: binarisation, context modelling and adaptive arithmetic coding.

Downloading a GIF rendering as your browser doesn't support SVG. Please ignore the "install additional plugins" message if you see it. More details Sorry, your browser can't connect to the server to download a GIF substitute.
        Either install an SVG-enabled browser or connect to the internet to download the diagram.

Figure: Entropy coding block diagram

The purpose of the first stage is to provide a bitstream with easily analysable statistics that can be encoded using arithmetic coding, which can adapt to those statistics, reflecting any local statistical features.

Binarization

Binarization is the process of transforming the multi-valued coefficient symbols into bits. The resulting bitstream can then be arithmetic coded. The original symbol stream could have been coded directly, using a multi-symbol arithmetic coder, but this tends to suffer from 'context dilution': most symbols occur very rarely and so only sparse statistics can be gathered, which reduces coding efficiency.

The most obvious way to binarize a symbol is directly: a symbol is encoded by encoding the constituent bits of the binary representation of its magnitude, followed by a sign bit. This is termed bit-plane coding. The problem with this is that the number of bit planes depends on the size of the largest value, which means that either a lot of unnecessary zero symbols must be sent for smaller values or the size (number of bits) of each symbol must be encoded somehow.

Dirac uses an interleaved exp-Golomb binarisation. This can be thought of as combining a length code with the binary representation. Here's how it works for coding magnitude (non-negative) values.

Firstly, add 1 to your value. Then the binary representation has a leading 1:

N+1 = 1 bK-1...b0

The leading 1 is redundant and doesn't have to be sent. We can prefix each data bit with a "follow" bit which tells us there's a new bit to come: 0 means there's an additional bit and 1 means terminate. So the code for N will be:

0 bK-1 0 ... 0 b0 1

In coventional exp-Golomb coding, the length code of K zeroes followed by a 1 would be a prefix to the data bits. By interleaving them we have ensured that the value can be decoded in a single loop, two bits at a time, rather than in two loops.

Context modelling

The context modelling in Dirac is based on the principle that whether a coefficient is small (or zero, in particular) or not is well-predicted by its neighbours and its parents. Therefore the codec conditions the probabilities used by the arithmetic coder for coding the first follow bit on whether the neighbouring coefficients or the parent coefficient are zero.

The reason for this approach is that, whereas the wavelet transform largely removes correlation between a coefficient and its neighbours, they may not be statistically independent even if they are uncorrelated. The main reason for this is that small and especially zero coefficients in wavelet subbands tend to clump together, located at points corresponding to smooth areas in the image, and as discussed elsewhere, are grouped together across subbands in the parent-child relationship.

After binarization, a context is selected, and the probabilities for 0 and 1 that are maintained in the appropriate context will be fed to the arithmetic coding function along with the value itself to be coded.

Arithmetic coding

Conceptually, an arithmetic coder can be thought of a progressive way of producing variable-length codes for entire sequences of symbols based on the probabilities of their constituent symbols. For example, if we know the probability of 0 and 1 in a binary sequence, we also know the probability of the sequence itself occurring. So if

P(0)=0.2, P(1)=0.8

then

P(11101111111011110101)=(0.2)3(0.8)17=1.8x10-4 (assuming independent occurrences).

Information theory then says that optimal entropy coding of this sequence requires log 2 ( 1 p )=12.4 bits . AC produces a code-word very close to this optimal length, and implementations can do so progressively, outputting bits when possible as more arrive.

All arithmetic coding (AC) requires are estimates of the probabilities of symbols as they occur, and this is where context modelling fits in. Since AC can, in effect, assign a fractional number of bits to a symbol, it is very efficient for coding symbols with probabilities very close to 1, without the additional complication of run-length coding. The aim of context modelling within Dirac is to use information about the symbol stream to be encoded to produce accurate probabilities as close to 1 as possible.

Dirac computes these estimates for each context by maintaining a 16 bit probability word representing the probability of a zero value in that context. When a value has been coded, the probability is modified by incrementing (if zero was coded) or decrementing this value by a delta.

The delta value itself depends only on the probability and is derived from a look-up table. If the probability is near 1 or 0 the delta is approximately 1/256; if the probability is near 1/2, the delta is approximately 1/32. This avoids the need to maintain explicit counts of 1 and 0 and makes for very efficient updating. There is no rescaling or division in the computation, and the estimate adapts rapidly for balanced probabilities and slowly for highly skewed probabilities, as it should.

Previous: RDO selection of subband quantisers
Next: Motion estimation contents