Dirac codes subbands from low-frequency to high frequency in a zig-zag
order. Within each subband, the coefficients are further partitioned
spatially divided into code blocks. Coefficients are scanned by coding the
code blocks in normal raster order and coding the coefficients within each
code block in raster order within that block. However, each code block
can be skipped so a skip flag is included in the stream of symbols to be
coded. The skip flag is interpreted in the decoder as setting all
coefficients within the block to be zero.
The number of code blocks in a subband depends on the frame type and
on the subband. Intra frames have a single code block for 7 the
lowest-frequency 7 subbands (ie the bottom 2 levels of the wavelet
transform) and a 4x3 array of code blocks for the remaining bands.
Predicted frames have more code blocks, as coefficients are more likely
to be zero: 1 block only for the 4 lowest-level subbands; 8x6 for the
next 3 subbands; 12x8 for the remaining subbands. These values are
somewhat ad hoc, and probably will ultimately be configurable.
There are two possible quantisation modes made possible by the code block
structure. We may either have a single quantiser for the whole subband or
have different quantisers for different blocks. Currently the software
only supports the first option, but ultimately both options will be
implemented, which will allow Region Of Interest coding.
Entropy coding
The entropy coding used by Dirac in wavelet subband
coefficient coding is based on three
stages: binarisation, context modelling and adaptive arithmetic
coding.
Figure: Entropy coding block diagram
The purpose of the first stage is to provide a bitstream
with easily analysable statistics that can be encoded using
arithmetic coding, which can adapt to those statistics,
reflecting any local statistical features.
Binarization
Binarization is the process of transforming the multi-valued
coefficient symbols into bits. The resulting bitstream can then
be arithmetic coded. The original symbol stream could have been
coded directly, using a multi-symbol arithmetic coder, but this
tends to suffer from 'context dilution': most symbols occur very
rarely and so only sparse statistics can be gathered, which reduces
coding efficiency.
The most obvious way to binarize a symbol is directly: a
symbol is encoded by encoding the constituent bits of the
binary representation of its magnitude, followed by a sign bit.
This is termed bit-plane coding. The problem with this is that
the number of bit planes depends on the size of the largest value,
which means that either a lot of unnecessary zero symbols must be sent
for smaller values or the size (number of bits) of each symbol must be encoded
somehow.
Dirac uses an interleaved exp-Golomb binarisation. This can be thought of as combining
a length code with the binary representation. Here's how it works for coding magnitude
(non-negative) values.
Firstly, add 1 to your value. Then the binary representation has a
leading 1:
N+1 = 1 bK-1...b0
The leading 1 is redundant and doesn't have to be sent. We can prefix each
data bit with a "follow" bit which tells us there's a new bit to come: 0 means
there's an additional bit and 1 means terminate. So the code for N will be:
0 bK-1 0 ... 0 b0 1
In coventional exp-Golomb coding, the length code of K zeroes followed by a
1 would be a prefix to the data bits. By interleaving them we have ensured
that the value can be decoded in a single loop, two bits at a time, rather
than in two loops.
Context modelling
The context modelling in Dirac is based on the principle
that whether
a coefficient is small (or zero, in particular) or not is
well-predicted by its neighbours and its parents. Therefore the
codec conditions the probabilities used by the arithmetic coder
for coding the first follow bit on whether the neighbouring
coefficients or the parent coefficient are zero.
The reason for this approach is that, whereas the wavelet
transform largely
removes correlation between a coefficient and its neighbours,
they may not be statistically independent even if they are
uncorrelated. The main reason for this is that small and
especially zero coefficients in wavelet subbands tend to clump
together, located at points corresponding to smooth
areas in the image, and as discussed
elsewhere, are grouped together across
subbands in the parent-child relationship.
After binarization, a context is selected, and the
probabilities for 0 and 1 that are maintained in the
appropriate context will be fed to the arithmetic coding
function along with the value itself to be coded.
Arithmetic coding
Conceptually, an
arithmetic coder can be thought of a progressive way of
producing variable-length codes for entire sequences of symbols
based on the probabilities of their constituent symbols. For
example, if we know the probability of 0 and 1 in a binary
sequence, we also know the probability of the sequence itself
occurring. So if
Information theory then says that optimal entropy coding of
this sequence requires
. AC
produces a code-word very close to this optimal length, and
implementations can do so progressively, outputting bits when
possible as more arrive.
All arithmetic coding (AC) requires are estimates of the
probabilities of symbols as they occur, and this is where
context modelling fits in. Since AC can, in effect, assign a
fractional number of bits to a symbol, it is very efficient for
coding symbols with probabilities very close to 1, without the
additional complication of run-length coding. The aim of
context modelling within Dirac is to use information about the
symbol stream to be encoded to produce accurate probabilities
as close to 1 as possible.
Dirac computes these estimates for each context by maintaining
a 16 bit probability word representing the probability of a zero
value in that context. When a value has been coded, the probability
is modified by incrementing (if zero was coded) or decrementing
this value by a delta.
The delta value itself depends only on the
probability and is derived from a look-up table. If the probability is near 1 or 0 the delta is approximately
1/256; if the probability is near 1/2, the delta is approximately 1/32.
This avoids the need to maintain explicit counts of 1 and 0 and makes
for very efficient updating. There is no rescaling or division in the computation,
and the estimate adapts rapidly for balanced probabilities and slowly for
highly skewed probabilities, as it should.