Dirac: Wavelet transform

Wavelet transform

The discrete wavelet transform is well known and is described in numerous references. In Dirac it plays the same role as the DCT in MPEG-2: it decorrelates data in a roughly frequency-sensitive way, with the advantage of preserving fine details better. In one dimension it consists of the iterated application of a complementary pair of half-band filters followed by subsampling by a factor of 2:

Figure: Perfect reconstruction analysis and synthesis filter pairs

These filters are termed the analysis filters. Corresponding synthesis filters can undo the aliasing introduced by the critical sampling and perfectly reconstruct the input. Clearly not just any pair of half-band filters can do this, and there is an extensive mathematical theory of wavelet filter banks. The filters split the signal into a low-frequency (LF) and a high-frequency (HF) part; the wavelet transform then iteratively decomposes the LF component to produce an octave-band decomposition of the signal.

Applied to two-dimensional images, wavelet filters are normally applied in both vertical and horizontal directions to each image component to produce four so-called subbands termed Low-Low (LL), Low-High (LH), High-Low (HL) and High-High (HH). In the case of two dimensions, only the LL band is iteratively decomposed to obtain the decomposition of the two-dimensional spectrum shown below:

Figure: wavelet transform frequency decomposition

The number of samples in each resulting subband is as implied by the diagram: the critical sampling ensures that after each decomposition each of the four resulting bands has one quarter of the samples of the band that was decomposed.
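The sample counts described above can be checked with a short sketch. The function name `subband_sizes` and the tuple layout are hypothetical, chosen only for illustration, and the sketch assumes that both dimensions are already divisible by 2^N.

```python
def subband_sizes(width, height, levels):
    """List (level, band, width, height) for every subband of a 2-D transform.

    Illustrative helper, not part of the Dirac software. Assumes width and
    height are both divisible by 2**levels.
    """
    block = 1 << levels
    if width % block or height % block:
        raise ValueError("dimensions must be divisible by 2**levels")
    bands = []
    w, h = width, height
    for level in range(1, levels + 1):
        # Each decomposition halves both dimensions, so every band at this
        # level holds a quarter of the samples of the band that was split.
        w, h = w // 2, h // 2
        for name in ("LH", "HL", "HH"):
            bands.append((level, name, w, h))
    # Only the LL band is decomposed further; the last one is kept as-is.
    bands.append((levels, "LL", w, h))
    return bands


if __name__ == "__main__":
    for band in subband_sizes(512, 512, 3):
        print(band)
```

Because the transform is critically sampled, the sample counts of all subbands add up to the sample count of the input component.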


Figure: 3-level wavelet transform of LENA

Wavelet filters

The choice of wavelet filters affects compression performance: filters need a compact impulse response to reduce ringing artefacts, together with other properties that allow smooth areas to be represented compactly. It also affects encoding and decoding speed in software.

There are numerous filters supported by Dirac to allow a trade-off between complexity and performance. These are configurable in the reference software. These filters are all defined using the 'lifting scheme' for speed.

The lifting stages are defined as follows. One filter available in Dirac is a lifting approximation of the Daubechies (9,7) wavelet, for which we have (s denoting sum and d denoting difference):

\begin{array}{l} s_{n}^{0} = x_{2 n} \\ d_{n}^{0} = x_{2 n + 1} \\ d_{n}^{1} = d_{n}^{0} - (6497 \cdot (s_{n}^{0} + s_{n + 1}^{0})) / 4096 \\ s_{n}^{1} = s_{n}^{0} - (217 \cdot (d_{n}^{1} + d_{n - 1}^{1})) / 4096 \\ d_{n}^{2} = d_{n}^{1} + (3616 \cdot (s_{n}^{1} + s_{n + 1}^{1})) / 4096 \\ s_{n}^{2} = s_{n}^{1} + (1817 \cdot (d_{n}^{2} + d_{n - 1}^{2})) / 4096 \end{array}
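The four lifting stages above can be sketched in Python to show why integer lifting is exactly invertible: the inverse simply replays the same integer steps in reverse order with opposite signs. This is a minimal sketch, not the reference implementation. The function names are hypothetical, and two details the equations leave open are assumptions here: division by 4096 is taken as floor division, and neighbours beyond the ends of the signal are taken by repeating the edge value.

```python
# (target, sign, coefficient) for each lifting stage, in forward order.
STAGES = (("d", -1, 6497), ("s", -1, 217), ("d", +1, 3616), ("s", +1, 1817))


def _lift(s, d, target, sign, coeff):
    """Apply one lifting stage in place.

    Assumptions: floor division by 4096, edge samples repeated at boundaries.
    """
    n = len(s)
    if target == "d":
        for i in range(n):
            d[i] += sign * ((coeff * (s[i] + s[min(i + 1, n - 1)])) // 4096)
    else:
        for i in range(n):
            s[i] += sign * ((coeff * (d[i] + d[max(i - 1, 0)])) // 4096)


def forward_daub97(x):
    """Split x into even (s) and odd (d) samples and run the four stages."""
    if len(x) % 2:
        raise ValueError("input length must be even")
    s, d = list(x[0::2]), list(x[1::2])
    for target, sign, coeff in STAGES:
        _lift(s, d, target, sign, coeff)
    return s, d


def inverse_daub97(s, d):
    """Undo the stages in reverse order, then interleave s and d."""
    s, d = list(s), list(d)
    for target, sign, coeff in reversed(STAGES):
        _lift(s, d, target, -sign, coeff)
    x = [0] * (2 * len(s))
    x[0::2], x[1::2] = s, d
    return x


if __name__ == "__main__":
    signal = [3, -7, 12, 40, 41, 39, 0, -5]
    low, high = forward_daub97(signal)
    assert inverse_daub97(low, high) == signal
```

Each stage modifies one channel using only values from the other, so the inverse can recompute exactly the same integer term and cancel it, whatever rounding rule is used.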

The magic numbers are integer approximations of the Daubechies lifting coefficients. This makes the transform fully invertible, which a floating-point implementation would not quite be. The implementation ignores scaling coefficients, since these can be taken into account in quantiser selection by weighting quantiser noise appropriately. The problem with this filter is that it has four lifting stages, and so it takes longer in software. At the other extreme, there is the (5,3) filter:

\begin{array}{l} d_{n}^{1} = d_{n}^{0} - (s_{n}^{0} + s_{n + 1}^{0}) / 2 \\ s_{n}^{1} = s_{n}^{0} + (d_{n}^{1} + d_{n - 1}^{1}) / 4 \end{array}
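The two (5,3) stages are short enough to write out directly. The following is a minimal sketch with hypothetical function names; it assumes floor division for the `/ 2` and `/ 4` terms and repeats the edge sample where a neighbour falls outside the signal, neither of which is specified by the equations.

```python
def _clamp(i, n):
    """Clamp index i into [0, n - 1] (assumed edge handling)."""
    return min(max(i, 0), n - 1)


def forward_53(x):
    """One level of the (5,3) lifting transform on an even-length list."""
    if len(x) % 2:
        raise ValueError("input length must be even")
    s, d = list(x[0::2]), list(x[1::2])
    n = len(s)
    # Predict: d1[n] = d0[n] - (s0[n] + s0[n+1]) / 2
    for i in range(n):
        d[i] -= (s[i] + s[_clamp(i + 1, n)]) // 2
    # Update: s1[n] = s0[n] + (d1[n] + d1[n-1]) / 4
    for i in range(n):
        s[i] += (d[i] + d[_clamp(i - 1, n)]) // 4
    return s, d


def inverse_53(s, d):
    """Invert forward_53 by undoing the update, then the predict stage."""
    s, d = list(s), list(d)
    n = len(s)
    for i in range(n):
        s[i] -= (d[i] + d[_clamp(i - 1, n)]) // 4
    for i in range(n):
        d[i] += (s[i] + s[_clamp(i + 1, n)]) // 2
    x = [0] * (2 * n)
    x[0::2], x[1::2] = s, d
    return x


if __name__ == "__main__":
    flat = [7] * 8
    print(forward_53(flat))  # → ([7, 7, 7, 7], [0, 0, 0, 0])
    ramp = [10, 12, 14, 16, 18, 20, 22, 24]
    assert inverse_53(*forward_53(ramp)) == ramp
```

A constant signal produces an all-zero difference channel, which is the decorrelating behaviour the transform relies on for smooth areas.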

We can improve the (5,3) high-pass filter, and get an approximation of the Daubechies low-pass filter, by having more taps in the first lifting stage. This is the Deslauriers-Dubuc (9,7) filter:

\begin{array}{l} d_{n}^{1} = d_{n}^{0} - (- s_{n - 1}^{0} + 9 s_{n}^{0} + 9 s_{n + 1}^{0} - s_{n + 2}^{0}) / 16 \\ s_{n}^{1} = s_{n}^{0} + (d_{n}^{1} + d_{n - 1}^{1}) / 4 \end{array}

The Deslauriers-Dubuc (13,7) extends this further to provide better frequency selectivity in the low pass filter, still only using two lifting stages:

\begin{array}{l} d_{n}^{1} = d_{n}^{0} - (- s_{n - 1}^{0} + 9 s_{n}^{0} + 9 s_{n + 1}^{0} - s_{n + 2}^{0}) / 16 \\ s_{n}^{1} = s_{n}^{0} + (- d_{n + 1}^{1} + 9 d_{n}^{1} + 9 d_{n - 1}^{1} - d_{n - 2}^{1}) / 32 \end{array}
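One way to see what the extra taps buy is to run the Deslauriers-Dubuc (13,7) stages on a cubic ramp: the four-tap predict stage interpolates cubics exactly, so the difference channel is zero away from the edges. This is a sketch under assumptions, with hypothetical function names, floor division for `/ 16` and `/ 32`, and edge samples repeated at the boundaries.

```python
def _at(v, i):
    """Read v[i], repeating the edge sample outside the list (assumption)."""
    return v[min(max(i, 0), len(v) - 1)]


def _predict(s, i):
    """Four-tap predict term shared by the Deslauriers-Dubuc filters."""
    return (-_at(s, i - 1) + 9 * _at(s, i) + 9 * _at(s, i + 1) - _at(s, i + 2)) // 16


def _update(d, i):
    """Four-tap update term of the (13,7) filter."""
    return (-_at(d, i + 1) + 9 * _at(d, i) + 9 * _at(d, i - 1) - _at(d, i - 2)) // 32


def forward_dd137(x):
    """One level of the (13,7) lifting transform on an even-length list."""
    if len(x) % 2:
        raise ValueError("input length must be even")
    s0, d0 = list(x[0::2]), list(x[1::2])
    d1 = [d0[i] - _predict(s0, i) for i in range(len(d0))]
    s1 = [s0[i] + _update(d1, i) for i in range(len(s0))]
    return s1, d1


def inverse_dd137(s1, d1):
    """Invert forward_dd137 by undoing the update, then the predict stage."""
    s0 = [s1[i] - _update(d1, i) for i in range(len(s1))]
    d0 = [d1[i] + _predict(s0, i) for i in range(len(d1))]
    x = [0] * (2 * len(s0))
    x[0::2], x[1::2] = s0, d0
    return x


if __name__ == "__main__":
    cubic = [k ** 3 for k in range(16)]
    low, high = forward_dd137(cubic)
    print(high[1:6])  # → [0, 0, 0, 0, 0]
    assert inverse_dd137(low, high) == cubic
```

The two-tap (5,3) predict stage is exact only for linear ramps, so the same cubic input would leave a nonzero difference channel there.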

Padding and invertibility

Clearly, applying an N-level wavelet transform requires N levels of subsampling, and so for reversibility it is necessary that 2^N divides all the dimensions of each component. If this condition is not met, the input picture components are padded as they are read in, using edge values for best compression performance.
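The padding rule can be sketched as follows. The helper names are hypothetical, and placing the padding on the right and bottom edges is an assumption made for illustration; the text above only states that edge values are used.

```python
def padded_size(size, levels):
    """Smallest multiple of 2**levels that is >= size."""
    block = 1 << levels
    return -(-size // block) * block  # ceiling division, then scale back up


def pad_component(rows, levels):
    """Pad a component (non-empty list of equal-length rows) with edge values.

    Assumption: padding is added on the right and bottom only.
    """
    height, width = len(rows), len(rows[0])
    new_w = padded_size(width, levels)
    new_h = padded_size(height, levels)
    # Extend each row by repeating its last sample.
    out = [list(row) + [row[-1]] * (new_w - width) for row in rows]
    # Extend downwards by repeating the (already widened) last row.
    out += [list(out[-1]) for _ in range(new_h - height)]
    return out


if __name__ == "__main__":
    print(padded_size(1080, 4))  # → 1088
    print(pad_component([[1, 2, 3], [4, 5, 6], [7, 8, 9]], 1))
    # → [[1, 2, 3, 3], [4, 5, 6, 6], [7, 8, 9, 9], [7, 8, 9, 9]]
```

For example, a 1080-line component with a 4-level transform would be padded to 1088 lines, since 1080 is not a multiple of 16.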
