Rate-Distortion Optimisation


The key to making good decisions in compression is being able to trade off the number of bits used to encode some part of the signal against the error produced by using that number of bits. There is no point striving hard to compress one feature of the signal if the degradation this produces is far more significant than that of compressing some other feature with fewer bits. In other words, one wishes to distribute the bit rate so as to get the least possible distortion overall. So how can this be done?

Rate-distortion optimisation can be described in terms of Lagrangian multipliers. It can also be described by the Principle of Equal Slopes, which states that the coding parameters should be selected so that the rate of change of distortion with respect to bit rate is the same for all parts of the system.

To see why this is so, consider two independent components of a signal. They might be different blocks in a video frame, or different subbands in a wavelet transform. Compress them at various rates using your favourite coding technique, and you tend to get curves like those in the figure below. They show that at low rates, there is high distortion (or error) and at high rates there is low distortion, and there is generally a smooth(ish) curve between these points with a nice convex shape.


Figure: Rate-distortion curves for two signal components

Now suppose that we assign B1 bits to component X and B2 bits to component Y, and look at the slope of the rate-distortion curves at these points. At B1 the slope of X's distortion with respect to bit rate is much steeper than the slope at B2, which measures the rate of change of Y's distortion with respect to bit rate. This is not the most efficient allocation of bits. To see why, increase B1 by a small amount to B1+Δ and decrease B2 to B2-Δ. The total distortion is then reduced even though the total bit rate hasn't changed, because of the disproportionately greater drop in the distortion of X.

The conclusion is therefore that for a fixed total bit rate, the error or distortion is minimised by selecting bit rates for X and Y at which the rate-distortion curves have the same slope. Likewise, the problem can be reversed and for a fixed level of distortion, the total bitrate can be minimised by finding points with the same slope.
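
To make the argument concrete, here is a small sketch (illustrative only, not part of Dirac) using the common textbook model D(R) = v * 2^(-2R) for each component's rate-distortion curve, where v is the component's variance, with the two components given different variances. Shifting a little rate from the shallow-slope component to the steep-slope one leaves the total rate unchanged but reduces the total distortion:

    #include <cmath>
    #include <cstdio>

    // Model rate-distortion curve D(R) = v * 2^(-2R) for one component.
    // This exponential form is a common textbook model; real curves are measured.
    static double dist(double variance, double rate) {
        return variance * std::pow(2.0, -2.0 * rate);
    }

    // Slope dD/dR of the model curve at a given rate (always negative).
    static double slope(double variance, double rate) {
        return -2.0 * std::log(2.0) * dist(variance, rate);
    }

    int main() {
        const double var_x = 100.0, var_y = 10.0;  // component variances
        double b1 = 1.0, b2 = 3.0;                 // initial bit allocation

        std::printf("slopes: X %.3f, Y %.3f\n", slope(var_x, b1), slope(var_y, b2));
        std::printf("total distortion: %.3f\n", dist(var_x, b1) + dist(var_y, b2));

        // Move a small amount of rate from the shallow-slope component (Y)
        // to the steep-slope component (X): same total rate, lower distortion.
        const double delta = 0.25;
        b1 += delta;
        b2 -= delta;
        std::printf("after shifting %.2f bits: %.3f\n", delta,
                    dist(var_x, b1) + dist(var_y, b2));
        return 0;
    }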

Two questions arise in practice: firstly, how does one find points on these curves with the same slope; and secondly, how does one hit a fixed overall bit budget? The first question can be answered by the figure below: the intercept on the D-axis of the tangent to the rate-distortion curve at the point (R0,D0) is the value D0+λR0, where -λ is the slope at that point. Furthermore, it is the smallest value of D+λR over all values of (R,D) that lie on the curve. So in selecting, for example, a quantiser in a given block or subband, one minimises the value D(Q)+λR(Q) over all quantisers Q, where D(Q) is the error produced by quantising with Q and R(Q) is the rate this implies.


Figure: Minimisation of the Lagrangian cost function
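
As an illustration of this selection step, here is a minimal sketch (the structure and names are illustrative, not Dirac's actual code), assuming the encoder can estimate D(Q) and R(Q) for each candidate quantiser:

    #include <limits>
    #include <vector>

    // Hypothetical per-quantiser estimates; in a real encoder these would come
    // from entropy or variance estimates rather than a full trial encode.
    struct QuantiserEstimate {
        int    q;           // quantiser index
        double distortion;  // estimated D(Q), e.g. (weighted) MSE
        double rate;        // estimated R(Q), in bits
    };

    // Pick the quantiser minimising the Lagrangian cost D(Q) + lambda * R(Q).
    int SelectQuantiser(const std::vector<QuantiserEstimate>& candidates,
                        double lambda) {
        double best_cost = std::numeric_limits<double>::infinity();
        int best_q = -1;
        for (const auto& c : candidates) {
            const double cost = c.distortion + lambda * c.rate;
            if (cost < best_cost) {
                best_cost = cost;
                best_q = c.q;
            }
        }
        return best_q;
    }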

In order to hit an overall bit budget, one needs to iterate over values of the Lagrangian parameter λ to find the one that gives the right rate. In practice, this iteration can be done in slow time given any decent encoding buffer size, and by modelling the overall rate-distortion curve based on the recent history of the encoder. Rate-distortion optimisation (RDO) is used throughout Dirac, and it has a very beneficial effect on performance. The example Dirac encoder is controlled by a single parameter ("-qf") that effectively sets Lagrangian parameters for each part of the encoding process.
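
As a simple illustration of the iteration (a sketch only; the example encoder's actual control loop models the rate-distortion curve from its recent history, as noted above), one can bisect on λ, exploiting the fact that the achieved rate falls as λ rises:

    #include <functional>

    // Find a lambda that brings the encoded (or estimated) rate close to a
    // target budget. encode_rate(lambda) is assumed to return the bit rate
    // achieved with that lambda; the rate falls monotonically as lambda rises.
    double FindLambdaForBudget(const std::function<double(double)>& encode_rate,
                               double target_rate,
                               double lambda_lo = 1e-3, double lambda_hi = 1e3,
                               int iterations = 20) {
        for (int i = 0; i < iterations; ++i) {
            const double lambda = 0.5 * (lambda_lo + lambda_hi);
            if (encode_rate(lambda) > target_rate)
                lambda_lo = lambda;   // over budget: penalise rate more heavily
            else
                lambda_hi = lambda;   // under budget: a smaller lambda will do
        }
        return 0.5 * (lambda_lo + lambda_hi);
    }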

This description makes RDO sound like a science: in fact it isn't and the reader will be pleased to learn that there is plenty of scope for engineering ad-hoc-ery of all kinds. This is because there are some practical problems in applying the procedure:

1) There may be no common measure of distortion. For example, quantising a high-frequency subband is generally less visually objectionable than quantising a low-frequency subband, so there is no direct comparison between the significance of the distortion produced in one subband and that produced in another. This can be overcome by perceptual weighting, in which the noise in HF bands is downgraded according to an estimate of the Contrast Sensitivity Function (CSF) of the human eye, and this is what we have done (a sketch of such weighting follows this list). The problem even occurs in block-based coders, however, since quantisation noise can be successfully masked in some areas but not in others. Perceptual fudge factors are therefore necessary in RDO in all types of coders.

2) Rate and distortion may not be directly measurable. In practice, measuring rate and distortion for, say, every possible quantiser in a coding block or subband cannot mean actually encoding with every such quantiser, counting the bits and measuring the MSE. What one can do is estimate the values using entropy calculations, or by assuming a statistical model and calculating, say, the variance. In this case the R and D values may well be only roughly proportional to the true values, and some sort of compensating factor is needed if a common multiplier is to be used across the encoder.

3) Components of the bitstream will be interdependent. The model describes a situation where the different signals X and Y are fully independent. This is often not true in a hybrid video codec. For example, the rate at which reference frames are encoded affects how noisy the prediction from them will be, and so the quantisation in predicted frames depends on that in the reference frames. Even if elements of the bitstream are logically independent, perceptually they might not be. For example, with intra coding each frame could be subject to RDO independently, but at low bit rates and with rapidly changing content this might lead to objectionably large variations in quantisation noise between frames.
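
As an illustration of the perceptual weighting mentioned in point 1, here is a sketch in which each subband's measured error is scaled by a CSF-derived weight before it enters the Lagrangian cost. The structure and weights are placeholders, not Dirac's actual values:

    #include <vector>

    // One wavelet subband together with its measured coding error.
    struct Subband {
        double mse;         // mean squared error after quantisation
        double csf_weight;  // perceptual weight from a CSF model: well below 1
                            // for high frequencies (noise less visible there),
                            // close to 1 for low frequencies
    };

    // Perceptually weighted distortion, used in place of raw MSE in D + lambda*R,
    // so that error in HF subbands counts for less than the same error in LF ones.
    double WeightedDistortion(const std::vector<Subband>& subbands) {
        double total = 0.0;
        for (const auto& sb : subbands)
            total += sb.csf_weight * sb.mse;
        return total;
    }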

Incorporating motion estimation into RDO is also tricky, because motion parameters are not part of the content but have an indirect effect on how the content looks. They also have a coupled effect on the rest of the coding process, since the distortion measured by prediction error, say, affects both the bit rate needed to encode the residuals and the distortion remaining after coding. This is discussed in more detail later.
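
One common way of folding the motion search into the same framework (sketched here in outline with illustrative names; Dirac's actual motion estimation is covered in the later sections) is to score each candidate vector by its prediction error plus λ times the bits needed to code the vector:

    #include <cstdlib>
    #include <vector>

    // Sum of absolute differences between a block and its motion-compensated
    // prediction: a cheap stand-in for the distortion term.
    int Sad(const std::vector<int>& block, const std::vector<int>& prediction) {
        int sad = 0;
        for (std::size_t i = 0; i < block.size(); ++i)
            sad += std::abs(block[i] - prediction[i]);
        return sad;
    }

    // Lagrangian motion cost: prediction error plus the estimated cost in bits
    // of coding the candidate motion vector, weighted by the motion lambda.
    double MotionCost(const std::vector<int>& block, const std::vector<int>& prediction,
                      double mv_bits, double lambda_motion) {
        return Sad(block, prediction) + lambda_motion * mv_bits;
    }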

Previous: Overall Architecture
Next: Transform Coding