The performance of motion estimation and motion-vector
coding is critical to the performance of a video
coding scheme. With motion vectors at 1/4- or 1/8-pixel
accuracy, a simple-minded strategy of finding the best match
between frames can greatly inflate the resulting bitrate for
little or no gain in quality, because the additional accuracy
is very sensitive to noise. What is required is the ability to
trade off the motion-vector bitrate against prediction accuracy
(and hence against the bitrate required to code the residual
frame and the eventual quality of that frame), whilst at the
same time making the estimator more robust.
The simplest way to do this is to incorporate a smoothing
factor into the metric used for matching blocks. The metric
then consists of a basic block-matching metric plus a constant
times a measure of the local motion-vector smoothness. The
basic block-matching metric used by Dirac is the Sum of Absolute
Differences (SAD). Given two blocks X, Y of samples, it
is given by:

$$\mathrm{SAD}(X,Y)=\sum_{i,j}\left|X_{i,j}-Y_{i,j}\right|$$
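As a minimal sketch of the SAD formula, assuming blocks are represented as equal-sized lists of rows of integer samples (an illustrative choice, not Dirac's own data structure), the metric can be computed as:

```python
def sad(x, y):
    """Sum of Absolute Differences between two equal-sized sample blocks.

    Blocks are lists of rows of integer samples (an illustrative layout).
    """
    if len(x) != len(y) or any(len(rx) != len(ry) for rx, ry in zip(x, y)):
        raise ValueError("blocks must have the same dimensions")
    # Sum |X[i][j] - Y[i][j]| over every sample position (i, j).
    return sum(abs(a - b) for rx, ry in zip(x, y) for a, b in zip(rx, ry))


X = [[10, 12], [14, 16]]
Y = [[11, 12], [10, 20]]
print(sad(X, Y))  # 1 + 0 + 4 + 4 = 9
```

A perfect match gives a SAD of zero; larger values indicate a worse match.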
The smoothness measure used is based on the difference between
the candidate motion vector and the median of the
neighbouring, previously computed motion vectors.
Since the blocks are estimated in
raster-scan order, vectors for blocks to the left and above
are available for calculating the median:
Figure: neighbouring vectors available in raster-scan
order for local variance calculation
The vectors chosen for computing the local median
predictor are V2, V3
and V4; this has the merit of being
the same predictor as is used in
coding the motion vectors.
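The median predictor can be sketched as below. This assumes the median of the three vectors V2, V3 and V4 is taken component-wise (separately for the horizontal and vertical components), and that vectors are integer pairs in 1/8-pixel units; both are assumptions for illustration rather than details stated here.

```python
from typing import Tuple

Vector = Tuple[int, int]  # (x, y) in 1/8-pixel units (assumed)


def median3(a: int, b: int, c: int) -> int:
    """Median of three integers."""
    return sorted((a, b, c))[1]


def median_predictor(v2: Vector, v3: Vector, v4: Vector) -> Vector:
    """Component-wise median of the three neighbouring vectors (assumed rule)."""
    return (median3(v2[0], v3[0], v4[0]), median3(v2[1], v3[1], v4[1]))


print(median_predictor((8, -4), (12, 0), (10, 40)))  # (10, 0)
```

Note that a component-wise median need not equal any one of the three neighbours, as in this example; it does, however, reject a single outlying component such as the vertical value 40.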
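The total metric is a combination of these two metrics. Given a
vector V which maps the current frame block X to a block
Y = V(X) in the reference frame, the metric is given by:

$$M(V)=\mathrm{SAD}(X,Y)+\lambda\,\min\left(\left\|V-\mathrm{med}(V_2,V_3,V_4)\right\|,\;48\right)$$

where med(V2, V3, V4) is the median predictor described above and
the vector difference is measured in 1/8ths of a pixel.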
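The combined metric can be sketched as follows. The magnitude of the vector difference is taken here as the L1 norm |dx| + |dy|, and the median is taken component-wise; both are assumptions for illustration, since the text does not specify them. The cap of 48 (in 1/8-pixel units) comes from the text; the helper names are hypothetical.

```python
SMOOTHNESS_CAP = 48  # limit on the smoothness term, in 1/8-pixel units


def sad(x, y):
    """Sum of Absolute Differences between two equal-sized sample blocks."""
    return sum(abs(a - b) for rx, ry in zip(x, y) for a, b in zip(rx, ry))


def median3(a, b, c):
    return sorted((a, b, c))[1]


def smoothness(v, v2, v3, v4):
    """Capped L1 distance (assumed norm) from v to the component-wise median."""
    px = median3(v2[0], v3[0], v4[0])
    py = median3(v2[1], v3[1], v4[1])
    return min(abs(v[0] - px) + abs(v[1] - py), SMOOTHNESS_CAP)


def block_metric(x, y, v, neighbours, lam):
    """SAD(X, Y) + lambda * capped smoothness of candidate vector v."""
    return sad(x, y) + lam * smoothness(v, *neighbours)


X = [[10, 12], [14, 16]]
Y = [[11, 12], [10, 20]]  # block in the reference frame pointed to by v
neighbours = ((8, -4), (12, 0), (10, 40))  # V2, V3, V4
print(block_metric(X, Y, (12, 2), neighbours, lam=0.5))  # 9 + 0.5 * 4 = 11.0
```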
The value λ is a coding parameter used to control the
trade-off between the smoothness of the motion vector field and
the accuracy of the match. When λ is very large, the
local variance dominates the calculation and the motion vector
which gives the smallest metric is simply that which is closest
to its neighbours. When λ is very small, the metric is
dominated by the SAD term, and so the best vector will simply
be that which gives the best match for that block. For values
in between, varying degrees of smoothness can be achieved. The
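The effect of λ on the choice of vector can be illustrated with two hypothetical candidates whose SAD values are assumed to be already known: one that agrees with the neighbouring vectors but matches less well, and one that matches better but departs from them. The L1 distance to the predictor is an assumed smoothness measure, and the candidate values are invented for illustration.

```python
def best_vector(candidates, predictor, lam, cap=48):
    """Pick the candidate minimising SAD + lam * capped smoothness.

    candidates: list of ((vx, vy), sad) pairs with precomputed SADs.
    Smoothness is the L1 distance to the predictor (an assumption).
    """
    def cost(item):
        (vx, vy), block_sad = item
        d = min(abs(vx - predictor[0]) + abs(vy - predictor[1]), cap)
        return block_sad + lam * d

    return min(candidates, key=cost)[0]


# Hypothetical candidates: a smooth vector and a better-matching outlier.
cands = [((0, 0), 500), ((24, 8), 460)]
pred = (0, 0)
print(best_vector(cands, pred, lam=0))   # (24, 8): SAD alone decides
print(best_vector(cands, pred, lam=10))  # (0, 0): 460 + 10 * 32 = 780 > 500
```

With λ = 0 the best-matching vector wins; as λ grows, the vector closest to its neighbours is preferred.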
parameter λ is calculated as a multiple of the rate-distortion
optimisation (RDO) parameters for the L1 and
L2 frames, so that if the inter frames
are compressed more heavily, smoother motion vector fields
will also result.
The smoothness term is limited to 48 1/8ths of a pixel
(6 pixels). This prevents the
motion field from being smoothed excessively: where there is a genuine
motion transition, the motion vector will legitimately differ
from its neighbours and should not be excessively penalised.
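The benefit of the cap can be seen at a hypothetical motion boundary. In the sketch below the neighbours predict zero motion, while the block genuinely moves by 80 eighth-pixels (10 pixels) and matches far better there. The SAD values, λ and the L1 smoothness measure are assumptions chosen for illustration.

```python
import math


def smoothness_penalty(v, pred, cap=48):
    """L1 distance (assumed measure) from v to the predictor, limited to cap."""
    d = abs(v[0] - pred[0]) + abs(v[1] - pred[1])
    return min(d, cap)


def choose(candidates, pred, lam, cap):
    """Return the vector minimising SAD + lam * smoothness penalty."""
    return min(
        candidates,
        key=lambda c: c[1] + lam * smoothness_penalty(c[0], pred, cap),
    )[0]


# Hypothetical motion transition: neighbours say (0, 0), true motion is (80, 0).
cands = [((0, 0), 900), ((80, 0), 300)]
pred = (0, 0)
print(choose(cands, pred, lam=10, cap=48))        # (80, 0): 300 + 480 = 780
print(choose(cands, pred, lam=10, cap=math.inf))  # (0, 0): 300 + 800 = 1100
```

With the cap, the genuine vector is still chosen; without it, the penalty grows with the size of the transition and the smoother but wrong vector wins.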