Motion compensation in Dirac uses Overlapped Block-based
Motion Compensation (OBMC) to avoid block-edge artefacts which
would be expensive to code using wavelets.
Blocks of almost any size can be used, with any degree of
overlap: both are configured at the encoder and
transmitted to the decoder. One constraint is that the frame
must contain a whole number of macroblocks horizontally and vertically,
where a macroblock is a 4x4 set of blocks. This is achieved by padding
the data. Further padding may also be needed because after motion
compensation the wavelet transform
is applied, which has its own
requirements for divisibility.
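
As a rough illustration of the padding arithmetic, the sketch below rounds the frame dimensions up so that they hold a whole number of macroblocks and also satisfy a wavelet divisibility constraint. It rests on assumptions not stated in the text: that a macroblock spans 4 block separations (the hypothetical parameters `xbsep` and `ybsep`, i.e. the spacing between block origins), and that a wavelet transform of depth `wavelet_depth` needs dimensions divisible by 2 to that power. The function names are invented for this example and are not taken from the Dirac software.

```python
from math import gcd


def pad_to_multiple(length, multiple):
    """Round length up to the nearest multiple of `multiple`."""
    return -(-length // multiple) * multiple


def padded_dimensions(width, height, xbsep, ybsep, wavelet_depth):
    """Return (padded_width, padded_height).

    Assumptions (illustrative only):
      * a macroblock covers 4 block separations in each direction;
      * the wavelet transform needs dimensions divisible by 2**wavelet_depth.
    """
    wavelet_unit = 2 ** wavelet_depth
    mb_width = 4 * xbsep
    mb_height = 4 * ybsep
    # Use the least common multiple so both constraints hold at once.
    step_w = mb_width * wavelet_unit // gcd(mb_width, wavelet_unit)
    step_h = mb_height * wavelet_unit // gcd(mb_height, wavelet_unit)
    return pad_to_multiple(width, step_w), pad_to_multiple(height, step_h)


if __name__ == "__main__":
    # A 720x576 frame with 8-pixel block separation and a depth-4 transform.
    print(padded_dimensions(720, 576, 8, 8, 4))
```

Under these assumptions the width 720 is not a multiple of the 32-pixel macroblock width, so it is rounded up, while the height 576 already is and stays unchanged.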
Although Dirac is not specifically designed to be scalable,
the size of
blocks is the only non-scalable feature, and for lower
resolution frames, smaller blocks can easily be selected.
Dirac's OBMC scheme is based on a separable
linear ramp mask. This acts as a weight function on the
predicting block. Given a pixel p=p(x,y,t) in frame t, p may
fall within only one block or in up to four if it lies at the
corner of a block (see the figure below).
Figure: overlapping blocks. The darker-shaded areas
show overlapping areas.
Each block that the pixel p is part of has a predicting
block within the reference frame selected by motion
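estimation. The predictor $\tilde{p}$ for $p$ is the
weighted sum of all the corresponding pixels in the predicting
blocks in frame $t'$, given by $p(x-u_i, y-v_i, t')$
for motion vectors $(u_i, v_i)$. The
mask has the necessary property that the sum of
the weights $w_i$ will always be 1:
$$\tilde{p}(x,y,t) = \sum_i w_i \, p(x-u_i, y-v_i, t'), \qquad \sum_i w_i = 1$$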
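
To make the sum-to-one property concrete, here is a minimal sketch of a one-dimensional linear ramp mask and a check that the weights of overlapping blocks add up to exactly 1. The exact ramp values used by Dirac are not given in the text, so the particular ramp below (odd multiples of 1/(2 x overlap)) is an assumption chosen because it is symmetric and sums to 1. The parameter names `blen` (block length) and `bsep` (spacing between block origins) are hypothetical, and the sketch assumes `blen <= 2 * bsep` so that at most two blocks overlap along each axis.

```python
from fractions import Fraction


def ramp_mask_1d(blen, bsep):
    """Linear ramp weights for one block of length blen, origins bsep apart."""
    overlap = blen - bsep
    if overlap < 0 or blen > 2 * bsep:
        raise ValueError("sketch assumes bsep <= blen <= 2 * bsep")
    mask = []
    for j in range(blen):
        if j < overlap:                      # ramp up at the leading edge
            mask.append(Fraction(2 * j + 1, 2 * overlap))
        elif j >= blen - overlap:            # ramp down at the trailing edge
            mask.append(Fraction(2 * (blen - 1 - j) + 1, 2 * overlap))
        else:                                # flat top, covered by one block only
            mask.append(Fraction(1))
    return mask


def total_weight_1d(blen, bsep, num_blocks):
    """Sum the masks of num_blocks consecutive blocks along one axis."""
    mask = ramp_mask_1d(blen, bsep)
    total = [Fraction(0)] * (bsep * (num_blocks - 1) + blen)
    for k in range(num_blocks):
        for j, w in enumerate(mask):
            total[k * bsep + j] += w
    return total


def corner_weights(blen, bsep, i, j):
    """Separable 2D weights of the four blocks meeting at a corner,
    for a pixel at offset (i, j) inside the overlap region."""
    m = ramp_mask_1d(blen, bsep)
    wx = (m[bsep + i], m[i])   # ramp-down of left block, ramp-up of right block
    wy = (m[bsep + j], m[j])   # same pair in the vertical direction
    return [a * b for a in wx for b in wy]


if __name__ == "__main__":
    total = total_weight_1d(12, 8, 3)
    print(all(w == 1 for w in total[4:-4]))   # interior positions only
    print(sum(corner_weights(12, 8, 1, 3)))
```

Because the two-dimensional mask is separable, the four corner weights factor as a product of two one-dimensional sums, each equal to 1. The first and last few positions of `total` fall short of 1 because this sketch does not model how frame edges are treated.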
This may seem complicated but in implementation the only
additional complexity over standard block-based motion compensation
is to apply the weighting mask to a
predicting block before subtracting it from the frame. The fact that the weights sum to
1 automatically takes care of splicing the predictors together across the overlaps.
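
The implementation point can be sketched in one dimension: each predicting block is multiplied by the mask and accumulated into the prediction, and the overlaps splice together without any further normalisation. This is an illustrative sketch, not the Dirac code. The ramp values, the names `blen` and `bsep`, the integer-pixel vectors, the sign convention (the reference is read at `x - v`) and the clamping of reads at the reference edges are all assumptions made for the example.

```python
def ramp(blen, bsep):
    """Linear ramp mask (assumes bsep <= blen <= 2 * bsep)."""
    overlap = blen - bsep
    if overlap == 0:
        return [1.0] * blen
    return [
        min(1.0,
            (2 * j + 1) / (2 * overlap),               # rising edge
            (2 * (blen - 1 - j) + 1) / (2 * overlap))  # falling edge
        for j in range(blen)
    ]


def obmc_predict_1d(ref, vectors, blen, bsep):
    """Accumulate mask-weighted predicting blocks, one vector per block."""
    mask = ramp(blen, bsep)
    pred = [0.0] * (bsep * (len(vectors) - 1) + blen)
    for k, v in enumerate(vectors):
        start = k * bsep
        for j, w in enumerate(mask):
            x = start + j
            src = min(max(x - v, 0), len(ref) - 1)  # clamp at reference edges
            pred[x] += w * ref[src]
    return pred


def residual(frame, pred):
    """The signal left to code: frame minus its OBMC prediction."""
    return [f - p for f, p in zip(frame, pred)]


if __name__ == "__main__":
    ref = list(range(100, 140))
    pred = obmc_predict_1d(ref, [2, 2, 2, 2], blen=12, bsep=8)
    # With identical vectors, the interior is just the shifted reference.
    print(all(abs(pred[x] - ref[x - 2]) < 1e-9 for x in range(4, 32)))
```

When neighbouring blocks carry different vectors, the same accumulation cross-fades between their two predictions across the overlap.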
As explained elsewhere, Dirac provides
motion vectors to sub-pixel accuracy: the software supports
accuracy up to 1/8 pixel, although 1/4 pixel is the default.
This means upconverting
the reference frame components by a factor of up to 8 in each dimension.
The area corresponding to the matching block in the upconverted
reference then consists of 64 times more points. These can be
thought of as 64 reference blocks on different sub-lattices
of points separated by a step of 8 'sub-'pixels, each one
corresponding to different sub-pixel offsets.
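
The sub-lattice picture can be sketched as follows, assuming a displacement stored as an integer count of 1/8-pixel units. The helper names are invented for this example. A displacement splits into a whole-pixel part and one of 8 fractional offsets per axis, which gives the 64 combinations mentioned above.

```python
def split_displacement(d, precision_bits=3):
    """Split a displacement in 1/8-pixel units into (whole pixels, offset).

    Uses floor semantics, so the offset is always in 0..7, also for
    negative displacements.
    """
    return d >> precision_bits, d & ((1 << precision_bits) - 1)


def block_sample_positions(x0, blen, d):
    """Coordinates, in an 8x upconverted reference, of the samples that
    predict a block starting at integer pixel x0 under displacement d."""
    return [8 * (x0 + j) + d for j in range(blen)]


if __name__ == "__main__":
    print(split_displacement(19))    # 2 whole pixels plus 3/8
    print(split_displacement(-3))    # -1 whole pixel plus 5/8
    positions = block_sample_positions(x0=4, blen=4, d=19)
    print(positions)                 # consecutive samples are 8 apart
    offsets = {(fx, fy) for fx in range(8) for fy in range(8)}
    print(len(offsets))              # number of distinct sub-lattices
```

All samples of one predicting block share the same offset modulo 8, which is why each fractional displacement selects a single sub-lattice of the upconverted reference.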
Sub-pixel motion compensation places a huge load on
memory bandwidth if done naively, i.e. by upconverting the reference by
a factor of 8 in each dimension. In
Dirac, however, we just upconvert the reference by a factor of 2
in each dimension and compute the other offsets by linear
interpolation on the fly. In other words, we shift the load
from the bus to the CPU. The 2x2 upconversion filter has been
designed to give the lowest prediction error across all the
possible sub-pixel offsets.
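
A minimal sketch of the on-the-fly step, assuming the reference has already been upconverted by 2 in each dimension into a half-pixel grid (the hypothetical array `up2`; the upconversion filter itself is not described here and is not modelled). A position given in 1/8-pixel units is split into a half-pixel grid index and a remainder of 0 to 3 quarter-steps, and the four surrounding half-pixel samples are blended bilinearly. The function name, the floating-point arithmetic and the edge clamping are choices made for illustration; a real codec would more likely use integer arithmetic with explicit rounding.

```python
def sample_eighth_pel(up2, x8, y8):
    """Sample a 2x upconverted reference at a position in 1/8-pixel units.

    up2 is a 2D list on the half-pixel grid, so one grid step equals
    four 1/8-pixel units.
    """
    hx, rx = x8 >> 2, x8 & 3    # half-pixel index and remainder (0..3)
    hy, ry = y8 >> 2, y8 & 3
    height, width = len(up2), len(up2[0])

    def at(yy, xx):
        # Clamp reads to the array bounds (an assumption of this sketch).
        return up2[min(max(yy, 0), height - 1)][min(max(xx, 0), width - 1)]

    a, b = at(hy, hx), at(hy, hx + 1)
    c, d = at(hy + 1, hx), at(hy + 1, hx + 1)
    # Bilinear blend; the four weights always sum to 16.
    return ((4 - rx) * (4 - ry) * a + rx * (4 - ry) * b
            + (4 - rx) * ry * c + rx * ry * d) / 16


if __name__ == "__main__":
    # A linear test pattern, which bilinear interpolation reproduces exactly.
    up2 = [[3 * x + 5 * y for x in range(16)] for y in range(16)]
    print(sample_eighth_pel(up2, 13, 22))
    print(sample_eighth_pel(up2, 8, 8))
```

Only the 2x upconverted reference has to be stored and fetched; the finer offsets cost a few multiplications and additions per sample, which is the trade of memory bandwidth for computation described above.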