Motion estimation

Previous: Overlapped block-based motion compensation
Next: RDO motion estimation metric

Motion estimation is specific to the encoder. It's always the most complicated part of the system and can absorb huge computational resources, so short-cuts have to be found. Dirac adopts a three-stage approach. In the first stage, motion vectors are found for every block and each reference frame to pixel accuracy using hierarchical motion estimation. In the second stage, these vectors are refined to sub-pixel accuracy. The third stage is mode decision, which chooses which predictor to use and how to aggregate motion vectors by grouping blocks with similar motion together.

Motion estimation is most accurate when all three colour components are used, but this costs more computation and adds complexity. Dirac uses only the luma (Y) component.

Hierarchical motion estimation

Hierarchical ME speeds things up by repeatedly downconverting both the current and the reference frame by a factor of two in both dimensions, and doing motion estimation on the smaller pictures. At each stage of the hierarchy, vectors from lower levels (smaller versions of the picture) are used as a guide for searching at higher levels. This dramatically reduces the size of searches for large motions. Dirac has four levels of downconversion. The block size remains constant (and the blocks still overlap at all resolutions), so each level has only a quarter as many blocks as the next higher resolution. Each block therefore corresponds to 4 blocks at the next higher resolution layer and provides a guide motion vector to each of them. At each resolution, block matching proceeds by searching in a small range around the guide vector for the best match using the RDO metric (which is described below).
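As a rough illustration of the hierarchy, the sketch below builds the downconverted pictures and derives a guide vector for a block from its parent at the lower-resolution level. It is a minimal sketch, not Dirac's code: the 2x2 averaging filter, the function names, and the simple "parent block is at half the block coordinates" mapping are all assumptions, and pictures are plain lists of rows with even dimensions.

```python
def downconvert(picture):
    """Halve a picture in both dimensions.

    Assumption: each output sample is the rounded average of a 2x2 group.
    The text does not say which downconversion filter is used.
    """
    height, width = len(picture) // 2, len(picture[0]) // 2
    return [
        [
            (picture[2 * y][2 * x] + picture[2 * y][2 * x + 1]
             + picture[2 * y + 1][2 * x] + picture[2 * y + 1][2 * x + 1] + 2) // 4
            for x in range(width)
        ]
        for y in range(height)
    ]


def build_pyramid(picture, levels=4):
    """Return [full resolution, 1/2, 1/4, ...] with `levels` downconversions."""
    pyramid = [picture]
    for _ in range(levels):
        pyramid.append(downconvert(pyramid[-1]))
    return pyramid


def guide_vector(coarse_vectors, block_x, block_y):
    """Guide vector for block (block_x, block_y) at the finer level.

    With a constant block size, one coarse block covers four finer blocks,
    so the parent sits at half the block coordinates. Its vector is doubled
    because the finer level has twice the resolution.
    """
    vx, vy = coarse_vectors[block_y // 2][block_x // 2]
    return (2 * vx, 2 * vy)
```

Estimation would start at the smallest picture in the pyramid and work back towards full resolution, calling something like `guide_vector` for each block before searching around the result.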

Search strategies in hierarchical ME

The hierarchical approach dramatically reduces the computational effort involved in motion estimation for an equivalent search range. However, it risks missing small motions, and it might not make good decisions when a variety of motions occur near each other.

To mitigate this, the codec also always uses the zero vector (0,0) as another guide vector. This allows it to track slow- as well as fast-moving objects. Finally, the motion vectors already found in neighbouring blocks can also be used as guide vectors, if they have not already been tried.
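The candidate set described above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the sum of absolute differences (SAD) stands in for the RDO metric, the default search range of 2 is an arbitrary example value, and clamping reference coordinates at the picture edge is an assumption.

```python
def candidate_guides(guide, neighbour_vectors):
    """Guide vectors to try: the hierarchical guide, the zero vector, then
    neighbouring blocks' vectors, skipping any that were already listed."""
    candidates = []
    for vector in [guide, (0, 0), *neighbour_vectors]:
        if vector not in candidates:
            candidates.append(vector)
    return candidates


def sad(current, reference, bx, by, size, vx, vy):
    """Sum of absolute differences for one block (a stand-in for the RDO metric).

    Reference coordinates outside the picture are clamped to the edge
    (an assumption made to keep the sketch self-contained).
    """
    height, width = len(reference), len(reference[0])
    total = 0
    for y in range(by, by + size):
        for x in range(bx, bx + size):
            ry = min(max(y + vy, 0), height - 1)
            rx = min(max(x + vx, 0), width - 1)
            total += abs(current[y][x] - reference[ry][rx])
    return total


def search_block(current, reference, bx, by, size, guides, search_range=2):
    """Search a small window around every guide vector; return (vector, cost)."""
    best_vector, best_cost = None, None
    for gx, gy in guides:
        for dy in range(-search_range, search_range + 1):
            for dx in range(-search_range, search_range + 1):
                vector = (gx + dx, gy + dy)
                cost = sad(current, reference, bx, by, size, *vector)
                if best_cost is None or cost < best_cost:
                    best_vector, best_cost = vector, cost
    return best_vector, best_cost
```

Searching around several guides costs a few extra window searches per block, which is still far cheaper than one exhaustive search over the full motion range.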

Since each layer has twice the horizontal and vertical resolution of the one below it, it would appear to make sense to search only within +/-1 pixel of the guide vectors. In fact, the search ranges are always larger than this, because such a narrow search could cause the motion estimator to get trapped in a local minimum.

Sub-pixel refinement and upconversion

Dirac supports variable levels of motion vector accuracy. The current software hard-wires the accuracy at 1/4 pixel, but 1/8 pixel is also possible with it, and even higher resolutions could be defined. The MV precision is signalled with each frame.

Sub-pixel refinement also operates hierarchically. Once pixel-accurate motion vectors have been determined, each block has an associated motion vector (V0,W0), where V0 and W0 are multiples of 4 (for quarter-pel accuracy) or 8 (for eighth-pel accuracy). 1/2-pel accurate vectors are found by taking the best match out of (V0,W0) and its 8 half-pel neighbours. In eighth-pel units these are (V0+4,W0+4), (V0,W0+4), (V0-4,W0+4), (V0+4,W0), (V0-4,W0), (V0+4,W0-4), (V0,W0-4), (V0-4,W0-4); in quarter-pel units the offsets are 2 rather than 4. This in turn produces a new best vector (V1,W1), which provides a guide for 1/4-pel refinement, and so on until the desired accuracy is reached. The process is illustrated in the figure below.
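The refinement loop can be sketched as below. This is a minimal sketch under stated assumptions: `refine_subpixel` and its parameters are hypothetical names, vectors are held in units of 1/2**precision_bits of a pixel, and `cost` is any caller-supplied matching function (in practice it would evaluate the block match at the given sub-pixel vector).

```python
def refine_subpixel(start, cost, precision_bits=3):
    """Refine a pixel-accurate vector to 1/2**precision_bits-pel accuracy.

    `start` is expressed in sub-pixel units, so its components are multiples
    of 2**precision_bits (8 for eighth-pel, 4 for quarter-pel).
    """
    best = start
    best_cost = cost(best)
    step = 1 << (precision_bits - 1)  # half a pixel, in sub-pixel units
    while step >= 1:
        centre = best
        # Test the current best vector's 8 neighbours at this step size.
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                candidate = (centre[0] + dx, centre[1] + dy)
                candidate_cost = cost(candidate)
                if candidate_cost < best_cost:
                    best, best_cost = candidate, candidate_cost
        step //= 2  # 1/2-pel, then 1/4-pel, then 1/8-pel, ...
    return best
```

With `precision_bits=3` the first pass tests offsets of 4, matching the neighbour list given above for eighth-pel units; with `precision_bits=2` it starts at offsets of 2.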


Figure: sub-pixel motion-vector refinement

The sub-pixel matching process is complicated slightly because the reference is only upconverted by a factor of 2 in each dimension, not 8, so more accurate vectors require frame component values to be calculated on the fly by linear interpolation. For this reason, the 1/2-pel interpolation filter has a bit of pass-band boost to counteract the sag introduced by the linear interpolation. It was designed to produce the lowest interpolation error across all the phases. The taps (scaled to 5 bits, so that they sum to 32) are:

( -1 , 3 , -7 , 21 , 21 , -7 , 3 , -1 )
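To show how these taps might be applied, here is a minimal one-dimensional sketch that upconverts a row of samples by a factor of 2. The tap values come from the text; everything else is an assumption made for illustration: the function names, the rounding offset of 16 before the 5-bit shift, clamping indices at the row edges, and clipping results to an 8-bit range.

```python
HALF_PEL_TAPS = (-1, 3, -7, 21, 21, -7, 3, -1)  # from the text; they sum to 32


def half_pel_sample(row, i, max_value=255):
    """Interpolate the value halfway between row[i] and row[i + 1]."""
    last = len(row) - 1
    acc = 0
    for k, tap in enumerate(HALF_PEL_TAPS):
        # The 8 taps cover row[i - 3] .. row[i + 4]; indices are clamped
        # at the row edges (an assumed edge-handling rule).
        index = min(max(i - 3 + k, 0), last)
        acc += tap * row[index]
    value = (acc + 16) >> 5  # undo the 5-bit scaling, with assumed rounding
    return min(max(value, 0), max_value)  # assumed clip to the sample range


def upconvert_row(row):
    """Return a row at twice the resolution: original samples interleaved
    with interpolated half-pel samples."""
    out = []
    for i, sample in enumerate(row):
        out.append(sample)
        out.append(half_pel_sample(row, i))
    return out
```

A two-dimensional upconversion would typically apply the same filter along rows and then along columns; quarter- and eighth-pel values would then be derived from this half-pel grid by linear interpolation, as described above.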
