Temporal prediction structures

Motion estimation and compensation is the most complex part of any video codec, both conceptually and in terms of computation. The Dirac encoder uses three types of picture. Intra pictures (I pictures) are coded without reference to other pictures in the sequence. Level 1 pictures (L1 pictures) and Level 2 pictures (L2 pictures) are both inter pictures, that is, they are coded with reference to other previously coded pictures. The difference between them is that L1 pictures are forward-predicted only (also known as P pictures), whereas L2 pictures are B pictures (predicted from both earlier and later references).

The Dirac software employs a picture buffer to manage temporal prediction. Each picture is encoded with a header that specifies the picture number in display order, the picture numbers of any references and how long the picture must stay in the buffer. The decoder then decodes each picture as it arrives, searching the buffer for the appropriate reference pictures and placing the picture in the buffer. The decoder maintains a counter indicating which picture to 'display' (i.e. push out through the picture IO to the application calling the decoder functions, which may be a video player or may be something else). It searches the buffer for the picture with that picture number and displays it. Finally, it goes through the buffer eliminating pictures which have expired.
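The decoding loop described above can be sketched in a few lines of Python. This is only an illustrative model, not the Dirac decoder itself: the `CodedPicture` class, the `keep_until` field (standing in for "how long the picture must stay in the buffer") and the function name are all hypothetical, and the final flush of the buffer at end of stream is an assumption.

```python
from dataclasses import dataclass

@dataclass
class CodedPicture:
    number: int          # picture number in display order
    refs: tuple = ()     # picture numbers of the reference pictures
    keep_until: int = 0  # hypothetical: dropped once the display counter exceeds this

def simulate_decoder(stream):
    """Return the display order produced for pictures arriving in coded order."""
    buffer = {}          # picture number -> picture
    next_display = 0     # counter of the next picture to display
    displayed = []
    for pic in stream:
        # Search the buffer for the references named in the header.
        missing = [r for r in pic.refs if r not in buffer]
        if missing:
            raise LookupError(f"picture {pic.number} needs missing references {missing}")
        buffer[pic.number] = pic  # "decode" and place in the buffer
        # Display the picture the counter points at, if it is available.
        if next_display in buffer:
            displayed.append(next_display)
            next_display += 1
        # Eliminate pictures that have expired.
        expired = [n for n, p in buffer.items() if p.keep_until < next_display]
        for n in expired:
            del buffer[n]
    # Assumption: at end of stream, display whatever is still waiting.
    while next_display in buffer:
        displayed.append(next_display)
        next_display += 1
    return displayed
```

Feeding it pictures in a coded order such as 0, 3, 1, 2, 6, 4, 5, with each `keep_until` set late enough to cover the picture's last use as a reference, yields the display order 0 to 6.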

This decoder process allows for quite arbitrary prediction structures to be employed, not just those of MPEG-like GOPs.

Nevertheless, the encoder operates with standard GOP modes whereby the number of L1 pictures between I pictures, and the separation between L1 pictures, can be specified; and various presets for streaming, SDTV and HDTV imply specific GOP structures. A prediction structure for picture coding using a standard GOP structure is shown below:

Figure: Prediction of L1 and L2 pictures when L1 pictures are P pictures.
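As a rough illustration of how the two GOP parameters could determine picture types, the sketch below labels pictures in display order. The function name, its parameter names and the exact layout rule (an I picture every `(num_l1 + 1) * l1_sep` pictures, an L1 picture every `l1_sep` pictures, L2 pictures in between) are assumptions made for this example rather than a statement of what the encoder does.

```python
def gop_picture_types(num_l1, l1_sep, count):
    """Label `count` pictures in display order as 'I', 'L1' or 'L2'.

    num_l1: number of L1 pictures between I pictures
    l1_sep: separation between L1 pictures
    """
    if num_l1 == 0:
        # Zero L1 pictures is treated as I-picture only coding.
        return ["I"] * count
    gop_len = (num_l1 + 1) * l1_sep  # assumed GOP length
    types = []
    for n in range(count):
        pos = n % gop_len
        if pos == 0:
            types.append("I")
        elif pos % l1_sep == 0:
            types.append("L1")
        else:
            types.append("L2")
    return types

# Example call: three L1 pictures per GOP, spaced three pictures apart.
print(" ".join(gop_picture_types(3, 3, 13)))
```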

The picture buffer structure gives great flexibility, including the ability for the decoder to decode dynamically varying GOP structures. However, it also brings some dangers: at least in theory, I pictures need not be random access points, that is, points where a decoder may start decoding. This is because it is perfectly possible for a subsequent L1 or L2 picture to take as a reference a picture that temporally precedes the preceding I picture, and which may itself form part of a chain of references reaching right back to the start of the sequence. This will need to be constrained by specifying levels and profiles.
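One way to make the random access problem concrete is to test whether decoding can start at a given I picture. The sketch below is a hypothetical check, not part of Dirac: it represents the stream as `(picture_number, reference_numbers)` pairs in coded order, and it assumes that pictures displayed before the entry point may simply be discarded by a decoder that joins there.

```python
def is_random_access_point(stream, start):
    """Check whether decoding can begin at stream[start].

    stream: list of (number, refs) tuples in coded order
    start:  index of the candidate I picture
    """
    entry_number, entry_refs = stream[start]
    if entry_refs:
        return False  # not an intra picture
    decodable = {entry_number}
    for number, refs in stream[start + 1:]:
        if number < entry_number:
            continue  # displayed before the entry point; assumed discardable
        if all(r in decodable for r in refs):
            decodable.add(number)
        else:
            return False  # a reference reaches back past the entry point
    return True
```

In a stream where a picture coded after the second I picture refers to a picture from before it, the function returns `False` for that I picture, which is exactly the situation that levels and profiles would need to rule out.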

I-picture only coding

Setting the number of L1 pictures to 0 on the encoder side implies that there is no GOP and that the encoder performs I-picture only coding. I-picture only coding is useful for editing and other applications where fast random access to all pictures is required, although with suitable support it is not essential for these applications.

Skipping pictures and global motion

The picture header also contains other features, which are not yet fully supported. First, it contains a flag indicating whether or not the picture is skipped. If a picture is skipped, no picture data is sent at all. What action the decoder takes in this case has yet to be determined, but it is likely that in future versions the decoder will return the most recent decoded picture in temporal order.

The second flag in the picture header indicates the presence of global motion data, that is, a parameterized model of the motion data. Exactly what this means, how the data should be encoded and how the decoder should behave have also yet to be determined.

When suitable algorithms have been implemented on the encoder side, both these tools should have a powerful impact on compression performance, allowing the picture rate to be scaled down and the motion data to be more heavily compressed when the encoder is severely constrained in bit rate.

Interlace coding

Dirac supports interlace coding by coding sequences of fields, rather than frames.
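To show what a sequence of fields means in practice, the sketch below splits each frame into its even and odd lines. The function name, the representation of a frame as a list of rows and the `top_field_first` option are assumptions for illustration only; the source does not say how field order is chosen.

```python
def frames_to_fields(frames, top_field_first=True):
    """Turn a list of frames (each a list of rows) into a list of fields."""
    fields = []
    for frame in frames:
        top = frame[0::2]     # even-numbered lines
        bottom = frame[1::2]  # odd-numbered lines
        # Field order within a frame is an assumed, caller-selected option.
        fields.extend([top, bottom] if top_field_first else [bottom, top])
    return fields
```

Each field has half the vertical resolution of the frame, and the resulting list has twice as many pictures as the input.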
