Motion estimation and compensation is the most complex
part of any video codec, both conceptually and in terms of
computation. The Dirac encoder uses three types of picture. Intra
pictures (I pictures) are coded without reference to other
pictures in the sequence. Level 1 pictures (L1 pictures) and
Level 2 pictures (L2 pictures) are both
inter pictures; that is, they are coded with reference to
other previously coded pictures. The difference between L1 and L2 pictures is that
L1 pictures are forward-predicted only (also known as P pictures), whereas
L2 pictures are B pictures (predicted from both earlier and later references).
The Dirac software employs a picture buffer
to manage temporal prediction. Each picture is encoded with a header
that specifies the picture number in display order, the picture numbers of
any references and how long the picture must stay in the buffer. The decoder
then decodes each picture as it arrives, searching the buffer for the
appropriate reference pictures and placing the picture in the buffer. The
decoder maintains a counter indicating which picture to 'display' (i.e. push out
through the picture IO to the application calling the decoder functions,
which may be a video player or may be something else). It searches the
buffer for the picture with that picture number and displays it. Finally,
it goes through the buffer eliminating pictures which have expired.
This decoder process allows for quite arbitrary prediction structures
to be employed, not just those of MPEG-like GOPs.
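The decoder loop described above can be sketched in a few lines. This is only an illustrative model, not the Dirac implementation: the names Picture, PictureBuffer and lifetime are hypothetical, and the unit of the expiry time (display steps past the picture's own number) is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class Picture:
    number: int          # picture number in display order
    refs: tuple = ()     # picture numbers of any references
    lifetime: int = 0    # assumed unit: display steps past `number` it stays buffered


class PictureBuffer:
    def __init__(self):
        self.pictures = {}      # picture number -> Picture
        self.next_display = 0   # counter indicating which picture to display next

    def decode(self, pic):
        """Decode one picture; return the picture number displayed, or None."""
        # Search the buffer for the reference pictures.
        missing = [r for r in pic.refs if r not in self.pictures]
        if missing:
            raise LookupError(f"picture {pic.number}: missing references {missing}")
        self.pictures[pic.number] = pic   # real decoding would happen here

        # Display the picture the counter points at, if it has been decoded.
        shown = None
        if self.next_display in self.pictures:
            shown = self.next_display
            self.next_display += 1

        # Eliminate pictures which have expired.
        expired = [n for n, p in self.pictures.items()
                   if n + p.lifetime < self.next_display]
        for n in expired:
            del self.pictures[n]
        return shown


# Coded order for a small I, L1, L2, L2 pattern (display numbers 0..6).
coded_order = [
    Picture(0, (), 3), Picture(3, (0,), 3),
    Picture(1, (0, 3)), Picture(2, (0, 3)),
    Picture(6, (3,), 3), Picture(4, (3, 6)), Picture(5, (3, 6)),
]
buf = PictureBuffer()
shown = [buf.decode(p) for p in coded_order]
```

With this input, nothing is displayed while picture 3 is being decoded, because picture 1 has not arrived yet; after that one picture is displayed per decoded picture, and only the two reference pictures 3 and 6 remain in the buffer at the end.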
Nevertheless, the encoder
operates with standard GOP modes whereby the number of L1 pictures between
I pictures, and the separation between L1 pictures, can be specified; and various
presets for streaming, SDTV and HDTV imply specific GOP structures.
A prediction structure for picture coding
using a standard GOP structure is shown below:
Figure: Prediction of L1 and L2 pictures when L1 pictures
are P pictures.
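As a rough illustration of how the two GOP parameters determine the picture types, the sketch below labels pictures in display order. The function name and the assumption that a GOP spans (number of L1 pictures + 1) x (L1 separation) pictures are ours, chosen to match the usual MPEG-like layout; the actual encoder may differ in detail.

```python
def gop_picture_types(num_l1, l1_sep, count):
    """Label `count` pictures in display order as 'I', 'L1' or 'L2'.

    num_l1 : number of L1 pictures between I pictures
    l1_sep : separation between L1 pictures
    """
    if num_l1 == 0:
        # No GOP: I-picture only coding.
        return ["I"] * count
    gop_len = (num_l1 + 1) * l1_sep   # assumed GOP length
    types = []
    for n in range(count):
        if n % gop_len == 0:
            types.append("I")
        elif n % l1_sep == 0:
            types.append("L1")
        else:
            types.append("L2")
    return types


types = gop_picture_types(num_l1=1, l1_sep=3, count=7)
```

Here one L1 picture at a separation of 3 gives the repeating pattern I, L2, L2, L1, L2, L2 before the next I picture.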
The picture buffer structure gives great
flexibility, including the ability for the decoder to decode
dynamically-varying GOP structures. However, it also brings some dangers,
since at least in theory it means that I pictures need not be random
access points, that is, points where a decoder may start decoding.
This is because it is perfectly possible for a subsequent
L1 or L2 picture to have as a reference a picture that temporally precedes
a preceding I picture, and indeed forms part of a chain of reference right
back to the start of the sequence. This will need to be constrained by specifying levels
and profiles.
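One way to make the random access problem concrete is to check whether decoding can start at a given I picture without needing anything coded before it. The sketch below is a hypothetical check, not part of Dirac: it takes pictures as (picture number, reference numbers) pairs in coded order and tests whether every reference is satisfied by pictures decoded from the starting point onwards.

```python
def is_random_access_point(coded, start):
    """Return True if decoding can begin at index `start` of `coded`.

    coded : list of (picture_number, refs) tuples in coded order
    start : index of the candidate I picture
    """
    available = set()   # pictures decoded since the starting point
    for number, refs in coded[start:]:
        if any(r not in available for r in refs):
            return False   # a reference lies before the starting point
        available.add(number)
    return True


# Picture 9 refers back past I picture 6 to picture 3.
leaky = [(0, ()), (3, (0,)), (6, ()), (9, (3,))]
# Picture 9 refers only to I picture 6.
closed = [(0, ()), (3, (0,)), (6, ()), (9, (6,))]

leaky_ok = is_random_access_point(leaky, 2)
closed_ok = is_random_access_point(closed, 2)
```

In the first sequence the I picture with number 6 is not a usable starting point, because the following L1 picture depends on picture 3; in the second it is. A constraint imposed through levels and profiles could amount to requiring that a check of this kind always succeeds.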
I-picture only coding
Setting the number of L1 pictures to 0 on the encoder side implies that there is no
GOP and that I-picture only coding is in use. I-picture only coding is useful for
editing and other applications where fast random access to all pictures is required,
although with suitable support it is not essential for these applications.
Skipping pictures and global motion
The picture header also contains other goodies, which are not yet fully supported.
Firstly, it contains a flag indicating
whether or not the picture is skipped. If it is, no picture data is sent at all.
What action the decoder takes in this case has yet to be determined, but it is likely
that in future versions the decoder will return the most recent decoded picture in temporal order.
The second flag in the picture header indicates the presence of global motion data,
that is, a parameterized model of the motion data. Exactly what this means, how the
data should be encoded and how the decoder should behave have also yet to be determined.
When suitable algorithms have been implemented on the encoder side, both these
tools should have a powerful impact on compression performance, allowing the picture rate
to be scaled down and the motion more heavily compressed, when the encoder is very
pushed for bit rate.
Interlace coding
Dirac supports interlace coding by coding sequences of fields, rather than frames.
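To illustrate what coding fields rather than frames means, the sketch below splits a frame, represented as a list of rows, into its two fields of alternating lines. The function name is hypothetical, and which field is coded first is not stated here, so the ordering in the example is an assumption.

```python
def split_fields(frame):
    """Split a frame (list of rows) into two fields of alternating lines."""
    first_field = frame[0::2]    # lines 0, 2, 4, ...
    second_field = frame[1::2]   # lines 1, 3, 5, ...
    return first_field, second_field


# A 6-line frame in which every sample holds its own line index.
frame = [[line] * 4 for line in range(6)]
first, second = split_fields(frame)
```

Each field has half the vertical resolution of the frame, and the resulting sequence of fields is what is then passed to the coder as a sequence of pictures.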