How libdav1d Decodes Invisible Reference Frames
This article explores how the libdav1d AV1 decoder
identifies, stores, and decodes invisible frames that serve as reference
frames. It covers the structural identification of these frames in the
AV1 bitstream, their storage within the Decoded Picture Buffer
(DPB), and how libdav1d uses its multi-threaded
architecture to decode these non-displayed frames without introducing
playback latency.
Understanding Invisible Frames in AV1
In the AV1 video coding format, invisible frames (such as Alternative Reference frames, or ALTREF) are decoded but not immediately displayed to the user. Instead, they serve as high-quality temporal predictors for subsequent visible frames. Because these frames contain crucial motion and texture data, a decoder must process them with the same precision as visible frames while bypassing the final display queue.
Identification and Header Parsing
libdav1d begins the decoding process by parsing the AV1
frame header. The decoder identifies an invisible frame by evaluating
specific syntax elements:
- show_frame flag: If this flag is set to 0, the frame is marked as invisible and will not be output to the display queue immediately after decoding.
- show_existing_frame flag: If this flag is set to 1, the decoder does not decode a new frame but instead displays a previously decoded invisible frame stored in the reference buffer.
By analyzing these flags, libdav1d determines whether to
route the decoded output directly to the Decoded Picture Buffer (DPB) or
to send it to both the DPB and the display pipeline.
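The routing decision described above can be sketched in a few lines of C. This is a minimal illustration, not libdav1d's code: the FrameHeaderFlags struct, the FrameRoute enum, and classify_frame are hypothetical names introduced here, and only the two flag names come from the AV1 syntax.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of the two header flags discussed above.
 * The field names mirror the AV1 syntax elements; the struct itself is
 * an assumption and is not libdav1d's internal frame header type. */
typedef struct {
    bool show_existing_frame; /* 1: output a stored frame, no new decode */
    bool show_frame;          /* 0: decode, but do not output right away */
} FrameHeaderFlags;

typedef enum {
    ROUTE_DECODE_TO_DPB_ONLY,       /* invisible frame */
    ROUTE_DECODE_TO_DPB_AND_OUTPUT, /* ordinary shown frame */
    ROUTE_OUTPUT_EXISTING           /* re-show a frame already in the DPB */
} FrameRoute;

/* Decide where a frame goes based only on the two flags. */
static FrameRoute classify_frame(const FrameHeaderFlags *hdr)
{
    assert(hdr != NULL);
    if (hdr->show_existing_frame)
        return ROUTE_OUTPUT_EXISTING;
    return hdr->show_frame ? ROUTE_DECODE_TO_DPB_AND_OUTPUT
                           : ROUTE_DECODE_TO_DPB_ONLY;
}
```

With this sketch, a header carrying show_existing_frame = 0 and show_frame = 0 is classified as ROUTE_DECODE_TO_DPB_ONLY, which is the invisible-frame case the rest of the article follows.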
Buffer Management and DPB Storage
To handle invisible reference frames, libdav1d relies on its
Decoded Picture Buffer (DPB) management.
When an invisible frame is parsed, the decoder allocates a picture
buffer (Dav1dPicture) and decodes the frame data into it. Instead of
passing this buffer to the application’s output queue, libdav1d stores
it in one or more of the eight reference picture slots defined by the AV1
specification, as selected by the refresh_frame_flags field of the frame
header. Each slot holds a counted reference to the buffer. The buffer
remains allocated until the last reference is dropped, that is, until the
slots holding it have been overwritten and no frame still being decoded
predicts from it, at which point libdav1d releases the memory.
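The lifetime rule above can be modelled with a small reference-counted slot table. The sketch below is an assumption-laden illustration: Picture, Dpb, picture_alloc, picture_ref, picture_unref, and dpb_refresh are hypothetical names, not libdav1d's types or functions. The 8-bit mask plays the role of the AV1 refresh_frame_flags field.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_REF_SLOTS 8 /* AV1 defines eight reference slots */

/* Hypothetical stand-in for a decoded picture; not libdav1d's Dav1dPicture. */
typedef struct {
    int refcount;
    int frame_id; /* illustrative only */
} Picture;

typedef struct {
    Picture *slot[NUM_REF_SLOTS];
} Dpb;

/* Allocate a picture with one reference, owned by the caller. */
static Picture *picture_alloc(int frame_id)
{
    Picture *p = malloc(sizeof(*p));
    if (!p) return NULL;
    p->refcount = 1;
    p->frame_id = frame_id;
    return p;
}

static void picture_ref(Picture *p) { p->refcount++; }

/* Drop one reference; free the picture when the last one goes away.
 * Returns 1 if the picture was freed, 0 otherwise. */
static int picture_unref(Picture *p)
{
    assert(p->refcount > 0);
    if (--p->refcount == 0) {
        free(p);
        return 1;
    }
    return 0;
}

/* Store pic in every slot selected by refresh_mask, releasing whatever the
 * slot held before. Returns how many old pictures were freed as a result. */
static int dpb_refresh(Dpb *dpb, Picture *pic, unsigned refresh_mask)
{
    int freed = 0;
    for (int i = 0; i < NUM_REF_SLOTS; i++) {
        if (!(refresh_mask & (1u << i))) continue;
        picture_ref(pic); /* take the new reference before dropping the old */
        if (dpb->slot[i]) freed += picture_unref(dpb->slot[i]);
        dpb->slot[i] = pic;
    }
    return freed;
}
```

In this model an invisible frame is simply a picture that is passed to dpb_refresh and never handed to an output queue. After the decoder drops its own reference, the slots keep the picture alive until a later frame refreshes them.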
Threading and Execution Pipeline
One of libdav1d’s primary strengths is its highly
parallelized architecture, which utilizes both frame-level and
tile/row-level threading. Invisible frames are integrated into this
pipeline through the following mechanisms:
- Asynchronous Decoding: libdav1d schedules the decoding of invisible frames on worker threads just like standard frames. Because there is no pressure to immediately present the frame to the display, the decoder can schedule this work according to dependency chains.
- Dependency Tracking: Visible frames that rely on an invisible reference frame cannot finish decoding until the reference frame’s pixels are ready. libdav1d uses fine-grained synchronization (progress tracking) to allow dependent frames to begin decoding rows as soon as the corresponding reference rows in the invisible frame are completed, rather than waiting for the entire invisible frame to finish.
- Bypassing the Output Queue: Once the decoding of an invisible frame is complete, the picture is kept only as a reference and the reference state is updated. The frame bypasses the external picture output queue, ensuring the media player or application does not receive a frame that is not meant for display or that arrives out of order.
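The row-level dependency tracking described in the list above can be illustrated with a mutex and a condition variable. This is a simplified sketch using POSIX threads, not libdav1d's actual synchronization code: FrameProgress, Demo, run_demo, and the margin parameter (extra reference rows a dependent row may read) are all hypothetical names chosen for the example.

```c
#include <pthread.h>

/* Hypothetical per-frame progress tracker. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int rows_done; /* number of fully reconstructed rows */
} FrameProgress;

static void progress_init(FrameProgress *p)
{
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->cond, NULL);
    p->rows_done = 0;
}

/* Called by the thread decoding the reference frame after each row. */
static void progress_report(FrameProgress *p, int rows_done)
{
    pthread_mutex_lock(&p->lock);
    if (rows_done > p->rows_done) p->rows_done = rows_done;
    pthread_cond_broadcast(&p->cond);
    pthread_mutex_unlock(&p->lock);
}

/* Called by a dependent frame: block until enough reference rows exist. */
static void progress_wait(FrameProgress *p, int rows_needed)
{
    pthread_mutex_lock(&p->lock);
    while (p->rows_done < rows_needed)
        pthread_cond_wait(&p->cond, &p->lock);
    pthread_mutex_unlock(&p->lock);
}

typedef struct {
    FrameProgress progress; /* progress of the invisible reference frame */
    int total_rows;
    int margin;             /* reference rows needed beyond the current row */
    int rows_decoded;       /* rows completed by the dependent frame */
} Demo;

static void *decode_reference(void *arg)
{
    Demo *d = arg;
    for (int row = 1; row <= d->total_rows; row++)
        progress_report(&d->progress, row); /* rows [0, row) are complete */
    return NULL;
}

static void *decode_dependent(void *arg)
{
    Demo *d = arg;
    for (int row = 0; row < d->total_rows; row++) {
        int needed = row + 1 + d->margin;
        if (needed > d->total_rows) needed = d->total_rows;
        progress_wait(&d->progress, needed); /* wait for these rows only */
        d->rows_decoded++; /* stand-in for predicting and reconstructing a row */
    }
    return NULL;
}

/* Runs both frames concurrently; returns rows decoded by the dependent
 * frame, or -1 if a thread could not be created. */
static int run_demo(int total_rows, int margin)
{
    Demo d = { .total_rows = total_rows, .margin = margin, .rows_decoded = 0 };
    pthread_t ref, dep;
    progress_init(&d.progress);
    /* Start the dependent frame first so that it has to wait. */
    if (pthread_create(&dep, NULL, decode_dependent, &d)) return -1;
    if (pthread_create(&ref, NULL, decode_reference, &d)) {
        progress_report(&d.progress, total_rows); /* unblock the waiter */
        pthread_join(dep, NULL);
        return -1;
    }
    pthread_join(dep, NULL);
    pthread_join(ref, NULL);
    pthread_cond_destroy(&d.progress.cond);
    pthread_mutex_destroy(&d.progress.lock);
    return d.rows_decoded;
}
```

Calling run_demo(16, 2) lets the dependent frame start its first row as soon as three reference rows are reported, instead of waiting for all sixteen. The design point is that the waiter blocks on a row count rather than on frame completion, so the two frames overlap for most of their decode time.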