How libdav1d Decodes Invisible Reference Frames

This article explores how the libdav1d AV1 decoder processes, manages, and decodes invisible frames that serve as reference frames. It covers the structural identification of these frames in the AV1 bitstream, memory allocation within the Decoded Picture Buffer (DPB), and how libdav1d optimizes its multi-threaded architecture to decode these non-displayed frames without introducing playback latency.

Understanding Invisible Frames in AV1

In the AV1 video coding format, invisible frames (such as alternate reference frames, or ALTREF) are decoded but not immediately displayed to the user. Instead, they serve as high-quality temporal predictors for subsequent visible frames. Because later frames predict from their reconstructed pixels and motion data, a decoder must reconstruct them exactly as it would a visible frame; the only difference is that the result is not placed in the display queue.

Identification and Header Parsing

libdav1d begins the decoding process by parsing the AV1 frame header. The decoder identifies an invisible frame by evaluating specific syntax elements:

show_frame flag: If this flag is set to 0, the frame is marked as invisible and will not be output to the display queue immediately after decoding.
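show_existing_frame flag: If this flag is set to 1, the decoder does not decode a new frame but instead outputs a previously decoded frame from the reference slot selected by frame_to_show_map_idx. This is how an invisible frame can be shown later without being decoded a second time.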

By analyzing these flags, libdav1d determines whether the decoded picture goes only to the Decoded Picture Buffer (DPB) or to both the DPB and the display pipeline. Which reference slots the picture occupies is signaled separately by the refresh_frame_flags syntax element.
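The routing decision described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the struct and function names (FrameHeaderSketch, FrameRouting, route_frame, is_invisible_reference) are hypothetical and are not libdav1d's internal types. Only the field names mirror AV1 syntax elements. The sketch also ignores the special case where a shown existing key frame triggers a reference refresh.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down view of an AV1 frame header. Field names follow
 * the AV1 syntax elements; the struct itself is NOT libdav1d's. */
typedef struct {
    bool    show_existing_frame;   /* 1: output a stored frame, decode nothing */
    uint8_t frame_to_show_map_idx; /* slot to output when the flag above is 1 */
    bool    show_frame;            /* 0: decode, but do not output now */
    bool    showable_frame;        /* may be output later via show_existing_frame */
    uint8_t refresh_frame_flags;   /* bit i set: store the result in slot i */
} FrameHeaderSketch;

/* What the decoder has to do with this frame. */
typedef struct {
    bool decode; /* run reconstruction */
    bool output; /* hand a picture to the application */
    bool store;  /* write the picture into at least one reference slot */
} FrameRouting;

static FrameRouting route_frame(const FrameHeaderSketch *hdr)
{
    FrameRouting r = { false, false, false };

    if (hdr->show_existing_frame) {
        /* Simplification: only re-display, no reference update. */
        r.output = true;
        return r;
    }
    r.decode = true;
    r.output = hdr->show_frame;
    r.store  = hdr->refresh_frame_flags != 0;
    return r;
}

/* An invisible reference frame: newly decoded, not shown, kept as reference. */
static bool is_invisible_reference(const FrameHeaderSketch *hdr)
{
    return !hdr->show_existing_frame && !hdr->show_frame &&
           hdr->refresh_frame_flags != 0;
}
```

With these definitions, a header with show_frame = 0 and refresh_frame_flags = 0x40 is routed to "decode and store" only, while a header with show_existing_frame = 1 is routed to "output" only.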

Buffer Management and DPB Storage

libdav1d handles invisible reference frames with reference-counted picture buffers, which together act as its Decoded Picture Buffer (DPB).

When an invisible frame is parsed, the decoder allocates a picture buffer (Dav1dPicture) and decodes the frame data into it. Instead of passing this buffer to the application's output queue, libdav1d stores it in one or more of the eight reference picture slots defined by the AV1 specification, as selected by refresh_frame_flags. Each slot holds a counted reference to the buffer. The buffer stays allocated until every slot that pointed to it has been overwritten and no frame still being decoded holds a reference, at which point libdav1d releases the memory.
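The slot update and release behavior above can be illustrated with a small reference-counting sketch. The types and functions here (PictureSketch, DpbSketch, picture_alloc, picture_ref, picture_unref, dpb_refresh) are hypothetical stand-ins, not libdav1d's API, and the counter is a plain int for brevity. A multi-threaded decoder would need an atomic counter.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NUM_REF_SLOTS 8 /* AV1 defines eight reference slots */

/* Hypothetical reference-counted picture; NOT libdav1d's Dav1dPicture. */
typedef struct {
    int      refcount;
    uint8_t *pixels;
} PictureSketch;

typedef struct {
    PictureSketch *slot[NUM_REF_SLOTS];
} DpbSketch;

/* Allocate a picture that starts with one reference, owned by the caller. */
static PictureSketch *picture_alloc(size_t size)
{
    PictureSketch *p = malloc(sizeof(*p));
    if (!p) return NULL;
    p->pixels = calloc(size, 1);
    if (!p->pixels) { free(p); return NULL; }
    p->refcount = 1;
    return p;
}

static void picture_ref(PictureSketch *p) { p->refcount++; }

/* Drop one reference. Returns 1 if this freed the picture, 0 otherwise. */
static int picture_unref(PictureSketch *p)
{
    if (--p->refcount > 0) return 0;
    free(p->pixels);
    free(p);
    return 1;
}

/* Store pic in every slot whose bit is set in refresh_frame_flags and drop
 * the reference each of those slots held before. Returns how many old
 * pictures were freed as a result. */
static int dpb_refresh(DpbSketch *dpb, PictureSketch *pic,
                       uint8_t refresh_frame_flags)
{
    int freed = 0;
    for (int i = 0; i < NUM_REF_SLOTS; i++) {
        if (!(refresh_frame_flags & (1u << i))) continue;
        picture_ref(pic); /* take the new reference before dropping the old */
        if (dpb->slot[i]) freed += picture_unref(dpb->slot[i]);
        dpb->slot[i] = pic;
    }
    return freed;
}
```

In this sketch, an invisible frame stored with refresh_frame_flags = 0x41 occupies slots 0 and 6. Its memory is freed only after both slots have been overwritten by later frames and the decoder has dropped its own reference.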

Threading and Execution Pipeline

One of libdav1d’s primary strengths is its highly parallelized architecture, which utilizes both frame-level and tile/row-level threading. Invisible frames are integrated into this pipeline through the following mechanisms:

Asynchronous Decoding: libdav1d schedules the decoding of invisible frames on worker threads just like standard frames. An invisible frame has no presentation deadline of its own, so what matters for scheduling is when the frames that depend on it need its pixels, not its position in display order.
Dependency Tracking: Visible frames that rely on an invisible reference frame cannot finish decoding until the reference frame’s pixels are ready. libdav1d uses fine-grained synchronization (progress tracking) to allow dependent frames to begin decoding rows as soon as the corresponding reference rows in the invisible frame are completed, rather than waiting for the entire invisible frame to finish.
Bypassing the Output Queue: Once an invisible frame has finished decoding, its picture is kept only in the reference slots and is not added to the picture output queue. The application therefore does not receive it from dav1d_get_picture() at that point, which keeps the media player from seeing an out-of-order frame. If a later header sets show_existing_frame for that slot, the stored picture is output then.
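The row-level dependency tracking described in the list above can be sketched with a single atomic progress counter per reference picture. This is a simplified, non-blocking model under stated assumptions. The names (ProgressSketch, progress_publish, rows_needed, decode_available_rows) are hypothetical, "rows" are abstract units and not libdav1d's actual granularity, and a real decoder would put the dependent task to sleep or reschedule it instead of polling.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical per-picture progress counter; NOT libdav1d's structure. */
typedef struct {
    atomic_int rows_done; /* rows fully reconstructed so far, top to bottom */
} ProgressSketch;

static void progress_init(ProgressSketch *p)
{
    atomic_init(&p->rows_done, 0);
}

/* Called by the thread decoding the reference frame after finishing rows. */
static void progress_publish(ProgressSketch *p, int rows_done)
{
    /* Release: pixel writes made before this store become visible to a
     * thread that observes the new count with an acquire load. */
    atomic_store_explicit(&p->rows_done, rows_done, memory_order_release);
}

/* Number of reference rows that must be complete before row `row` of the
 * dependent frame can be predicted, given how many rows below the current
 * one a motion vector may reach. Clamped to the picture height. */
static int rows_needed(int row, int max_mv_reach_rows, int total_rows)
{
    int need = row + 1 + max_mv_reach_rows;
    return need > total_rows ? total_rows : need;
}

static int reference_ready(ProgressSketch *ref, int row, int reach, int total)
{
    int done = atomic_load_explicit(&ref->rows_done, memory_order_acquire);
    return done >= rows_needed(row, reach, total);
}

/* Advance through the dependent frame as far as the reference allows.
 * Returns the index of the first row that is still blocked. */
static int decode_available_rows(ProgressSketch *ref, int next_row,
                                 int reach, int total)
{
    while (next_row < total && reference_ready(ref, next_row, reach, total))
        next_row++; /* a real decoder would reconstruct the row here */
    return next_row;
}
```

With a 10-row picture and a motion reach of 2 rows, publishing 5 finished reference rows unblocks rows 0 to 2 of the dependent frame. The remaining rows become available only as the invisible frame publishes further progress.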