How libdav1d Tile-Level Multithreading Works
This article explains how the libdav1d AV1 decoder uses tile-level multithreading to speed up video decoding. It covers the relationship between AV1 tiles, thread allocation, and task scheduling, along with the synchronization mechanisms that manage decoding dependencies across multiple CPU cores.
Understanding AV1 Tiles and Multithreading
In the AV1 video coding format, a single video frame can be divided into a grid of rectangular regions called tiles. Each tile has its own entropy coding state, and spatial intra-prediction does not read pixels from neighboring tiles, so tiles can be parsed and reconstructed independently. This makes them well suited to parallel processing.
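To make the tile grid concrete, the following minimal sketch splits a frame into uniformly spaced tiles measured in superblocks. It is loosely modeled on AV1's uniform tile spacing and is not taken from libdav1d. The names `Tile`, `uniform_starts`, and `tile_grid` are hypothetical, and the 64-pixel superblock size is an assumption.

```python
from dataclasses import dataclass

SB_SIZE = 64  # superblock size in pixels (assumed; AV1 also allows 128)


@dataclass(frozen=True)
class Tile:
    col: int
    row: int
    x0: int
    y0: int
    x1: int  # exclusive
    y1: int  # exclusive


def uniform_starts(n_sb, count_log2):
    """Start offsets, in superblocks, of uniformly spaced tiles."""
    size = (n_sb + (1 << count_log2) - 1) >> count_log2  # ceiling division
    return list(range(0, n_sb, size))


def tile_grid(width, height, cols_log2, rows_log2):
    """Split a width x height frame into a grid of tile rectangles."""
    sb_cols = -(-width // SB_SIZE)   # frame width in superblocks, rounded up
    sb_rows = -(-height // SB_SIZE)  # frame height in superblocks, rounded up
    xs = uniform_starts(sb_cols, cols_log2) + [sb_cols]
    ys = uniform_starts(sb_rows, rows_log2) + [sb_rows]
    tiles = []
    for r in range(len(ys) - 1):
        for c in range(len(xs) - 1):
            tiles.append(Tile(
                col=c, row=r,
                x0=xs[c] * SB_SIZE, y0=ys[r] * SB_SIZE,
                # The last tile in each direction is clipped to the frame edge.
                x1=min(width, xs[c + 1] * SB_SIZE),
                y1=min(height, ys[r + 1] * SB_SIZE),
            ))
    return tiles
```

Calling `tile_grid(1920, 1080, 1, 1)` yields four non-overlapping tiles that together cover the whole frame. Each of them is a candidate unit of parallel work.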
The libdav1d decoder leverages this structure using Tile-Level Multithreading (TMT). This mechanism can run independently of, or in tandem with, Frame-Level Multithreading (FMT) to maximize CPU utilization, especially on systems with high core counts.
The Task Queue and Thread Allocation
libdav1d manages multithreading through a unified task-based architecture rather than assigning static threads to specific roles.
- The Thread Pool: A centralized pool of worker threads is created at decoder initialization.
- Task Generation: When a frame is submitted for decoding, libdav1d breaks the decoding process down into discrete, smaller tasks. The decoding work for each tile is independent of the other tiles in the frame, so it can be scheduled separately.
- Dynamic Scheduling: Worker threads continuously pull tasks from a shared queue. If a frame contains four tiles, up to four worker threads can decode them simultaneously, provided that many threads are free.
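The pool-and-queue pattern described in the list above can be sketched as follows. This is a conceptual model built on Python's standard `queue` and `threading` modules, not libdav1d's scheduler. `decode_tile`, `worker`, and `run_pool` are hypothetical names, and the "decoding" is a placeholder.

```python
import queue
import threading


def decode_tile(frame_id, tile_id):
    """Hypothetical stand-in for real tile decoding work."""
    return (frame_id, tile_id, f"decoded frame {frame_id} tile {tile_id}")


def worker(tasks, results, lock):
    """Pull tasks from the shared queue until a None sentinel arrives."""
    while True:
        task = tasks.get()
        if task is None:
            return
        out = decode_tile(*task)
        with lock:  # results is shared between all workers
            results.append(out)


def run_pool(n_threads, n_tiles, frame_id=0):
    tasks = queue.Queue()
    results = []
    lock = threading.Lock()
    # The pool is created once, before any work is submitted.
    threads = [
        threading.Thread(target=worker, args=(tasks, results, lock))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    # One task per tile; any idle worker may pick up any task.
    for tile_id in range(n_tiles):
        tasks.put((frame_id, tile_id))
    # One sentinel per worker shuts the pool down cleanly.
    for _ in threads:
        tasks.put(None)
    for t in threads:
        t.join()
    return sorted(results)
```

Because workers are not bound to particular tiles, a pool with fewer threads than tiles still finishes all the work. Each thread simply takes the next task when it becomes free.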
Synchronization and Dependency Management
While tiles are mostly independent, they are not completely isolated. Complete frame reconstruction requires synchronization, particularly during the post-processing stages. libdav1d handles this through structured dependency tracking:
Intra-Frame Dependencies
During the initial reconstruction phase (coefficient parsing and pixel prediction), tiles do not depend on each other. Threads can decode their assigned tiles at different speeds without waiting for neighboring tile progress.
Loop Filtering and CDEF (Post-Processing)
The primary challenge in tile-level multithreading occurs during the loop filter, CDEF (Constrained Directional Enhancement Filter), and loop restoration phases. These filters operate across tile boundaries to smooth out compression artifacts.
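- Row-Based Synchronization: libdav1d tracks dependencies row by row and runs the post-processing filters as a pipeline. A filter stage can start on a row as soon as the rows it depends on have been reconstructed, so filtering overlaps with the reconstruction of later rows instead of waiting for the whole frame.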
- Progress Indicators: As each tile thread finishes decoding a row of blocks, it updates a progress counter. The filtering threads (which may be the same threads shifting to a new task) check these progress indicators.
- Boundary Handling: A filter task for a boundary area is only initiated when the pixels on both sides of the tile boundary have been fully decoded and reconstructed.
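The progress-counter idea in the list above can be illustrated with a small model. It assumes, as a simplification, a single row of side-by-side tiles with the same number of block rows each, and a filter that needs only the matching row of every tile. `FrameProgress` and `decode_and_filter` are hypothetical names, not libdav1d structures.

```python
import threading


class FrameProgress:
    """Per-tile row counters guarded by one condition variable."""

    def __init__(self, n_tiles):
        self.rows_done = [0] * n_tiles
        self.cond = threading.Condition()

    def report_row(self, tile):
        """Called by a tile worker after it reconstructs one more row."""
        with self.cond:
            self.rows_done[tile] += 1
            self.cond.notify_all()

    def wait_for_row(self, row):
        """Block until every tile has reconstructed rows 0..row.

        Returns the slowest tile's row count at the moment of release.
        """
        with self.cond:
            self.cond.wait_for(lambda: min(self.rows_done) > row)
            return min(self.rows_done)


def decode_and_filter(n_tiles=3, n_rows=4):
    progress = FrameProgress(n_tiles)
    filtered = []  # (row, slowest tile's progress when the row was filtered)

    def tile_worker(tile):
        for _ in range(n_rows):
            progress.report_row(tile)  # real decoding work would go here

    def filter_worker():
        for row in range(n_rows):
            seen = progress.wait_for_row(row)
            filtered.append((row, seen))  # real filtering would go here

    threads = [threading.Thread(target=filter_worker)]
    threads += [
        threading.Thread(target=tile_worker, args=(t,)) for t in range(n_tiles)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return filtered
```

The filter never touches a row until the slowest tile has passed it, yet it does not wait for the whole frame. A real decoder would also need to account for filters that read pixels from adjacent rows.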
Combining Frame and Tile Multithreading
To achieve optimal performance, libdav1d allows tile-level and frame-level multithreading to work together.
If a system has multiple CPU cores, libdav1d can allocate threads to decode multiple frames concurrently (FMT), while simultaneously assigning multiple threads to decode the individual tiles within each of those frames (TMT). This hybrid approach keeps worker threads from sitting idle. Streams with few tiles per frame (typically lower resolutions) draw their parallelism from frame-level threading, while streams with many tiles per frame (typically higher resolutions) also benefit from tile-level threading.
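A rough back-of-the-envelope model shows why the combination matters. The function below is a hypothetical illustration, not a libdav1d API. It treats every tile of every in-flight frame as immediately runnable, ignoring inter-frame references and filter dependencies, so its result is an upper bound on how many threads can be kept busy.

```python
def busy_threads(n_threads, frames_in_flight, tiles_per_frame):
    """Upper bound on the number of worker threads with work to do."""
    runnable_tasks = frames_in_flight * tiles_per_frame
    return min(n_threads, runnable_tasks)


# Single-tile stream, tile threading only: one thread has work.
single_tile_no_fmt = busy_threads(8, frames_in_flight=1, tiles_per_frame=1)

# Single-tile stream with eight frames in flight: all threads can work.
single_tile_fmt = busy_threads(8, frames_in_flight=8, tiles_per_frame=1)

# Four-tile stream with two frames in flight: both levels contribute.
hybrid = busy_threads(8, frames_in_flight=2, tiles_per_frame=4)
```

Under these assumptions, an eight-thread pool is saturated either by eight single-tile frames in flight or by two four-tile frames, whereas a single-tile stream with only one frame in flight leaves seven threads without work.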