How libdav1d Implements Frame-Level Multithreading
This article explains how the libdav1d AV1 decoder implements frame-level multithreading to decode quickly on multi-core CPUs. It covers the core challenges of parallel frame decoding, the synchronization mechanisms used to manage inter-frame dependencies, and how tasks are scheduled across multiple CPU cores to keep them busy.
The Challenge of Frame-Level Parallelism in AV1
In video compression, most frames are “inter-frames” (P-frames or B-frames) that rely on previously decoded “reference frames” for motion compensation. This creates a temporal dependency chain.
If a decoder waited for Frame 1 to be completely finished before
starting Frame 2, multi-core CPUs would remain largely underutilized. To
solve this, libdav1d implements Frame-level Multithreading
(Frame-MT), which decodes multiple frames simultaneously. The main
challenge of Frame-MT is ensuring that a thread decoding Frame 2 does
not attempt to read pixels or motion vectors from Frame 1 before Frame
1’s thread has actually decoded those specific areas.
Progress Signaling and Dependency Tracking
Rather than waiting for an entire reference frame to finish decoding,
libdav1d uses a fine-grained synchronization mechanism
based on row-level progress signaling.
- Superblock Row Progress: As a thread decodes a frame, it tracks its progress in units of superblock rows (a superblock is either 64x64 or 128x128 pixels, depending on the stream).
- Atomic Status Updates: When a thread finishes decoding a superblock row, it atomically updates a progress marker associated with that frame.
- Dependency Checking: When a thread decoding a dependent frame needs to perform motion compensation, it checks the motion vectors to see which area of the reference frame is required. It then queries the reference frame’s progress marker.
- Yield and Wait: If the required row in the reference frame is already decoded, the dependent thread proceeds immediately. If not, the dependent thread suspends its current task and waits (using condition variables or lightweight polling) until the reference frame’s thread signals that the required row is complete.
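The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not libdav1d's code: the names `FrameProgress`, `signal_row_done`, `wait_for_row`, and `required_sb_row` are hypothetical, motion vectors are assumed to be in whole pixels, and the interpolation margin is assumed to be half of an 8-tap filter. A condition variable stands in for whatever waiting primitive the decoder actually uses.

```python
import threading
from typing import Optional


class FrameProgress:
    """Row-level progress marker for one frame (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._done = 0  # number of superblock rows fully decoded so far
        self._cond = threading.Condition()

    def signal_row_done(self, sb_row: int) -> None:
        """Publish that rows 0..sb_row are decoded and wake any waiters."""
        with self._cond:
            # Progress only ever moves forward.
            self._done = max(self._done, sb_row + 1)
            self._cond.notify_all()

    def wait_for_row(self, sb_row: int, timeout: Optional[float] = None) -> bool:
        """Block until sb_row is decoded. Returns False only on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self._done > sb_row, timeout)


def required_sb_row(block_y: int, block_h: int, mv_y: int,
                    sb_size: int, frame_h: int, filter_taps: int = 8) -> int:
    """Lowest reference superblock row a motion-compensated block reads from.

    Assumes whole-pixel mv_y and a margin of filter_taps // 2 rows below the
    block for the interpolation filter (a simplification).
    """
    margin = filter_taps // 2
    bottom = block_y + mv_y + block_h - 1 + margin
    # Reads outside the picture are clamped to its edges.
    bottom = min(max(bottom, 0), frame_h - 1)
    return bottom // sb_size
```

A dependent thread would call `required_sb_row(...)` for the block it is about to predict and then `wait_for_row(...)` on the reference frame's marker. If the row is already done, the call returns immediately.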
This pipelined approach allows Frame 2 to begin decoding as soon as the rows of Frame 1 that it references are complete, rather than after the whole of Frame 1 has been decoded, which helps keep the CPU cores busy.
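To get a feel for how much this pipelining can save, here is a toy timing model. It is a sketch under strong assumptions that do not come from libdav1d: every superblock row costs one time unit, every frame has a worker available, and row `r` of a frame needs row `r + lag_rows` of the previous frame. The function name `pipeline_finish_time` is made up for this example.

```python
def pipeline_finish_time(n_frames: int, sb_rows: int, lag_rows: int) -> int:
    """Time at which the last frame completes in an idealised row pipeline."""
    # Frame 0 has no dependencies: row r finishes at time r + 1.
    prev = [r + 1 for r in range(sb_rows)]
    for _ in range(1, n_frames):
        cur = []
        own_ready = 0  # time at which this frame's previous row finished
        for r in range(sb_rows):
            # Reference row needed by row r, clamped to the last row.
            ref_row = min(r + lag_rows, sb_rows - 1)
            start = max(own_ready, prev[ref_row])
            own_ready = start + 1
            cur.append(own_ready)
        prev = cur
    return prev[-1]


serial = 3 * 8                              # 24 units, one frame at a time
pipelined = pipeline_finish_time(3, 8, 2)   # 14 units with a 2-row lag
```

In this model each additional frame adds only `lag_rows + 1` time units instead of a full frame's worth. Real streams behave less regularly, since row costs vary and motion vectors can reach further.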
The Task-Based Thread Pool
libdav1d does not assign a single static thread to a
single frame. Instead, it uses a unified, task-based thread pool. This
prevents load imbalance where one complex frame stalls the entire
pipeline.
The decoding process of a single frame is broken down into discrete tasks:
- Bitstream Parsing (Symbol Decoding): Decoding symbols from the compressed bitstream with AV1’s adaptive arithmetic coder, whose probability models are stored as CDFs (cumulative distribution functions).
- Reconstruction: Performing inverse transforms and intra/inter prediction.
- In-loop Filtering: Applying the deblocking filter, CDEF (Constrained Directional Enhancement Filter), and Loop Restoration.
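One way to picture this decomposition is as a list of small task records, one per stage and superblock row. The sketch below is illustrative only: `Stage`, `Task`, `tasks_for_frame`, and `prerequisite` are hypothetical names, and the rule that each stage depends only on the previous stage of the same row is a simplification (a real decoder has more task types and more dependencies between them).

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class Stage(IntEnum):
    PARSE = 0        # symbol decoding from the bitstream
    RECONSTRUCT = 1  # inverse transforms and intra/inter prediction
    FILTER = 2       # deblocking, CDEF, loop restoration


@dataclass(frozen=True, order=True)
class Task:
    frame: int
    sb_row: int
    stage: Stage


def tasks_for_frame(frame: int, sb_rows: int) -> List[Task]:
    """Split one frame into per-row, per-stage tasks."""
    return [Task(frame, row, stage) for row in range(sb_rows) for stage in Stage]


def prerequisite(task: Task) -> Optional[Task]:
    """Task that must finish first within the same frame (simplified rule)."""
    if task.stage == Stage.PARSE:
        return None
    return Task(task.frame, task.sb_row, Stage(task.stage - 1))
```

Because `Task` is ordered by `(frame, sb_row, stage)`, sorting a mixed list of tasks naturally favours older frames and earlier rows.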
These tasks are placed into a global priority queue. Workers in the thread pool dynamically pull tasks from this queue. If a task for Frame 2 is blocked waiting for Frame 1 to progress, the worker does not have to sit idle. Instead, it can return to the thread pool and pick up another available task, such as parsing the bitstream for Frame 3 or performing loop filtering on Frame 0.
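The "skip what is blocked, run what is ready" behaviour can be sketched with a heap. This is a single-threaded illustration, not libdav1d's scheduler: tasks are plain `(frame, stage)` tuples, a lower frame index is assumed to mean higher priority, and `pop_runnable` is a hypothetical helper.

```python
import heapq
from typing import Dict, List, Optional, Set, Tuple

Task = Tuple[int, str]  # (frame index, stage name)


def pop_runnable(queue: List[Task], done: Set[Task],
                 deps: Dict[Task, List[Task]]) -> Optional[Task]:
    """Pop the highest-priority task whose dependencies are all done.

    Blocked tasks are pushed back, so they stay queued for a later attempt.
    Returns None if nothing in the queue can run right now.
    """
    blocked = []
    picked = None
    while queue:
        task = heapq.heappop(queue)
        if all(d in done for d in deps.get(task, [])):
            picked = task
            break
        blocked.append(task)
    for task in blocked:
        heapq.heappush(queue, task)
    return picked


# Frame 1's reconstruction needs Frame 0's filtering, which is still running.
queue: List[Task] = [(1, "recon"), (2, "parse")]
heapq.heapify(queue)
deps = {(1, "recon"): [(0, "filter")]}
done: Set[Task] = set()

first = pop_runnable(queue, done, deps)   # (2, "parse"): the worker skips ahead
done.add((0, "filter"))                   # Frame 0 catches up
second = pop_runnable(queue, done, deps)  # (1, "recon") is now runnable
```

A production scheduler would avoid rescanning blocked tasks on every pop, for example by only queueing a task once its dependencies are met. The linear scan here keeps the example short.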
Interplay with Tile-Level Multithreading
AV1 streams can be divided spatially into “tiles.”
libdav1d combines tile-level multithreading
(Tile-MT) with frame-level multithreading (Frame-MT).
When a video contains multiple tiles, libdav1d can
distribute different tiles of the same frame to different
threads, while simultaneously decoding other frames on separate
threads. The task scheduler balances these tasks across the
available CPU cores, so the same design applies whether the decoder runs
on a dual-core mobile processor or a 64-core workstation.
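As a rough illustration of one pool serving both kinds of parallelism, the sketch below feeds every (frame, tile) pair from a window of in-flight frames into a single thread pool sized to the machine. The names `decode_tile` and `decode_window` are hypothetical, the tile "decoding" is a stand-in, and inter-frame dependencies are deliberately left out to keep the example short.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from typing import List, Tuple


def decode_tile(frame: int, tile: int) -> Tuple[int, int]:
    """Stand-in for real tile decoding; returns the unit it finished."""
    return (frame, tile)


def decode_window(frames_in_flight: int, tiles_per_frame: int,
                  n_threads: int = 0) -> List[Tuple[int, int]]:
    """Run all (frame, tile) units of a window on one shared pool."""
    # 0 means "pick automatically from the number of CPU cores".
    n_threads = n_threads or os.cpu_count() or 1
    units = list(product(range(frames_in_flight), range(tiles_per_frame)))
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # map() returns results in submission order, even though the order
        # in which workers execute the units is not deterministic.
        return list(pool.map(lambda u: decode_tile(*u), units))
```

With two frames in flight and three tiles each, the pool sees six independent units, so the amount of work available to idle cores grows with both the tile count and the number of frames being decoded at once.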