How libdav1d Implements Frame-Level Multithreading

This article explains how the libdav1d AV1 decoder implements frame-level multithreading to achieve high decoding throughput on multi-core CPUs. It covers the core challenges of parallel frame decoding, the synchronization mechanisms used to manage inter-frame dependencies, and how tasks are scheduled across multiple CPU cores to keep them busy.

The Challenge of Frame-Level Parallelism in AV1

In video compression, most frames are “inter-frames” (the counterpart of P-frames and B-frames in older codecs) that rely on previously decoded “reference frames” for motion compensation. This creates a temporal dependency chain.

If a decoder waited for Frame 1 to be completely finished before starting Frame 2, multi-core CPUs would remain largely underutilized. To solve this, libdav1d implements Frame-level Multithreading (Frame-MT), which decodes multiple frames simultaneously. The main challenge of Frame-MT is ensuring that a thread decoding Frame 2 does not attempt to read pixels or motion vectors from Frame 1 before Frame 1’s thread has actually decoded those specific areas.

Progress Signaling and Dependency Tracking

Rather than waiting for an entire reference frame to finish decoding, libdav1d uses a fine-grained synchronization mechanism based on row-level progress signaling.

Superblock Row Progress: As a thread decodes a frame, it tracks its progress in units of superblock rows. A superblock is either 64x64 or 128x128 pixels, as selected in the sequence header.
Atomic Status Updates: When a thread finishes decoding a superblock row, it atomically updates a progress marker associated with that frame.
Dependency Checking: When a thread decoding a dependent frame needs to perform motion compensation, it checks the motion vectors to see which area of the reference frame is required. It then queries the reference frame’s progress marker.
Yield and Wait: If the required row in the reference frame is already decoded, the dependent thread proceeds immediately. If not, the dependent thread suspends its current task and waits (using condition variables or lightweight polling) until the reference frame’s thread signals that the required row is complete.

This pipelined approach allows Frame 2 to begin decoding as soon as the first superblock rows of Frame 1 are available, rather than after Frame 1 has finished, which helps keep all CPU cores busy.
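The progress-signaling steps above can be illustrated with a minimal sketch in C11. This is not libdav1d's actual code: the `FrameProgress` structure, the function names, and the `filter_margin` parameter (extra rows read by the interpolation filter) are all hypothetical, motion vectors are simplified to whole pixels, and the waiting step is left out. It only shows the release/acquire pattern for publishing and checking row progress.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-frame progress marker (not a libdav1d type). */
typedef struct {
    atomic_int rows_done; /* number of superblock rows fully decoded */
    int sb_size;          /* superblock height in pixels: 64 or 128 */
    int total_rows;       /* superblock rows in the frame */
} FrameProgress;

static void progress_init(FrameProgress *p, int frame_height, int sb_size) {
    assert(sb_size == 64 || sb_size == 128);
    p->sb_size = sb_size;
    p->total_rows = (frame_height + sb_size - 1) / sb_size; /* round up */
    atomic_init(&p->rows_done, 0);
}

/* Producer side: called after superblock row `sb_row` (0-based) is finished.
 * The release store makes the decoded pixels visible to any thread that
 * later observes the new value with an acquire load. */
static void progress_report(FrameProgress *p, int sb_row) {
    atomic_store_explicit(&p->rows_done, sb_row + 1, memory_order_release);
}

/* Consumer side: how many superblock rows of the reference must be done
 * before a block at `block_y` with height `block_h` can be predicted using
 * vertical motion `mv_y_px` (whole pixels, an assumed simplification). */
static int rows_required(const FrameProgress *ref, int block_y, int block_h,
                         int mv_y_px, int filter_margin) {
    int last_px = block_y + block_h - 1 + mv_y_px + filter_margin;
    int max_px = ref->total_rows * ref->sb_size - 1;
    if (last_px < 0) last_px = 0;           /* clamp to the frame */
    if (last_px > max_px) last_px = max_px;
    return last_px / ref->sb_size + 1;
}

/* Consumer side: true if the reference has progressed far enough. */
static bool reference_ready(FrameProgress *ref, int needed_rows) {
    return atomic_load_explicit(&ref->rows_done, memory_order_acquire)
           >= needed_rows;
}
```

A dependent thread would compute `rows_required` for each inter-predicted block and proceed only once `reference_ready` returns true. What it does in the meantime (blocking, polling, or switching to another task) is a separate scheduling decision.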

The Task-Based Thread Pool

libdav1d does not assign a single static thread to a single frame. Instead, it uses a unified, task-based thread pool. This prevents load imbalance where one complex frame stalls the entire pipeline.

The decoding process of a single frame is broken down into discrete tasks:
* Bitstream Parsing (Symbol Decoding): Entropy-decoding the compressed symbols using adaptive CDFs (cumulative distribution functions).
* Reconstruction: Performing inverse transforms and intra/inter prediction.
* In-loop Filtering: Applying the deblocking filter, CDEF (Constrained Directional Enhancement Filter), and Loop Restoration.

These tasks are placed into a global priority queue. Workers in the thread pool dynamically pull tasks from this queue. If a thread decoding Frame 2 is blocked waiting for Frame 1 to progress, it does not sit idle. Instead, it returns to the thread pool to work on another available task—such as parsing the bitstream for Frame 3 or performing loop filtering on Frame 0.
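The scheduling idea described here (skip blocked tasks, take the next runnable one) can be sketched as follows. This is a simplified, single-threaded illustration rather than libdav1d's scheduler: the `Task` structure, the `pick_task` function, and the rule that older frames get higher priority are assumptions made for the example. A real pool would also protect the queue with a lock or lock-free structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The three task kinds named in the text. */
typedef enum { TASK_PARSE, TASK_RECON, TASK_FILTER } TaskType;

/* Hypothetical task record (not a libdav1d type). */
typedef struct {
    int frame;           /* frame index; lower = older = higher priority here */
    TaskType type;
    int ref_frame;       /* frame this task depends on, or -1 for none */
    int ref_rows_needed; /* superblock rows required from ref_frame */
    bool done;
} Task;

/* Return the index of the highest-priority task that is neither finished
 * nor blocked on a reference frame, or -1 if nothing can run right now.
 * rows_done[f] holds the number of superblock rows decoded in frame f. */
static int pick_task(const Task *tasks, size_t n, const int *rows_done) {
    assert(tasks != NULL && rows_done != NULL);
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        const Task *t = &tasks[i];
        if (t->done)
            continue;
        if (t->ref_frame >= 0 && rows_done[t->ref_frame] < t->ref_rows_needed)
            continue; /* blocked: the reference has not progressed far enough */
        if (best < 0 || t->frame < tasks[best].frame)
            best = (int)i;
    }
    return best;
}
```

With a reconstruction task for Frame 2 blocked on Frame 1 and a parsing task for Frame 3 ready, `pick_task` returns the Frame 3 task. Once Frame 1 reports enough rows, the Frame 2 task becomes the preferred choice again.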

Interplay with Tile-Level Multithreading

AV1 streams can be divided spatially into “tiles.” libdav1d seamlessly combines both tile-level multithreading (Tile-MT) and frame-level multithreading (Frame-MT).

When a video contains multiple tiles, libdav1d can distribute different tiles of the same frame to different threads, while simultaneously decoding other frames on separate threads. The task scheduler balances these tasks dynamically across the available CPU cores, so that decoding scales from a dual-core mobile processor to a 64-core workstation.