How libdav1d Tile-Level Multithreading Works

This article provides an in-depth look at how the libdav1d AV1 decoder implements tile-level multithreading to achieve high-speed video playback. It explores the relationship between AV1 tiles, thread allocation, task scheduling, and the synchronization mechanisms used to manage decoding dependencies across multiple CPU cores.

Understanding AV1 Tiles and Multithreading

In the AV1 video coding format, a single video frame can be divided into a grid of rectangular regions called tiles. Because tiles have independent entropy coding states and do not use pixels from neighboring tiles for spatial intra prediction, they can be parsed and reconstructed in parallel.
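To make the grid concrete, the Python sketch below splits a frame into tiles using the uniform tile spacing rule from the AV1 specification (the case where `uniform_tile_spacing_flag` is set), in which the tile size along each axis is the superblock count divided by 2^log2 and rounded up. The function names `uniform_tile_starts` and `tile_grid` are hypothetical, and the sketch simplifies the specification's 4x4 mode-info units to superblocks and pixels.

```python
# Sketch of AV1 uniform tile spacing (hypothetical helper names).
# Sizes are counted in superblocks; the result is expressed in pixels.

def uniform_tile_starts(sb_count: int, tiles_log2: int) -> list[int]:
    """Start superblock of each tile along one axis, plus sb_count as a sentinel."""
    # Round up so that at most 2**tiles_log2 tiles cover every superblock.
    tile_size_sb = (sb_count + (1 << tiles_log2) - 1) >> tiles_log2
    return list(range(0, sb_count, tile_size_sb)) + [sb_count]


def tile_grid(width: int, height: int, sb_size: int,
              cols_log2: int, rows_log2: int) -> list[tuple[int, int, int, int]]:
    """Return (x0, y0, x1, y1) pixel rectangles for every tile in raster order."""
    sb_cols = -(-width // sb_size)   # ceiling division
    sb_rows = -(-height // sb_size)
    col_starts = uniform_tile_starts(sb_cols, cols_log2)
    row_starts = uniform_tile_starts(sb_rows, rows_log2)
    tiles = []
    for r in range(len(row_starts) - 1):
        for c in range(len(col_starts) - 1):
            x0 = col_starts[c] * sb_size
            y0 = row_starts[r] * sb_size
            # The last tile in each row and column is clipped to the frame edge.
            x1 = min(col_starts[c + 1] * sb_size, width)
            y1 = min(row_starts[r + 1] * sb_size, height)
            tiles.append((x0, y0, x1, y1))
    return tiles


# A 1920x1080 frame with 64x64 superblocks, 4 tile columns and 2 tile rows:
for rect in tile_grid(1920, 1080, 64, cols_log2=2, rows_log2=1):
    print(rect)
```

Each rectangle in the output is a unit of work that a decoder thread could take on its own during parsing and reconstruction.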

The libdav1d decoder leverages this structure using Tile-Level Multithreading (TMT). This mechanism can run independently of, or in tandem with, Frame-Level Multithreading (FMT) to maximize CPU utilization, especially on systems with high core counts.

The Task Queue and Thread Allocation

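libdav1d manages multithreading through a unified task-based architecture rather than assigning static threads to specific roles. A single pool of worker threads, sized by the `n_threads` field of `Dav1dSettings`, picks up whatever work is runnable, such as decoding a tile or running a post-filter pass, so the same thread may work on different stages of different frames over time.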
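The core idea of a shared pool can be sketched in a few lines of Python: every worker pulls from one queue, so no thread is tied to a particular tile or stage. This is an illustrative sketch, not libdav1d code; `run_task_pool` is a hypothetical name, and a real decoder's workers would sleep and wait for new tasks instead of exiting when the queue is momentarily empty.

```python
import queue
import threading


def run_task_pool(tasks, n_threads):
    """Run callables from a single shared queue on n_threads workers.

    Returns the results in completion order. Workers exit once the
    pre-filled queue is drained (a simplification for this sketch).
    """
    q = queue.Queue()
    for t in tasks:
        q.put(t)
    done = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task = q.get_nowait()   # any worker may take any task
            except queue.Empty:
                return
            result = task()
            with lock:                  # protect the shared results list
                done.append(result)

    workers = [threading.Thread(target=worker) for _ in range(n_threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return done


# Six hypothetical "tile" tasks shared among three workers:
results = run_task_pool([lambda i=i: ("tile", i) for i in range(6)], 3)
print(sorted(results))
```

Because tasks are not bound to threads, adding threads increases throughput whenever more tasks are runnable than there are busy workers.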

Synchronization and Dependency Management

While tiles are mostly independent, they are not completely isolated. Complete frame reconstruction requires synchronization, particularly during the post-processing stages. libdav1d handles this through structured dependency tracking:
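One common way to express dependency tracking is to give each task a counter of unfinished prerequisites and release it only when that counter reaches zero. The Python sketch below assumes that model; the class `DepScheduler` and the task names are hypothetical and do not mirror libdav1d's internal data structures.

```python
import threading
from collections import deque


class DepScheduler:
    """Run tasks on several threads; a task starts only after all its prerequisites finish."""

    def __init__(self, deps):
        # deps maps task name -> list of prerequisite task names.
        self.pending = {t: len(d) for t, d in deps.items()}
        self.dependents = {t: [] for t in deps}
        for t, d in deps.items():
            for p in d:
                self.dependents[p].append(t)
        self.ready = deque(t for t, n in self.pending.items() if n == 0)
        self.remaining = len(deps)          # tasks not yet completed
        self.cond = threading.Condition()
        self.order = []                     # completion order

    def _worker(self, work):
        while True:
            with self.cond:
                while not self.ready and self.remaining > 0:
                    self.cond.wait()        # nothing runnable yet
                if self.remaining == 0:
                    return
                task = self.ready.popleft()
            work(task)                      # do the work outside the lock
            with self.cond:
                self.order.append(task)
                self.remaining -= 1
                for nxt in self.dependents[task]:
                    self.pending[nxt] -= 1
                    if self.pending[nxt] == 0:
                        self.ready.append(nxt)   # all inputs are now done
                self.cond.notify_all()

    def run(self, work, n_threads):
        threads = [threading.Thread(target=self._worker, args=(work,))
                   for _ in range(n_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return self.order


# Three tiles must finish before a hypothetical filter pass may run.
deps = {"tile0": [], "tile1": [], "tile2": [], "filter": ["tile0", "tile1", "tile2"]}
print(DepScheduler(deps).run(lambda name: None, 3))
```

The three tile tasks may complete in any order, but `filter` is always last because its counter only reaches zero after every tile has finished.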

Intra-Frame Dependencies

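During the initial reconstruction phase (coefficient parsing and pixel prediction), tiles of the same frame do not depend on each other. Threads can decode their assigned tiles at different speeds without waiting for neighboring tiles to progress. Inter-predicted blocks do still depend on their reference frames, which becomes a cross-frame dependency when several frames are decoded concurrently.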
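The property that makes this work is that each tile starts from a fresh state and never reads another tile's state. The Python sketch below models that with a toy running "context"; the arithmetic is a stand-in chosen for illustration, not AV1's range decoder, and `TileState`, `decode_tile`, and `decode_frame` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


class TileState:
    """Toy per-tile decoding state: each tile owns one, and no tile shares it."""

    def __init__(self, initial_context: int):
        self.context = initial_context

    def decode_symbol(self, symbol: int) -> int:
        # Stand-in for entropy decoding: the result depends only on this
        # tile's own running context, never on a neighboring tile.
        self.context = (self.context * 31 + symbol) % 65521
        return self.context


def decode_tile(tile_symbols, initial_context=1):
    state = TileState(initial_context)   # fresh state at the start of every tile
    return [state.decode_symbol(s) for s in tile_symbols]


def decode_frame(tiles, n_threads):
    # Tiles may finish in any order; map() returns results in tile order.
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(decode_tile, tiles))


tiles = [[1, 2, 3], [4, 5], [6]]
print(decode_frame(tiles, 3) == [decode_tile(t) for t in tiles])  # True
```

Because no state crosses tile boundaries, the parallel result is identical to decoding the tiles one after another, regardless of which thread finishes first.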

Loop Filtering and CDEF (Post-Processing)

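The primary challenge in tile-level multithreading occurs during the loop filter, CDEF (Constrained Directional Enhancement Filter), and loop restoration phases. These filters operate across tile boundaries to reduce compression artifacts, so they read pixels on both sides of an edge. As a result, a region next to a tile boundary cannot be filtered until every tile that contributes pixels to it has finished reconstruction, and post-filter work has to be held back until those inputs are complete.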
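The gating rule can be sketched as a small progress check. This Python sketch assumes that reconstruction progress is tracked per tile column in superblock rows, and that filtering a row also requires one extra reconstructed row below it because filter taps reach downward; the function name and the one-row margin are illustrative assumptions, not libdav1d's actual bookkeeping.

```python
def filterable_rows(rows_done_per_tile_col, total_sb_rows, lookahead=1):
    """Return how many superblock rows can safely be post-filtered.

    rows_done_per_tile_col[c] is the number of superblock rows that tile
    column c has fully reconstructed. A filter that crosses tile edges needs
    a row finished in every tile column, plus `lookahead` rows below it
    (an assumed margin for filter taps).
    """
    # Rows reconstructed across the full frame width.
    complete = min(rows_done_per_tile_col)
    if complete >= total_sb_rows:
        return total_sb_rows          # whole frame done; no rows below the last one
    return max(0, complete - lookahead)


print(filterable_rows([3, 5, 2], total_sb_rows=10))  # 1: the slowest column has 2 rows
print(filterable_rows([10, 10], total_sb_rows=10))   # 10: everything is reconstructed
```

The slowest tile column sets the pace for filtering, which is why uneven tile sizes can leave filter work waiting even when most tiles are finished.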

Combining Frame and Tile Multithreading

To keep as many cores busy as possible, libdav1d lets tile-level and frame-level multithreading work together.

On a multi-core system, libdav1d can decode several frames concurrently (FMT) while several threads work on the individual tiles within each of those frames (TMT). The number of frames in flight is bounded by the `max_frame_delay` field of `Dav1dSettings`. Combining the two reduces idle time for worker threads: streams encoded with many tiles per frame can be parallelized within each frame, while streams with a single tile per frame, which offer no tile-level parallelism, still scale through frame-level parallelism.
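A back-of-the-envelope calculation shows why the two forms of parallelism complement each other. The Python sketch below computes an idealized upper bound on how many threads can do tile work at once; `max_busy_threads` is a hypothetical helper, and the bound deliberately ignores inter-frame dependencies and post-filter tasks.

```python
def max_busy_threads(n_threads: int, frames_in_flight: int, tiles_per_frame: int) -> int:
    """Idealized upper bound on threads doing tile work simultaneously.

    Every (frame, tile) pair in flight is treated as an independent unit of
    work; real decoders also wait on reference frames and post-filters.
    """
    independent_units = frames_in_flight * tiles_per_frame
    return min(n_threads, independent_units)


# A single-tile stream only scales through frame threading:
print(max_busy_threads(8, 1, 1))   # 1
print(max_busy_threads(8, 4, 1))   # 4
# A stream with 16 tiles can saturate 8 threads from a single frame:
print(max_busy_threads(8, 1, 16))  # 8
```

Either factor alone can cap utilization well below the thread count; multiplying them is what lets a hybrid scheme fill the pool for both kinds of stream.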