libdav1d Dynamic Threading and CPU Core Scaling
This article explores how the libdav1d AV1 decoder manages its threading model in relation to system CPU cores. It explains the mechanism behind libdav1d’s automatic thread allocation, distinguishes between frame and tile threading, and clarifies whether the library can adjust its thread pool dynamically during runtime.
Automatic CPU Core Detection and Thread Allocation
By default, libdav1d sizes its thread pool from the number of logical CPU cores it detects on the host system. Developers select this behavior by leaving the threading settings at 0 when configuring the decoder. With a value of zero, libdav1d queries the operating system when the decoder context is opened and creates a matching number of worker threads, up to the library's own maximum. The decoder therefore scales without manual tuning on everything from low-power dual-core mobile processors to high-end multi-core desktop and server CPUs.
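The "zero means auto-detect" convention can be sketched in a few lines of C. This is an illustration only, not libdav1d's code: `resolve_thread_count` and its `max_threads` argument are hypothetical names, and `_SC_NPROCESSORS_ONLN` is a widely supported `sysconf` extension (Linux, macOS, the BSDs) rather than a guaranteed POSIX constant.

```c
#include <unistd.h>

/* Hypothetical helper, not part of the dav1d API.
 * requested > 0  : the caller chose a thread count, clamp it to max_threads.
 * requested <= 0 : fall back to the number of online logical cores. */
static int resolve_thread_count(int requested, int max_threads)
{
    if (max_threads < 1)
        max_threads = 1;
    if (requested > 0)
        return requested < max_threads ? requested : max_threads;

    long n = sysconf(_SC_NPROCESSORS_ONLN); /* may return -1 on failure */
    if (n < 1)
        n = 1;                              /* always keep one thread */
    if (n > max_threads)
        n = max_threads;
    return (int)n;
}
```

Calling `resolve_thread_count(0, 256)` on an 8-core machine would be expected to yield 8, while an explicit request such as `resolve_thread_count(4, 256)` is passed through unchanged.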
The Dual-Threading Model: Frames and Tiles
To achieve high-speed AV1 decoding, libdav1d utilizes a hybrid threading model consisting of two distinct types of threading:
- Frame Threading: Decodes multiple video frames in parallel. This method offers the highest throughput and CPU utilization but introduces slight latency and increases memory consumption.
- Tile Threading: Decodes different tiles within a single video frame simultaneously. This method reduces latency but is dependent on the video stream being encoded with multiple tiles.
When automatic threading is enabled, libdav1d balances these two methods. It derives the number of frames decoded in parallel from the detected CPU core count (in dav1d 1.0 and later, the documented default is the ceiling of the square root of the thread count), which raises throughput without the memory cost of keeping too many frames in flight.
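To make the balance concrete, the following sketch derives a frame-parallelism value from a thread count. It follows the default documented for `max_frame_delay` in the dav1d 1.0 headers, ceil(sqrt(n_threads)). The function names are hypothetical, and the upper bound of 8 is an assumption about the library's internal cap, not a documented guarantee.

```c
/* Smallest integer r with r * r >= n, i.e. ceil(sqrt(n)) for n >= 1. */
static unsigned ceil_sqrt(unsigned n)
{
    unsigned r = 0;
    while (r * r < n)
        r++;
    return r;
}

/* Hypothetical helper: how many frames to decode in parallel.
 * requested == 0 selects the automatic value. */
static unsigned pick_frame_delay(unsigned n_threads, unsigned requested)
{
    if (n_threads < 1)
        n_threads = 1;
    if (requested > 0) /* explicit value, never more frames than threads */
        return requested < n_threads ? requested : n_threads;

    unsigned d = ceil_sqrt(n_threads);
    return d < 8 ? d : 8; /* assumed cap, see the note above */
}
```

Under these assumptions, 4 threads give 2 frames in flight and 16 threads give 4, so memory use grows far more slowly than the thread count. Passing an explicit 1 keeps a single frame in flight, which suits low-latency playback.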
Is Threading Dynamically Adjusted at Runtime?
libdav1d sizes its thread pool once, at initialization, based on the available CPU cores. It does not resize the pool during active decoding if system conditions or the CPU topology change, for example through CPU hotplugging or a change of CPU affinity mid-stream.
Once the decoder context is initialized, the size of the thread pool remains static. To use a different thread count, an application has to close the context and open a new one with updated settings. Within the fixed pool, however, work is distributed dynamically: an internal task scheduler hands decoding tasks (such as symbol decoding, motion compensation, and loop filtering) to whichever worker threads are free. Even though the thread count is fixed, CPU utilization stays balanced throughout the playback session.
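The difference between a fixed pool size and dynamic task distribution can be shown with a small POSIX threads sketch. It is not libdav1d's scheduler: `TaskQueue`, `worker`, `run_tasks`, and `POOL_SIZE` are hypothetical names, and the "task" is an empty placeholder. The pool never grows or shrinks, yet each worker claims the next task as soon as it becomes free.

```c
#include <pthread.h>
#include <stdatomic.h>

#define POOL_SIZE 4 /* fixed for the lifetime of the pool */

typedef struct {
    atomic_int next_task; /* index of the next unclaimed task */
    atomic_int completed; /* number of tasks finished so far */
    int n_tasks;
} TaskQueue;

/* Each worker repeatedly claims the next free task index. */
static void *worker(void *p)
{
    TaskQueue *q = p;
    for (;;) {
        int t = atomic_fetch_add(&q->next_task, 1);
        if (t >= q->n_tasks)
            break;
        /* a real decoder would run one decoding task here */
        atomic_fetch_add(&q->completed, 1);
    }
    return NULL;
}

/* Runs n_tasks on the fixed pool and returns how many were completed. */
static int run_tasks(int n_tasks)
{
    TaskQueue q;
    atomic_init(&q.next_task, 0);
    atomic_init(&q.completed, 0);
    q.n_tasks = n_tasks;

    pthread_t th[POOL_SIZE];
    int started = 0;
    while (started < POOL_SIZE &&
           pthread_create(&th[started], NULL, worker, &q) == 0)
        started++;

    worker(&q); /* the caller helps, so work finishes even if no thread started */

    for (int i = 0; i < started; i++)
        pthread_join(th[i], NULL);
    return atomic_load(&q.completed);
}
```

Because tasks are claimed one at a time from a shared counter, a worker delayed by a slow task simply claims fewer of them, and the others absorb the remainder. On older toolchains the program may need to be linked with `-pthread`.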
Configuration for Developers
For developers integrating libdav1d, automatic thread scaling is controlled via the Dav1dSettings structure.
To enable automatic CPU-based scaling, the settings should be configured as follows:
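```c
#include <dav1d/dav1d.h>

Dav1dSettings settings;
dav1d_default_settings(&settings);
// dav1d 1.0 and later: 0 means "use the number of logical cores" (also the default).
// Releases before 1.0 exposed separate n_frame_threads / n_tile_threads fields instead.
settings.n_threads = 0;
// 0 lets the library choose how many frames are decoded in parallel
settings.max_frame_delay = 0;
// Initialize the decoder context with these settings
Dav1dContext *c = NULL;
if (dav1d_open(&c, &settings) < 0) {
    // handle the error: no decoder context was created
}
```

By allowing libdav1d to manage these values, applications get good AV1 decoding performance across diverse hardware configurations without manual profiling.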