How libdav1d Manages Memory During AV1 Decoding

This article explores how the open-source AV1 decoder libdav1d manages memory during video decoding. It examines the library’s use of custom picture allocators, internal picture pooling, thread-safe reference counting, and SIMD-aligned allocations, which together keep per-frame allocation overhead low and limit heap fragmentation.

Custom Memory Allocators

By default, libdav1d allocates picture buffers itself, using the platform's aligned allocation functions. To integrate with diverse media pipelines (such as FFmpeg, VLC, or web browsers), it also provides a custom allocator interface through the Dav1dPicAllocator structure, which the application supplies in the allocator field of Dav1dSettings before opening the decoder.

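Applications supply two callbacks, one that allocates a picture buffer and fills in its plane pointers and strides and one that releases it, together with an opaque cookie pointer that is passed to both. This lets the host place decoding buffers directly in memory it already manages, such as a pre-allocated system heap or CPU-mapped GPU upload buffers. Because libdav1d decodes in software and writes pixels from the CPU, that memory must be CPU-writable; placing it where the renderer will read it avoids an extra copy per frame.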
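The callback contract can be illustrated with a small, self-contained C sketch. In the real API, Dav1dPicAllocator pairs an opaque cookie with alloc_picture_callback and release_picture_callback. The code below does not include the libdav1d headers: HostPicture, HostPicAllocator, HostHeap, host_alloc, host_release, and decoder_get_buffer are hypothetical names, the "heap" is just a live-buffer counter standing in for a real arena, and only a single 8-bit plane is allocated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; the real Dav1dPicture and
 * Dav1dPicAllocator in <dav1d/picture.h> carry more fields. */
typedef struct HostPicture {
    int w, h;          /* dimensions requested by the decoder */
    void *data;        /* filled in by the allocator callback */
    ptrdiff_t stride;  /* bytes per row, filled in by the callback */
} HostPicture;

typedef struct HostPicAllocator {
    void *cookie;  /* opaque host state passed to every callback */
    int  (*alloc_picture)(HostPicture *pic, void *cookie);
    void (*release_picture)(HostPicture *pic, void *cookie);
} HostPicAllocator;

/* Hypothetical host state: counts live buffers. A real host might carve
 * buffers out of a pre-allocated arena or CPU-mapped GPU memory. */
typedef struct HostHeap {
    int live;
} HostHeap;

static int host_alloc(HostPicture *pic, void *cookie) {
    HostHeap *heap = cookie;
    /* 8-bit single plane only; round each row up to 64 bytes. */
    pic->stride = ((ptrdiff_t)pic->w + 63) & ~(ptrdiff_t)63;
    /* The size is a multiple of 64, as aligned_alloc requires. */
    pic->data = aligned_alloc(64, (size_t)pic->stride * (size_t)pic->h);
    if (!pic->data)
        return -1;  /* a real callback would return a negative error code */
    heap->live++;
    return 0;
}

static void host_release(HostPicture *pic, void *cookie) {
    HostHeap *heap = cookie;
    assert(heap->live > 0);  /* every release must match an alloc */
    free(pic->data);         /* aligned_alloc memory is freed with free() */
    pic->data = NULL;
    heap->live--;
}

/* Decoder side: it only sees the function pointers and the cookie. */
static int decoder_get_buffer(const HostPicAllocator *a, HostPicture *pic) {
    return a->alloc_picture(pic, a->cookie);
}
```

A host fills in the structure once, for example HostPicAllocator a = { &heap, host_alloc, host_release };, and hands it to the decoder. The decoder never needs to know where the memory comes from, only that the callback fills in data and stride and that each buffer is released exactly once. Passing state through the cookie rather than globals lets several decoder instances use different heaps.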

Picture Pooling and Buffer Reuse

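Video decoding requires large, contiguous blocks of memory to hold raw pixel data for multiple frames; a single 8-bit 4:2:0 1080p frame alone occupies about 3.1 MB (1920 × 1080 bytes of luma plus two quarter-size chroma planes). Allocating and freeing buffers of this size for every frame would fragment the heap and spend significant CPU time inside the system allocator.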

To avoid this, libdav1d implements an internal picture pool. When a decoded frame is no longer needed for display or as a reference for decoding future frames, its buffer is not freed back to the operating system; instead, it is returned to an idle pool. When the decoder needs a buffer for a newly decoded frame, it reuses an idle buffer of the required size from this pool, avoiding a round trip through the system allocator. A fresh allocation is needed only when no suitable buffer is idle, for example when decoding starts or the frame size changes.
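A picture pool of this kind can be approximated with an idle free list. The sketch below is a minimal, hypothetical illustration (Pool, PoolBuf, pool_pop, pool_push, and pool_destroy are invented names), not libdav1d's internal code, and for brevity it uses plain malloc rather than aligned allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical pool sketch; not libdav1d's internal implementation. */
typedef struct PoolBuf {
    struct PoolBuf *next;  /* link in the idle list */
    size_t size;           /* usable size of data */
    void *data;
} PoolBuf;

typedef struct Pool {
    PoolBuf *idle;  /* singly linked list of idle buffers */
} Pool;

/* Take a buffer of `size` bytes, reusing an idle one when sizes match. */
static PoolBuf *pool_pop(Pool *pool, size_t size) {
    PoolBuf *buf = pool->idle;
    if (buf) {
        pool->idle = buf->next;
        if (buf->size == size)
            return buf;      /* fast path: no system allocation */
        free(buf->data);     /* stale size, e.g. after a resolution change */
        free(buf);
    }
    buf = malloc(sizeof(*buf));
    if (!buf)
        return NULL;
    buf->data = malloc(size);
    if (!buf->data) {
        free(buf);
        return NULL;
    }
    buf->size = size;
    buf->next = NULL;
    return buf;
}

/* Return a buffer to the idle list instead of freeing it. */
static void pool_push(Pool *pool, PoolBuf *buf) {
    assert(buf != NULL);
    buf->next = pool->idle;
    pool->idle = buf;
}

/* Release everything, e.g. when the decoder is closed. */
static void pool_destroy(Pool *pool) {
    while (pool->idle) {
        PoolBuf *buf = pool->idle;
        pool->idle = buf->next;
        free(buf->data);
        free(buf);
    }
}
```

The pop path checks only the buffer at the head of the list and discards it when its size differs. That keeps both operations O(1) and suits video, where every frame in a sequence has the same dimensions and a size mismatch usually means a resolution change that makes the old buffers obsolete anyway.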

Thread-Safe Reference Counting

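AV1 has complex inter-frame dependencies, and libdav1d decodes with multiple threads (both frame-threading and tile-threading), so a single frame buffer may be in use by several threads and reference slots at the same time. Buffer management must therefore ensure that no buffer is recycled while any of its users still needs it.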

libdav1d tracks buffer usage with a thread-safe, atomically updated reference count (Dav1dRef). Every time a frame buffer is handed to a decoding thread or stored in the Decoded Picture Buffer (DPB), whose slots (eight in AV1) hold the reference frames used for temporal prediction, its reference count is incremented. As threads finish their work and reference slots are overwritten, the count is decremented. Only when it drops to zero is the buffer marked idle and returned to the picture pool.
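The counting scheme can be sketched with C11 atomics. This is an illustrative stand-in for the idea behind Dav1dRef, not its actual definition: Ref, ref_create, ref_inc, ref_dec, and count_recycled are hypothetical names, and the hook merely counts where a real one would push the buffer back onto a picture pool.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical ref-counted handle: an atomic counter plus a hook that
 * runs when the last reference is dropped. Not libdav1d's actual code. */
typedef struct Ref {
    atomic_int count;
    void *data;
    void (*on_last_unref)(void *data, void *user);  /* e.g. return to pool */
    void *user;
} Ref;

static Ref *ref_create(void *data, void (*cb)(void *, void *), void *user) {
    Ref *r = malloc(sizeof(*r));
    if (!r)
        return NULL;
    atomic_init(&r->count, 1);  /* the creator holds the first reference */
    r->data = data;
    r->on_last_unref = cb;
    r->user = user;
    return r;
}

/* A new holder (thread or DPB slot) starts using the buffer. */
static void ref_inc(Ref *r) {
    atomic_fetch_add_explicit(&r->count, 1, memory_order_relaxed);
}

/* A holder is done; the last one out recycles the buffer. */
static void ref_dec(Ref **pr) {
    Ref *r = *pr;
    *pr = NULL;  /* the caller's handle is no longer valid */
    int prev = atomic_fetch_sub_explicit(&r->count, 1, memory_order_acq_rel);
    assert(prev > 0);
    if (prev == 1) {
        r->on_last_unref(r->data, r->user);
        free(r);
    }
}

/* Example hook: counts recycled buffers instead of pooling them. */
static void count_recycled(void *data, void *user) {
    (void)data;
    ++*(int *)user;
}
```

Increments can use relaxed ordering because a holder can only add a reference to a buffer it already has access to. The decrement uses acquire-release ordering so that every write made by other holders is visible to whichever thread drops the last reference and recycles the buffer.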

SIMD-Aligned Memory Allocations

To achieve high-speed software decoding, libdav1d relies heavily on hand-written assembly that uses SIMD (Single Instruction, Multiple Data) instruction sets such as AVX2, AVX-512, and ARM NEON.

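These instructions are fastest when data is aligned to the vector width, typically 32 bytes for AVX2 and 64 bytes for AVX-512, and on x86 their aligned load and store forms fault outright on misaligned addresses. libdav1d therefore expects every image plane to start on a 64-byte boundary (the DAV1D_PICTURE_ALIGNMENT constant in its public header), with line strides padded to match; its default allocator provides this, and custom allocators are required to do the same. When no custom allocator is installed, libdav1d obtains this memory through platform-specific aligned allocation functions (such as posix_memalign on POSIX systems or _aligned_malloc on Windows), avoiding the cost of unaligned memory access.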
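The allocation side can be sketched in portable C. PIC_ALIGN is set to 64 to match the DAV1D_PICTURE_ALIGNMENT value; the helper names (align_stride, alloc_aligned, free_aligned, alloc_plane) are hypothetical, and the extra PIC_ALIGN bytes of tail padding are an assumption meant to let SIMD loops over-read the last row safely.

```c
#ifndef _WIN32
#define _POSIX_C_SOURCE 200112L  /* declare posix_memalign under strict ISO C */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#ifdef _WIN32
#include <malloc.h>              /* _aligned_malloc, _aligned_free */
#endif

/* Same value as DAV1D_PICTURE_ALIGNMENT in <dav1d/picture.h>. */
#define PIC_ALIGN 64

/* Round a row length up to a multiple of `align` (a power of two). */
static size_t align_stride(size_t bytes_per_row, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (bytes_per_row + align - 1) & ~(align - 1);
}

/* Aligned allocation using the platform's native function. */
static void *alloc_aligned(size_t align, size_t size) {
#ifdef _WIN32
    return _aligned_malloc(size, align);
#else
    void *p = NULL;
    /* posix_memalign returns an error code instead of setting errno. */
    if (posix_memalign(&p, align, size) != 0)
        return NULL;
    return p;
#endif
}

/* Memory from _aligned_malloc must not be passed to free(). */
static void free_aligned(void *p) {
#ifdef _WIN32
    _aligned_free(p);
#else
    free(p);
#endif
}

/* Allocate one 8-bit plane: aligned base, aligned stride, and
 * PIC_ALIGN bytes of tail padding (an assumption) for SIMD over-reads. */
static uint8_t *alloc_plane(int w, int h, ptrdiff_t *stride) {
    size_t s = align_stride((size_t)w, PIC_ALIGN);
    *stride = (ptrdiff_t)s;
    return alloc_aligned(PIC_ALIGN, s * (size_t)h + PIC_ALIGN);
}
```

Rounding the stride, not just the base pointer, matters: with an aligned base and a stride that is a multiple of 64, every row starts on a 64-byte boundary, so a SIMD loop can use aligned loads on each row. Memory from _aligned_malloc must be released with _aligned_free, which is why the sketch pairs the two behind one helper.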