How libdav1d Handles AV1 Super-Resolution
This article provides a technical overview of how the
libdav1d AV1 decoder internally processes the AV1
Super-Resolution feature. It explains the decoding pipeline sequence,
how horizontal-only scaling is integrated with loop filters, and the
assembly-level optimizations used to achieve high-performance real-time
playback.
The AV1 Super-Resolution Concept
AV1 Super-Resolution allows a frame to be encoded at a lower horizontal resolution (the coded width) and then scaled back to its original horizontal size (the upscaled width) during the decoding process. Coding fewer pixels reduces the bitrate, and the decoder does less work during reconstruction and the early loop-filtering stages, while the output is still a full-resolution image. Unlike standard post-processing scalers, Super-Resolution in AV1 is normative: it is part of the decoding loop itself, so every conforming decoder must produce the same upscaled result.
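The relationship between the two widths can be sketched with a little arithmetic. In AV1 the scaling ratio is a fixed numerator of 8 over a denominator signalled in the range 9 to 16, so the coded width is at most halved. The function name `coded_width` below is hypothetical and is not part of the libdav1d API; the rounding follows the frame-size computation in the AV1 specification as understood here.

```python
SUPERRES_NUM = 8  # fixed numerator of the AV1 super-resolution ratio


def coded_width(upscaled_width: int, superres_denom: int) -> int:
    """Return the coded (downscaled) width for a given upscaled width.

    superres_denom is 9..16 when super-resolution is enabled; 8 means
    the feature is off and the two widths are identical.
    """
    if not 8 <= superres_denom <= 16:
        raise ValueError("superres_denom must be in the range 8..16")
    # Integer division with round-to-nearest.
    return (upscaled_width * SUPERRES_NUM + superres_denom // 2) // superres_denom


if __name__ == "__main__":
    for denom in (8, 9, 12, 16):
        print(denom, coded_width(1920, denom))
    # 8 1920
    # 9 1707
    # 12 1280
    # 16 960
```

With the largest denominator, a 1920-pixel-wide frame is coded at 960 pixels, which is the most aggressive reduction the feature allows.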
Inside the libdav1d Decoding Pipeline
In libdav1d, the Super-Resolution process is carefully
positioned within the frame reconstruction pipeline. It does not occur
at the very end of decoding; instead, it is executed sequentially
between specific loop-filtering stages:
- Coded-Resolution Stages: The frame is reconstructed, and both the Deblocking Filter (LF) and the Constrained Directional Enhancement Filter (CDEF) are applied to the frame at its smaller, coded resolution. This reduces computational overhead because these filters run on fewer pixels.
- Horizontal Scaling (Super-Resolution): Once CDEF is complete, libdav1d applies the Super-Resolution process. It scales the frame horizontally using a normative 8-tap polyphase filter.
- Upscaled-Resolution Stages: After the frame is horizontally upscaled, Loop Restoration (LR) is applied at the full, upscaled resolution. Finally, if film grain is present in the bitstream, Film Grain Synthesis is applied to the upscaled image.
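To make the horizontal scaling step more concrete, here is a minimal sketch of an 8-tap polyphase upscaler for a single row of 8-bit samples. It is an illustration only: the kernel is a generic windowed sinc rather than the normative AV1 coefficient table, the position arithmetic uses floating point instead of the specification's fixed-point stepping, and the names `make_kernels` and `upscale_row` are hypothetical, not libdav1d functions.

```python
import math

TAPS = 8     # filter length, matching the 8-tap filter described above
PHASES = 64  # number of sub-pixel phases (assumed for this sketch)


def make_kernels():
    """Build one normalized 8-tap kernel per sub-pixel phase."""
    kernels = []
    for p in range(PHASES):
        frac = p / PHASES
        taps = []
        for k in range(TAPS):
            d = (k - 3) - frac  # distance from tap k to the sample position
            sinc = 1.0 if d == 0 else math.sin(math.pi * d) / (math.pi * d)
            window = 0.5 + 0.5 * math.cos(math.pi * d / 4)  # Hann, |d| <= 4
            taps.append(sinc * window)
        total = sum(taps)
        kernels.append([t / total for t in taps])  # unity gain per phase
    return kernels


def upscale_row(row, out_w, kernels):
    """Upscale one row of 8-bit samples horizontally to out_w samples."""
    in_w = len(row)
    out = []
    for x in range(out_w):
        # Map the centre of output pixel x back into input coordinates.
        pos = (x + 0.5) * in_w / out_w - 0.5
        base = math.floor(pos)
        phase = min(PHASES - 1, int((pos - base) * PHASES))
        acc = 0.0
        for t, coeff in enumerate(kernels[phase]):
            idx = min(max(base + t - 3, 0), in_w - 1)  # clamp at row edges
            acc += coeff * row[idx]
        out.append(min(255, max(0, int(round(acc)))))
    return out


if __name__ == "__main__":
    kernels = make_kernels()
    coded_row = [16, 16, 16, 200, 200, 200, 16, 16]
    print(upscale_row(coded_row, 16, kernels))
```

Because each output pixel selects one of a fixed set of precomputed kernels based on its fractional position, the inner loop is a plain 8-element dot product, which is the part that real decoders vectorize.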
Memory Management and Threading
libdav1d is designed for massive multi-threading,
utilizing both frame-level and tile/row-level threading. Managing
Super-Resolution internally requires sophisticated buffer handling:
- Line Buffers: Because Loop Restoration runs at a different resolution than CDEF, libdav1d allocates specialized line buffers to hold the boundaries of the scaled and unscaled rows.
- Synchronization: The row-processing threads must synchronize. A thread cannot begin the Loop Restoration phase on an upscaled row until the Super-Resolution filter has finished processing the corresponding CDEF-filtered row.
- On-the-Fly Scaling: To optimize cache locality, libdav1d avoids writing the intermediate CDEF output back to main memory only to read it again for scaling. Instead, it attempts to scale the rows in-cache and feed them directly into the Loop Restoration engine.
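The synchronization rule in the list above can be sketched with a shared progress counter. This is a simplified model, not libdav1d's actual task scheduler: the class `RowProgress` and the function `run_pipeline` are hypothetical names, there is one thread per stage, and real Loop Restoration also depends on neighbouring rows, which is ignored here.

```python
import threading


class RowProgress:
    """Tracks the highest row index that the upscaling stage has finished."""

    def __init__(self):
        self._done = -1
        self._cond = threading.Condition()

    def publish(self, row):
        # Called by the super-resolution stage after a row is complete.
        with self._cond:
            self._done = max(self._done, row)
            self._cond.notify_all()

    def wait_for(self, row):
        # Called by the restoration stage; blocks until the row is upscaled.
        with self._cond:
            self._cond.wait_for(lambda: self._done >= row)


def run_pipeline(num_rows):
    """Run both stages concurrently and return the order of events."""
    progress = RowProgress()
    log = []
    log_lock = threading.Lock()

    def superres_worker():
        for r in range(num_rows):
            with log_lock:
                log.append(("superres", r))  # stand-in for scaling row r
            progress.publish(r)

    def restoration_worker():
        for r in range(num_rows):
            progress.wait_for(r)
            with log_lock:
                log.append(("restoration", r))  # stand-in for filtering row r

    workers = [
        threading.Thread(target=restoration_worker),
        threading.Thread(target=superres_worker),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return log


if __name__ == "__main__":
    events = run_pipeline(4)
    for r in range(4):
        assert events.index(("superres", r)) < events.index(("restoration", r))
    print(events)
```

The exact interleaving varies between runs, but the restoration entry for a row always appears after the super-resolution entry for that same row, which is the ordering constraint the text describes.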
SIMD and Assembly Optimizations
The 8-tap horizontal scaling filter is mathematically intensive. To
prevent Super-Resolution from becoming a bottleneck, the
libdav1d codebase features hand-written assembly
optimizations for various hardware architectures:
- x86 Platforms: Highly optimized implementations using SSSE3, AVX2, and AVX-512 vector instructions. These assembly routines vectorize the 8-tap filter coefficients, processing multiple horizontal pixels in parallel.
- ARM Platforms: Dedicated NEON assembly implementations designed for mobile and embedded processors, ensuring efficient register usage and memory alignment.
By leveraging these hardware-specific optimizations and integrating
the scaling directly into the row-threading pipeline,
libdav1d minimizes the CPU and memory bandwidth overhead
associated with AV1 Super-Resolution.