Compiling libdav1d with Optimization Levels

This article examines how different compiler optimization levels affect the binary size and decoding speed of libdav1d, the library behind the open-source dav1d AV1 video decoder. We will explore the trade-offs between standard compiler flags (such as -O0, -O2, -O3, and -Ofast) and analyze how libdav1d’s extensive use of hand-written assembly code limits what those flags can change.

Understanding libdav1d’s Architecture

To understand the impact of compiler optimizations on libdav1d, it helps to look at how the decoder is built. Unlike many software libraries that rely purely on C or C++ code, libdav1d implements its performance-critical routines in hand-written assembly for specific CPU architectures, including x86 (SSSE3, AVX2, AVX-512) and ARM (NEON).

Because the most computationally heavy tasks (like IDCT, motion compensation, and loop filtering) are written in assembly, compiler optimization flags primarily affect the C fallback paths, the control flow logic, and the glue code that connects the assembly modules.
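The split between C fallbacks and assembly described above is commonly implemented with a table of function pointers that is filled in once at startup. The following is a minimal sketch of that pattern, not libdav1d's actual code: the names dsp_context, dsp_init, avg_c, avg_simd_stub, and CPU_FLAG_SIMD are hypothetical, and the "SIMD" routine is a plain C stand-in for what would be an assembly symbol in a real build.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU feature bit; not libdav1d's real flag name. */
enum { CPU_FLAG_SIMD = 1 };

/* Signature shared by every implementation of one DSP routine. */
typedef void (*avg_fn)(uint8_t *dst, const uint8_t *a,
                       const uint8_t *b, size_t n);

/* Portable C fallback: this is the code the -O level actually shapes. */
static void avg_c(uint8_t *dst, const uint8_t *a,
                  const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)((a[i] + b[i] + 1) >> 1); /* rounded average */
}

/* Stand-in for a hand-written assembly routine. In a real build this
 * would be an external symbol produced by the assembler, so compiler
 * optimization flags would not touch it. */
static void avg_simd_stub(uint8_t *dst, const uint8_t *a,
                          const uint8_t *b, size_t n)
{
    avg_c(dst, a, b, n);
}

/* Table of routines the decoder calls through. */
typedef struct {
    avg_fn avg;
} dsp_context;

/* Start with the C fallback, then override it when the CPU supports
 * the faster version. */
static void dsp_init(dsp_context *c, unsigned cpu_flags)
{
    c->avg = avg_c;
    if (cpu_flags & CPU_FLAG_SIMD)
        c->avg = avg_simd_stub;
}
```

With this layout, the compiler only ever optimizes avg_c and dsp_init. Once the pointer has been switched to the assembly version, the hot loop runs code the C compiler never saw.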

The Impact of Different Optimization Levels

When compiling libdav1d using compilers like GCC or Clang, the optimization flag you choose directly dictates how the compiler translates the C portion of the codebase.
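To confirm which kind of optimization a given C translation unit was built with, the predefined macros that GCC and Clang expose can be checked. The sketch below is illustrative only; the helper name opt_profile is hypothetical and is not part of libdav1d. These macros cannot distinguish -O2 from -O3, so the check is coarse.

```c
/* Returns a short label derived from macros that GCC and Clang
 * predefine according to the optimization flags in effect.
 * opt_profile is a hypothetical helper name used for illustration. */
static const char *opt_profile(void)
{
#if defined(__FAST_MATH__)
    /* Set by -ffast-math, which -Ofast implies. */
    return "fast-math enabled (-Ofast or -ffast-math)";
#elif defined(__OPTIMIZE_SIZE__)
    /* Set when optimizing for size, e.g. -Os. */
    return "optimized for size";
#elif defined(__OPTIMIZE__)
    /* Set for any optimizing build; does not reveal the exact level. */
    return "optimized (-O1 or higher)";
#else
    return "unoptimized (-O0)";
#endif
}
```

Printing the returned string from a small test program built with the same flags as the library is a quick sanity check that the intended flags actually reached the compiler.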

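-O0 (No Optimization)

The compiler performs essentially no optimization. Builds are quick and easy to step through in a debugger, but the C control flow, glue code, and fallback paths run slowly, so this level is suited to debugging only.

-O1 (Basic Optimization)

The compiler applies basic optimizations that do not add much compile time. It is rarely used for release builds of a decoder.

-O2 (Standard Optimization)

The compiler enables nearly all optimizations that do not involve a significant trade-off between code size and speed. This is the usual baseline for production builds.

-O3 (Aggressive Optimization)

On top of -O2, the compiler applies more aggressive loop transformations, vectorization, and inlining, which can increase code size. The gains are concentrated in the C fallback paths.

-Ofast (Non-Standard Aggressive Optimization)

This level adds options to -O3 that relax strict standards compliance, most notably -ffast-math. The relaxed floating-point rules mainly benefit floating-point-heavy code, so they are unlikely to help the integer-dominated work of a video decoder.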

Why Assembly Limits Compiler Optimization Impact

On modern x86_64 or ARM64 processors, the performance delta between an -O2 build and an -O3 build of libdav1d is relatively small (often within 1% to 5%). This is because the execution hot paths bypass the C compiler entirely: they run hand-written assembly that is assembled separately and selected at run time according to the CPU's capabilities.

However, the compiler optimization level becomes critical under the following conditions:

1. Unsupported Architectures: If you are running libdav1d on an architecture for which it has little or no dedicated assembly (older MIPS processors, for example), the decoder must rely entirely on the C codebase. In this scenario, moving from -O2 to -O3 can yield double-digit percentage improvements. RISC-V used to fall into this category, but recent dav1d releases have started adding RISC-V vector assembly, so check the version you are building.
2. Assembly Disabled: If assembly is manually disabled at build time (e.g., with the -Denable_asm=false Meson option), decoding speed depends on how well the compiler auto-vectorizes the C code. -O3 enables the most aggressive vectorization settings, although Clang and recent GCC releases already vectorize some loops at -O2.
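When the C code is all there is, loops of the following shape are where the optimization level matters. This is a minimal sketch of a reconstruction-style routine written so that a compiler can auto-vectorize it; the names clip_u8 and add_residual_c are hypothetical and not taken from libdav1d. Whether a given compiler actually vectorizes the loop depends on its version and flags, which can be checked with GCC's -fopt-info-vec or Clang's -Rpass=loop-vectorize.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp an intermediate value to the 8-bit pixel range. */
static inline uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Add a signed residual to each pixel and clamp the result.
 * The restrict qualifiers promise the compiler that dst and res do
 * not overlap, which removes one common obstacle to vectorization. */
static void add_residual_c(uint8_t *restrict dst,
                           const int16_t *restrict res, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = clip_u8(dst[i] + res[i]);
}
```

The loop has a simple trip count, no cross-iteration dependencies, and no aliasing, which are the properties a vectorizer looks for. Comparing the generated code at -O2 and -O3 for a function like this gives a concrete view of what the higher level buys on a platform without assembly.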

Summary Recommendation

For the vast majority of deployments on x86 and ARM platforms, compiling libdav1d with -O3 (which Meson's release build type already selects) is a sound default. Adding native architecture targeting (-march=native on x86, -mcpu=native on ARM) can help the C code further, but it ties the binary to the build machine's CPU and does not change the assembly routines, which are selected at run time. If binary size is a constraint (such as in embedded systems), -O2 is a reliable alternative with minimal performance loss.