libdav1d Assembly Optimizations on Windows and Linux

This article explores how the libdav1d AV1 decoder achieves cross-platform performance by managing assembly optimizations across different operating systems like Windows and Linux. It covers the use of unified assemblers, the abstraction of platform-specific Application Binary Interfaces (ABIs), the role of the Meson build system, and how runtime CPU detection ensures the correct instruction sets are executed on different hardware.

Unified Assembler Strategy

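To avoid maintaining separate assembly codebases for every operating system, libdav1d writes its assembly for portable assemblers: NASM syntax on x86 and GNU assembler (GAS) syntax on ARM. The same source files are assembled for every supported operating system, and only the object format of the output differs.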

Bridging ABI Differences with Macros

The primary challenge of running the same assembly code on Windows and Linux is the difference in their Application Binary Interfaces (ABIs). On x86_64, Windows uses the Microsoft x64 calling convention, while Linux uses the System V AMD64 ABI. These systems differ in how they pass function arguments and which CPU registers they require the function to preserve.

To solve this, libdav1d uses a macro layer built on x86inc.asm, the assembly abstraction header that originated in the x264 project and is also used by FFmpeg. This header defines wrapper macros for function entry and exit.

When the assembly is assembled:
* The macros determine the target platform and therefore the active ABI.
* They map the virtual arguments (e.g., parameter 1, parameter 2) to the correct physical registers for that ABI (the first integer argument arrives in RCX on Windows and in RDI on Linux).
* They generate the prologue and epilogue code that saves and restores callee-saved (non-volatile) registers, such as XMM6-XMM15, which a function must preserve on Windows but may freely overwrite on Linux.
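The mapping performed by the macros can be modeled in a few lines of C. The following is only an illustrative sketch, not code from libdav1d or x86inc.asm: the names Abi, Reg, arg_register, and xmm_is_callee_saved are hypothetical, while the register assignments themselves follow the two published calling conventions.

```c
#include <assert.h>
#include <stddef.h>

/* The two x86-64 calling conventions discussed above. */
typedef enum { ABI_WIN64, ABI_SYSV } Abi;

/* Registers that can carry integer/pointer arguments; REG_STACK means the
 * argument is passed in memory instead. */
typedef enum {
    REG_STACK, REG_RCX, REG_RDX, REG_R8, REG_R9, REG_RDI, REG_RSI
} Reg;

/* Argument registers in order, as defined by each ABI. */
static const Reg win64_args[] = { REG_RCX, REG_RDX, REG_R8, REG_R9 };
static const Reg sysv_args[]  = { REG_RDI, REG_RSI, REG_RDX, REG_RCX,
                                  REG_R8, REG_R9 };

/* Hypothetical helper: which register holds integer argument `index`
 * (0-based) under `abi`? */
Reg arg_register(Abi abi, size_t index)
{
    const Reg *regs = abi == ABI_WIN64 ? win64_args : sysv_args;
    size_t count = abi == ABI_WIN64
        ? sizeof win64_args / sizeof win64_args[0]
        : sizeof sysv_args / sizeof sysv_args[0];
    return index < count ? regs[index] : REG_STACK;
}

/* Returns 1 if XMM register `n` (0..15) must be preserved by the callee.
 * Win64 makes XMM6-XMM15 callee-saved; System V makes none of them so. */
int xmm_is_callee_saved(Abi abi, int n)
{
    assert(n >= 0 && n <= 15);
    return abi == ABI_WIN64 && n >= 6;
}
```

Here arg_register(ABI_WIN64, 0) yields REG_RCX while arg_register(ABI_SYSV, 0) yields REG_RDI, which is the translation the entry macros apply so that the body of an assembly function can refer to its arguments by position rather than by register name.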

Build System Integration

The Meson build system handles the platform-specific configuration before compilation begins. Meson probes the host environment to identify the compiler, the operating system, and the availability of assemblers like NASM.

Once identified, Meson passes configuration definitions to the compiler and assembler (such as ARCH_X86_64=1 and ARCH_X86_32=0). When no compatible assembler is available, the assembly optimizations can be disabled (dav1d's build exposes an enable_asm option for this), and the library falls back to its C implementations. It still builds and runs, albeit at lower performance.
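On the C side, definitions like these are typically consumed with preprocessor conditionals. The sketch below assumes two macros, HAVE_ASM and ARCH_X86_64, supplied by the build system; the function names add_bytes_c, add_bytes_asm, and select_add_bytes are hypothetical and exist only to show the pattern. With neither macro defined, the defaults make the file compile as the pure C fallback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* In a real build these values come from the build system. The defaults
 * are assumptions so that this sketch compiles on its own. */
#ifndef HAVE_ASM
#define HAVE_ASM 0
#endif
#ifndef ARCH_X86_64
#define ARCH_X86_64 0
#endif

typedef void (*add_bytes_fn)(uint8_t *dst, const uint8_t *src, size_t n);

/* Portable reference implementation, always compiled. */
static void add_bytes_c(uint8_t *dst, const uint8_t *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)(dst[i] + src[i]); /* wraps modulo 256 */
}

#if HAVE_ASM && ARCH_X86_64
/* Hypothetical symbol that an assembly object file would provide. */
void add_bytes_asm(uint8_t *dst, const uint8_t *src, size_t n);
#endif

/* Choose an implementation based on what the build configured. */
add_bytes_fn select_add_bytes(void)
{
#if HAVE_ASM && ARCH_X86_64
    return add_bytes_asm;
#else
    return add_bytes_c;
#endif
}
```

Because the C version is always present, disabling assembly only changes which pointer select_add_bytes returns; callers are unaffected.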

Runtime CPU Feature Detection

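Operating systems handle hardware queries differently. To ensure that optimizations like AVX2, AVX-512, or ARM NEON are only used when the processor supports them, libdav1d checks the CPU's capabilities at run time and selects function implementations accordingly. On x86 the check relies on the CPUID instruction, which works the same way under any operating system, whereas on ARM the available features are generally obtained through operating-system interfaces.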
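A minimal sketch of runtime detection followed by dispatch is shown here. It is not libdav1d's implementation: the flag values and the names detect_cpu_flags and select_impl are hypothetical, and detection uses the GCC/Clang builtin __builtin_cpu_supports rather than hand-written CPUID code. On other compilers or architectures the sketch reports no features, which selects the C path.

```c
#include <assert.h>

/* Hypothetical feature bits for this sketch (not dav1d's actual values). */
enum {
    CPU_FLAG_SSSE3  = 1 << 0,
    CPU_FLAG_AVX2   = 1 << 1,
    CPU_FLAG_AVX512 = 1 << 2,
    CPU_FLAG_ALL    = CPU_FLAG_SSSE3 | CPU_FLAG_AVX2 | CPU_FLAG_AVX512
};

typedef enum { IMPL_C, IMPL_SSSE3, IMPL_AVX2, IMPL_AVX512 } Impl;

/* Query the running CPU. Only GCC/Clang on x86 is handled here. */
unsigned detect_cpu_flags(void)
{
    unsigned flags = 0;
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("ssse3"))   flags |= CPU_FLAG_SSSE3;
    if (__builtin_cpu_supports("avx2"))    flags |= CPU_FLAG_AVX2;
    if (__builtin_cpu_supports("avx512f")) flags |= CPU_FLAG_AVX512;
#endif
    return flags;
}

/* Pick the most capable implementation permitted by `flags`. */
Impl select_impl(unsigned flags)
{
    assert((flags & ~(unsigned)CPU_FLAG_ALL) == 0); /* reject unknown bits */
    if (flags & CPU_FLAG_AVX512) return IMPL_AVX512;
    if (flags & CPU_FLAG_AVX2)   return IMPL_AVX2;
    if (flags & CPU_FLAG_SSSE3)  return IMPL_SSSE3;
    return IMPL_C;
}
```

Calling select_impl(detect_cpu_flags()) once at initialization and storing the result keeps the per-call cost at zero. Separating detection from selection also makes the selection logic testable with arbitrary flag combinations, regardless of the machine running the test.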

By combining uniform assembly dialects, ABI-translating macros, automated build configurations, and OS-specific runtime checks, libdav1d achieves near-identical high performance on both Windows and Linux without code duplication.