libdav1d Assembly Optimizations on Windows and Linux
This article explores how the libdav1d AV1 decoder achieves cross-platform performance by managing assembly optimizations across different operating systems like Windows and Linux. It covers the use of unified assemblers, the abstraction of platform-specific Application Binary Interfaces (ABIs), the role of the Meson build system, and how runtime CPU detection ensures the correct instruction sets are executed on different hardware.
Unified Assembler Strategy
To avoid maintaining separate assembly codebases for every operating system, libdav1d relies on highly portable assemblers.
- x86 and x86_64 Platforms: libdav1d uses NASM (Netwide Assembler). NASM allows developers to write x86 assembly code once and compile it into various target object formats, such as ELF (for Linux) and COFF/PE (for Windows).
- ARM and ARM64 Platforms: The decoder uses GAS (GNU Assembler) syntax. Because GCC and Clang accept GAS syntax and are available on both Linux and Windows, the ARM NEON assembly remains portable across systems.
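As a rough illustration of the "write once, emit many object formats" idea, the sketch below maps a target to the value NASM expects for its -f option. The helper name nasm_format and the OS strings are hypothetical and exist only for this example; the format names themselves (elf64, win64, macho64, and their 32-bit counterparts) are NASM output formats.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: choose the NASM "-f" output format for a target.
 * The same .asm source is assembled unchanged; only this flag differs. */
const char *nasm_format(const char *os, int bits)
{
    if (strcmp(os, "linux") == 0)
        return bits == 64 ? "elf64" : "elf32";     /* ELF objects */
    if (strcmp(os, "windows") == 0)
        return bits == 64 ? "win64" : "win32";     /* COFF objects */
    if (strcmp(os, "darwin") == 0)
        return bits == 64 ? "macho64" : "macho32"; /* Mach-O objects */
    return NULL;                                   /* unknown target */
}
```

In a real build this decision is made by the build system rather than by C code; the function form is used here only to make the mapping explicit.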
Bridging ABI Differences with Macros
The primary challenge of running the same assembly code on Windows and Linux is the difference in their Application Binary Interfaces (ABIs). On x86_64, Windows uses the Microsoft x64 calling convention, while Linux uses the System V AMD64 ABI. These systems differ in how they pass function arguments and which CPU registers they require the function to preserve.
To solve this, libdav1d relies on a macro layer built around x86inc.asm, an assembly header that originated in the x264 project and is also used by FFmpeg. This header defines wrapper macros for function entry and exit.
When the assembly is compiled:
- The macros determine the target ABI at assembly time (for example, from the object format selected for the build).
- They automatically map the virtual arguments (e.g., parameter 1, parameter 2) to the correct physical registers according to the active ABI (such as RCX on Windows or RDI on Linux for the first integer argument).
- They automatically generate the prologue and epilogue code that saves and restores callee-saved (non-volatile) registers, such as XMM6-XMM15, which a function must preserve on Windows but may freely overwrite on Linux.
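The register mapping performed by the macros can be modeled in a few lines of C. This is only a sketch of the two calling conventions, not the macro implementation itself: the type and function names (Abi, ArgLoc, arg_location, xmm_is_callee_saved) are made up for illustration, while the register orders follow the Microsoft x64 and System V AMD64 conventions.

```c
typedef enum { ABI_SYSV64, ABI_WIN64 } Abi;

/* Where an integer argument lives on entry to a function. */
typedef enum {
    REG_STACK, REG_RDI, REG_RSI, REG_RDX, REG_RCX, REG_R8, REG_R9
} ArgLoc;

/* System V AMD64: first six integer arguments are passed in registers. */
static const ArgLoc sysv_int_args[] = {
    REG_RDI, REG_RSI, REG_RDX, REG_RCX, REG_R8, REG_R9
};

/* Microsoft x64: only the first four are passed in registers. */
static const ArgLoc win64_int_args[] = {
    REG_RCX, REG_RDX, REG_R8, REG_R9
};

/* Map a zero-based integer argument index to its location. */
ArgLoc arg_location(Abi abi, unsigned index)
{
    if (abi == ABI_WIN64)
        return index < 4 ? win64_int_args[index] : REG_STACK;
    return index < 6 ? sysv_int_args[index] : REG_STACK;
}

/* Returns 1 if xmm<n> must be preserved by the callee under this ABI. */
int xmm_is_callee_saved(Abi abi, unsigned n)
{
    return abi == ABI_WIN64 && n >= 6 && n <= 15;
}
```

For example, arg_location(ABI_WIN64, 0) yields REG_RCX while arg_location(ABI_SYSV64, 0) yields REG_RDI, which is the difference the entry macros hide from the assembly author.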
Build System Integration
The Meson build system handles the platform-specific configuration before compilation begins. Meson probes the host environment to identify the compiler, the operating system, and the availability of assemblers like NASM.
Once identified, Meson passes specific preprocessor flags to the compiler and assembler (such as ARCH_X86_64=1 and ARCH_X86_32=0). If a platform lacks a compatible assembler, Meson can gracefully disable assembly optimizations and fall back to standard C code, ensuring the library still builds and runs, albeit at lower performance.
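The effect of these flags on the C side can be sketched as follows. The example assumes the build system defines HAVE_ASM and ARCH_X86_64 (defaulted to 0 here so the snippet compiles on its own); the function names add_c, add_avx2, and select_add are hypothetical and do not correspond to actual libdav1d symbols.

```c
#include <stddef.h>
#include <stdint.h>

/* Normally supplied by the generated configuration; the defaults below
 * are assumptions so that this sketch builds standalone. */
#ifndef HAVE_ASM
#define HAVE_ASM 0
#endif
#ifndef ARCH_X86_64
#define ARCH_X86_64 0
#endif

typedef void (*add_fn)(uint8_t *dst, const uint8_t *src, size_t n);

/* Portable C fallback: saturating byte-wise add. */
static void add_c(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned sum = (unsigned)dst[i] + src[i];
        dst[i] = sum > 255 ? 255 : (uint8_t)sum;
    }
}

#if HAVE_ASM && ARCH_X86_64
/* Hypothetical symbol that an assembly file would provide. */
void add_avx2(uint8_t *dst, const uint8_t *src, size_t n);
#endif

/* Pick the implementation that the build configuration allows. */
add_fn select_add(void)
{
    add_fn fn = add_c;
#if HAVE_ASM && ARCH_X86_64
    fn = add_avx2; /* a real decoder would also check runtime CPU flags */
#endif
    return fn;
}
```

With assembly disabled, select_add() simply returns the C version, so callers never need to know which path was compiled in.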
Runtime CPU Feature Detection
Operating systems handle hardware queries differently. To ensure that optimizations like AVX2, AVX-512, or ARM NEON are only used when the processor supports them, libdav1d employs platform-specific runtime detection:
- On x86/x64 (Windows and Linux): libdav1d uses the cpuid instruction, which behaves virtually identically across both operating systems, to query instruction set support directly from the hardware.
- On ARM/ARM64: Because the CPU feature identification registers are generally not directly readable from user space on ARM, libdav1d queries the host OS. On Linux, it calls the getauxval() library function to read the hardware capability bits from the auxiliary vector. On Windows, it calls the Win32 API function IsProcessorFeaturePresent().
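The per-platform split described above might look roughly like the following. This is a simplified sketch, not libdav1d's detection code: the flag names and cpu_detect are hypothetical, the x86 branch assumes a GCC or Clang toolchain that ships <cpuid.h>, and it only reads the CPUID AVX2 bit without the additional check that the operating system saves AVX register state.

```c
/* Hypothetical feature flags for this sketch. */
enum { CPU_FLAG_AVX2 = 1 << 0, CPU_FLAG_NEON = 1 << 1 };

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h> /* GCC/Clang helper around the cpuid instruction */

unsigned cpu_detect(void)
{
    unsigned eax, ebx, ecx, edx, flags = 0;
    /* Leaf 7, subleaf 0: EBX bit 5 reports AVX2. */
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) && (ebx & (1u << 5)))
        flags |= CPU_FLAG_AVX2;
    return flags;
}

#elif defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>
#include <asm/hwcap.h>

unsigned cpu_detect(void)
{
    /* The kernel publishes capability bits in the auxiliary vector. */
    return (getauxval(AT_HWCAP) & HWCAP_ASIMD) ? CPU_FLAG_NEON : 0;
}

#elif defined(_WIN32) && (defined(_M_ARM64) || defined(_M_ARM))
#include <windows.h>

unsigned cpu_detect(void)
{
    return IsProcessorFeaturePresent(PF_ARM_NEON_INSTRUCTIONS_AVAILABLE)
               ? CPU_FLAG_NEON : 0;
}

#else

/* Unknown platform: report no optional features, use the C paths. */
unsigned cpu_detect(void)
{
    return 0;
}

#endif
```

The returned mask would typically be computed once at startup and consulted when choosing between C and assembly implementations.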
By combining uniform assembly dialects, ABI-translating macros, automated build configurations, and OS-specific runtime checks, libdav1d achieves near-identical high performance on both Windows and Linux without code duplication.