How libdav1d Extracts and Provides HDR Metadata

This article explains how the open-source AV1 decoder libdav1d parses High Dynamic Range (HDR) metadata from an AV1 bitstream and delivers it to host applications. It covers the Open Bitstream Units (OBUs) involved, how the library extracts mastering display and content light level information, and the API structures that pass this color data to media players and video processors for accurate rendering.

AV1 Bitstream and HDR Metadata OBUs

The AV1 bitstream is structured into packets called Open Bitstream Units (OBUs). Static HDR metadata is not stored in the coded frame data itself; instead, it is carried in dedicated metadata OBUs. libdav1d parses these units as it consumes the OBU stream. Extracting that stream from a container such as MP4 or Matroska is the job of a demuxer in the host application, not of libdav1d.
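Every OBU begins with a one-byte header (plus an optional extension byte) whose obu_type field tells a parser whether it is looking at a sequence header, a metadata OBU, or something else. The following C sketch decodes that header according to the bit layout in the AV1 specification. It is an illustration, not libdav1d's internal parser; ObuHeader and parse_obu_header() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* obu_type code points from the AV1 specification (subset). */
enum {
    OBU_SEQUENCE_HEADER = 1,
    OBU_METADATA        = 5,
};

/* Hypothetical container for the decoded header fields. */
typedef struct {
    int type;           /* obu_type, 4 bits */
    int has_extension;  /* obu_extension_flag */
    int has_size_field; /* obu_has_size_field */
    size_t header_len;  /* 1, or 2 when the extension byte is present */
} ObuHeader;

/* Header byte layout, MSB first:
 * forbidden(1) | obu_type(4) | extension_flag(1) | has_size_field(1) | reserved(1)
 * Returns 0 on success, -1 if the forbidden bit is set or the buffer is
 * too short to hold the (possibly extended) header. */
static int parse_obu_header(const uint8_t *buf, size_t len, ObuHeader *out)
{
    if (len < 1 || (buf[0] & 0x80))   /* obu_forbidden_bit must be 0 */
        return -1;
    out->type           = (buf[0] >> 3) & 0x0F;
    out->has_extension  = (buf[0] >> 2) & 1;
    out->has_size_field = (buf[0] >> 1) & 1;
    out->header_len     = out->has_extension ? 2 : 1;
    if (len < out->header_len)        /* extension byte missing */
        return -1;
    return 0;
}
```

For example, a header byte of 0x2A decodes to obu_type 5 (metadata) with obu_has_size_field set.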

libdav1d looks for two primary types of HDR metadata carried in metadata OBUs:
* Mastering Display Color Volume (MDCV): This metadata describes the color primaries, white point, and luminance range of the mastering display used to author the video content.
* Content Light Level (CLL): This metadata defines the maximum content light level (MaxCLL) and maximum frame-average light level (MaxFALL) of the video.

Additionally, the basic color description parameters, namely color primaries, transfer characteristics (for example, PQ or HLG), and matrix coefficients, come from the color_config() syntax of the Sequence Header OBU rather than from metadata OBUs.
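The transfer characteristics code point is what distinguishes PQ and HLG content from SDR. The sketch below classifies that value using the code points AV1 shares with ITU-T H.273 (16 for SMPTE ST 2084, i.e. PQ, and 18 for HLG). In libdav1d the parsed values surface as the pri, trc, and mtrx fields of Dav1dSequenceHeader; the TransferKind enum and classify_transfer() function here are hypothetical helpers, not part of the library.

```c
#include <stdint.h>

/* transfer_characteristics code points (AV1 spec / ITU-T H.273). */
enum { TC_SMPTE_2084 = 16, TC_HLG = 18 };

/* Hypothetical classification used by an application. */
typedef enum { XFER_SDR, XFER_PQ, XFER_HLG } TransferKind;

/* Maps PQ and HLG to their own kinds and treats every other
 * transfer_characteristics value as SDR. */
static TransferKind classify_transfer(uint8_t transfer_characteristics)
{
    switch (transfer_characteristics) {
    case TC_SMPTE_2084: return XFER_PQ;
    case TC_HLG:        return XFER_HLG;
    default:            return XFER_SDR;
    }
}
```

Treating every unknown value as SDR is a deliberately conservative choice; a real player may also need to consider the primaries and matrix coefficients before choosing a rendering path.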

Parsing and Extraction Process

As data is passed to the decoder, libdav1d’s OBU parser processes the units in order. When it encounters a metadata OBU, it first reads metadata_type, a leb128-coded field at the start of metadata_obu(), to determine how to interpret the payload that follows.
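The metadata_type field uses AV1's leb128() encoding: seven value bits per byte, least significant group first, with the high bit set on every byte except the last (at most 8 bytes). The sketch below reads it and reports which payload follows. The values 1 (HDR CLL) and 2 (HDR MDCV) come from the AV1 specification; read_leb128(), classify_metadata(), and MetaKind are hypothetical names, and the input is assumed to be the OBU body after the OBU header and size field.

```c
#include <stddef.h>
#include <stdint.h>

/* metadata_type values from the AV1 specification (subset). */
enum {
    METADATA_TYPE_HDR_CLL  = 1,
    METADATA_TYPE_HDR_MDCV = 2,
};

/* Reads an AV1 leb128() value of at most 8 bytes. Returns the number of
 * bytes consumed, or 0 if the buffer ends before the final byte or the
 * encoding is longer than 8 bytes. */
static size_t read_leb128(const uint8_t *buf, size_t len, uint64_t *value)
{
    uint64_t v = 0;
    for (size_t i = 0; i < 8 && i < len; i++) {
        v |= (uint64_t)(buf[i] & 0x7F) << (7 * i);
        if (!(buf[i] & 0x80)) {       /* no continuation bit: done */
            *value = v;
            return i + 1;
        }
    }
    return 0;
}

typedef enum { META_CLL, META_MDCV, META_OTHER, META_ERROR } MetaKind;

/* Reads metadata_type from the start of a metadata OBU body and reports
 * which payload follows; *payload_off receives the payload's byte offset. */
static MetaKind classify_metadata(const uint8_t *obu, size_t len, size_t *payload_off)
{
    uint64_t type;
    size_t n = read_leb128(obu, len, &type);
    if (n == 0)
        return META_ERROR;
    *payload_off = n;
    switch (type) {
    case METADATA_TYPE_HDR_CLL:  return META_CLL;
    case METADATA_TYPE_HDR_MDCV: return META_MDCV;
    default:                     return META_OTHER;
    }
}
```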

Parsing MDCV: If metadata_type is METADATA_TYPE_HDR_MDCV (2), libdav1d reads the x and y chromaticity coordinates of the red, green, and blue primaries and of the white point (16 bits each, 0.16 fixed point), followed by the maximum and minimum mastering display luminance (32 bits each, in 24.8 and 18.14 fixed point respectively).
Parsing CLL: If metadata_type is METADATA_TYPE_HDR_CLL (1), libdav1d reads MaxCLL and MaxFALL as 16-bit integers expressed in candelas per square meter (cd/m², also called nits).
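Validation: The parser performs bitstream-level checks, such as confirming that reading the payload did not overrun the OBU, before associating the values with decoded frames. These checks concern syntax rather than whether the values are physically plausible, so applications should still sanity-check them.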
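Both payloads are fixed-size runs of big-endian fields, so once metadata_type has been consumed they can be read directly. The sketch below uses local stand-in types (MasteringDisplay, ContentLightLevel) whose field names mirror libdav1d's structs, so it compiles without the library; it is not libdav1d's code. Its only validation is a length check that rejects truncated payloads; it does not range-check the values.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins mirroring the field names of libdav1d's public structs. */
typedef struct {
    uint16_t primaries[3][2];  /* [R,G,B][x,y], 0.16 fixed point */
    uint16_t white_point[2];   /* [x,y], 0.16 fixed point */
    uint32_t max_luminance;    /* 24.8 fixed point, cd/m^2 */
    uint32_t min_luminance;    /* 18.14 fixed point, cd/m^2 */
} MasteringDisplay;

typedef struct {
    uint16_t max_content_light_level;        /* MaxCLL, cd/m^2 */
    uint16_t max_frame_average_light_level;  /* MaxFALL, cd/m^2 */
} ContentLightLevel;

/* Big-endian readers: AV1 fields are read most significant bit first. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] << 8 | p[1]); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 | (uint32_t)p[2] << 8 | p[3];
}

/* Payload sizes implied by the AV1 syntax: 8 x u(16) + 2 x u(32), and 2 x u(16). */
enum { MDCV_PAYLOAD_BYTES = 24, CLL_PAYLOAD_BYTES = 4 };

/* Returns 0 on success, -1 if the payload is too short. */
static int parse_mdcv(const uint8_t *p, size_t len, MasteringDisplay *md)
{
    if (len < MDCV_PAYLOAD_BYTES)
        return -1;
    for (int i = 0; i < 3; i++) {               /* red, green, blue */
        md->primaries[i][0] = rd16(p + 4 * i);
        md->primaries[i][1] = rd16(p + 4 * i + 2);
    }
    md->white_point[0] = rd16(p + 12);
    md->white_point[1] = rd16(p + 14);
    md->max_luminance  = rd32(p + 16);
    md->min_luminance  = rd32(p + 20);
    return 0;
}

/* Returns 0 on success, -1 if the payload is too short. */
static int parse_cll(const uint8_t *p, size_t len, ContentLightLevel *cll)
{
    if (len < CLL_PAYLOAD_BYTES)
        return -1;
    cll->max_content_light_level       = rd16(p);
    cll->max_frame_average_light_level = rd16(p + 2);
    return 0;
}
```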

Exposing Metadata to the Host Application

Once libdav1d has parsed the HDR metadata, it attaches it to the Dav1dPicture it returns for output. This keeps the metadata synchronized with the decoded video frame it applies to.

The host application retrieves this information through the Dav1dPicture structure. The public libdav1d API defines two dedicated structures, declared in dav1d/headers.h, to hold this metadata:

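Dav1dMasteringDisplay: Holds primaries[3][2] (x and y for red, green, and blue) and white_point[2] as 0.16 fixed-point uint16_t values, plus max_luminance (24.8 fixed point) and min_luminance (18.14 fixed point) as uint32_t values.
Dav1dContentLightLevel: Holds max_content_light_level (MaxCLL) and max_frame_average_light_level (MaxFALL) as uint16_t values in cd/m².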
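Because the mastering display fields are fixed point, a consumer has to scale them before use: dividing chromaticity values by 2^16, max_luminance by 2^8, and min_luminance by 2^14 yields CIE 1931 coordinates and cd/m². The sketch below does this with a local stand-in for Dav1dMasteringDisplay that uses the same field names, so it compiles on its own; real code would include <dav1d/dav1d.h> and use the library's type. MasteringDisplayF and mastering_display_to_float() are hypothetical.

```c
#include <stdint.h>

/* Stand-in with the same fields as libdav1d's Dav1dMasteringDisplay. */
typedef struct {
    uint16_t primaries[3][2];  /* 0.16 fixed point */
    uint16_t white_point[2];   /* 0.16 fixed point */
    uint32_t max_luminance;    /* 24.8 fixed point */
    uint32_t min_luminance;    /* 18.14 fixed point */
} MasteringDisplay;

/* Hypothetical floating-point view for application use. */
typedef struct {
    double primaries[3][2];  /* CIE 1931 x,y for R, G, B */
    double white_point[2];
    double max_nits, min_nits;
} MasteringDisplayF;

/* Divides each field by 2 raised to its number of fractional bits. */
static MasteringDisplayF mastering_display_to_float(const MasteringDisplay *md)
{
    MasteringDisplayF f;
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 2; j++)
            f.primaries[i][j] = md->primaries[i][j] / 65536.0;   /* 2^16 */
    f.white_point[0] = md->white_point[0] / 65536.0;
    f.white_point[1] = md->white_point[1] / 65536.0;
    f.max_nits = md->max_luminance / 256.0;                      /* 2^8 */
    f.min_nits = md->min_luminance / 16384.0;                    /* 2^14 */
    return f;
}
```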

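These structures are exposed through two pointer fields of Dav1dPicture, mastering_display and content_light. The host application calls dav1d_get_picture(), which fills in a Dav1dPicture supplied by the caller, and then checks whether these pointers are non-NULL; a NULL pointer means no metadata of that type has been received. The sequence header color description is available through the same picture's seq_hdr pointer.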
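The presence check can be sketched as follows. The decoder owns the memory behind these pointers and releases it when the picture is unreferenced (dav1d_picture_unref() in the real API), so the sketch copies the values into an application-owned struct. PictureView, HdrInfo, and collect_hdr_info() are hypothetical; PictureView exposes only the two pointer fields so the example runs without libdav1d.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for the relevant libdav1d types (same field names). */
typedef struct {
    uint16_t max_content_light_level, max_frame_average_light_level;
} ContentLightLevel;
typedef struct {
    uint16_t primaries[3][2];
    uint16_t white_point[2];
    uint32_t max_luminance, min_luminance;
} MasteringDisplay;
typedef struct {
    ContentLightLevel *content_light;     /* NULL if no CLL received */
    MasteringDisplay *mastering_display;  /* NULL if no MDCV received */
} PictureView;

/* Application-owned copy that can outlive the decoder's picture. */
typedef struct {
    bool has_cll, has_mdcv;
    ContentLightLevel cll;
    MasteringDisplay mdcv;
} HdrInfo;

/* Copies whichever static HDR metadata the picture carries, by value. */
static HdrInfo collect_hdr_info(const PictureView *pic)
{
    HdrInfo info;
    memset(&info, 0, sizeof info);
    if (pic->content_light) {
        info.has_cll = true;
        info.cll = *pic->content_light;
    }
    if (pic->mastering_display) {
        info.has_mdcv = true;
        info.mdcv = *pic->mastering_display;
    }
    return info;
}
```

With the real library, the same checks apply to pic.content_light and pic.mastering_display after dav1d_get_picture(ctx, &pic) returns 0, and the copy should be taken before the picture is released.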

If the pointers are non-NULL, the host application (such as FFmpeg, VLC, or a custom media player) reads the values, converts them from fixed point where needed, and forwards them to the system’s display renderer, video filter, or operating system HDR API (such as DXGI on Windows or Metal on macOS) to perform accurate tone mapping and color reproduction.
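OS-level HDR interfaces typically expect HDR10 static metadata in integer units rather than AV1's fixed-point formats. For example, Microsoft documents DXGI_HDR_METADATA_HDR10 with chromaticity normalized to 50,000, maximum mastering luminance in whole nits, and minimum mastering luminance in units of 0.0001 nit. The sketch below converts to a hypothetical Hdr10Static struct using those units, rounding to nearest; check the target API's own documentation before relying on this mapping.

```c
#include <stdint.h>

/* Stand-in with the same fields as libdav1d's Dav1dMasteringDisplay. */
typedef struct {
    uint16_t primaries[3][2];  /* 0.16 fixed point */
    uint16_t white_point[2];   /* 0.16 fixed point */
    uint32_t max_luminance;    /* 24.8 fixed point */
    uint32_t min_luminance;    /* 18.14 fixed point */
} MasteringDisplay;

/* Hypothetical HDR10-style target: chromaticity in 0.00002 steps,
 * max luminance in whole cd/m^2, min luminance in 0.0001 cd/m^2. */
typedef struct {
    uint16_t primaries[3][2];
    uint16_t white_point[2];
    uint32_t max_nits;
    uint32_t min_nits_x10000;
} Hdr10Static;

/* Converts an unsigned fixed-point value with frac_bits fractional bits
 * into units of 1/scale, rounding to nearest (frac_bits must be >= 1). */
static uint64_t rescale(uint64_t v, unsigned frac_bits, uint64_t scale)
{
    return (v * scale + (1ull << (frac_bits - 1))) >> frac_bits;
}

static Hdr10Static to_hdr10(const MasteringDisplay *md)
{
    Hdr10Static h;
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 2; j++)
            h.primaries[i][j] = (uint16_t)rescale(md->primaries[i][j], 16, 50000);
    for (int j = 0; j < 2; j++)
        h.white_point[j] = (uint16_t)rescale(md->white_point[j], 16, 50000);
    h.max_nits        = (uint32_t)rescale(md->max_luminance, 8, 1);
    h.min_nits_x10000 = (uint32_t)rescale(md->min_luminance, 14, 10000);
    return h;
}
```

Doing the arithmetic in 64-bit integers avoids overflow (a 32-bit luminance times 10,000 exceeds 32 bits) and keeps the conversion exact up to the final rounding.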