How libdav1d Parses AV1 Sequence Header OBUs

This article explores how the open-source AV1 decoder libdav1d parses and processes Sequence Header Open Bitstream Units (OBUs). We will examine the role of the Sequence Header in the AV1 bitstream, detail the step-by-step parsing mechanism within the libdav1d codebase, and explain how the decoded configuration metadata is initialized to prepare the decoder for subsequent video frames.

What is a Sequence Header OBU?

In the AV1 video coding format, the bitstream is structured into packets called Open Bitstream Units (OBUs). The Sequence Header OBU (OBU_SEQUENCE_HEADER in the specification, referred to here as OBU_SEQ_HDR) carries the configuration shared by every frame in a coded video sequence: the profile, the level of each operating point, the maximum frame dimensions, the color configuration (bit depth, color space, chroma subsampling), and flags for sequence-level coding tools such as warped motion, CDEF, and film grain. A decoder cannot decode any frame until it has successfully parsed this header.

Step 1: OBU Identification and Routing

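When libdav1d receives AV1 data, it first splits the buffer into individual OBUs and reads each OBU's header. Container demuxing (for example IVF or Annex B input) happens before this point, in the calling application rather than in the library.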

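OBU Header Parsing: The decoder reads the one-byte OBU header, which holds the 4-bit obu_type together with the extension flag and the has-size-field flag.
Routing: When the OBU type is OBU_SEQ_HDR (value 1), libdav1d hands the payload to its sequence header parser. In current dav1d sources this is the parse_seq_hdr function in src/obu.c, which is also reachable through the public dav1d_parse_sequence_header() function; names and locations may differ between versions.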
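The identification step above can be sketched in a few lines of C. This is a minimal illustration based on the OBU header layout in the AV1 specification, not dav1d's code: ObuHeaderSketch, read_leb128, and parse_obu_header are hypothetical names introduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for OBU header fields (not a dav1d type). */
typedef struct {
    int type;              /* obu_type, 4 bits; 1 means sequence header */
    int has_extension;     /* obu_extension_flag */
    int has_size_field;    /* obu_has_size_field */
    uint64_t payload_size; /* payload bytes following the header */
    size_t header_len;     /* bytes used by header, extension, and size */
} ObuHeaderSketch;

/* Decode an unsigned LEB128 value; returns bytes consumed, or 0 on error. */
static size_t read_leb128(const uint8_t *buf, size_t len, uint64_t *out)
{
    uint64_t value = 0;
    for (size_t i = 0; i < 8 && i < len; i++) {
        value |= (uint64_t)(buf[i] & 0x7F) << (7 * i);
        if (!(buf[i] & 0x80)) {
            *out = value;
            return i + 1;
        }
    }
    return 0; /* truncated, or longer than the 8 bytes the spec allows */
}

/* Parse the OBU header at buf; returns 0 on success, -1 on bad input. */
static int parse_obu_header(const uint8_t *buf, size_t len, ObuHeaderSketch *h)
{
    assert(buf && h);
    if (len < 1 || (buf[0] & 0x80))
        return -1; /* obu_forbidden_bit must be 0 */
    h->type = (buf[0] >> 3) & 0x0F;
    h->has_extension = (buf[0] >> 2) & 1;
    h->has_size_field = (buf[0] >> 1) & 1;

    size_t pos = 1;
    if (h->has_extension) {
        if (len < 2)
            return -1;
        pos++; /* skip the temporal_id / spatial_id byte */
    }
    if (h->has_size_field) {
        size_t n = read_leb128(buf + pos, len - pos, &h->payload_size);
        if (!n)
            return -1;
        pos += n;
        if (h->payload_size > len - pos)
            return -1; /* declared payload does not fit in the buffer */
    } else {
        h->payload_size = len - pos; /* payload runs to the end of buf */
    }
    h->header_len = pos;
    return 0;
}
```

A caller would check h.type == 1 and pass the payload_size bytes starting at buf + h.header_len to the sequence header parser.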

Step 2: Bitstream Reading with Dav1dBitContext

To extract the tightly packed header fields, libdav1d uses a small bit reader (the GetBits structure declared in src/getbits.h in current sources). It tracks the current byte pointer, a buffer of bits already loaded from the stream, and the number of bits left in that buffer. The parser reads fixed-length fields with dav1d_get_bits and variable-length ones with helpers such as dav1d_get_uleb128, following the syntax elements defined in the Alliance for Open Media (AOMedia) AV1 specification. Sequence header fields are plain, uncompressed bits; the arithmetic-coded symbol decoder is only needed for tile data.
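A bit reader of this kind can be sketched as follows. This is an illustrative MSB-first reader under simple assumptions, not dav1d's implementation: BitReaderSketch, br_init, and br_get_bits are hypothetical names, and reads past the end of the data return zero bits and set an error flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit reader (illustrative only). */
typedef struct {
    const uint8_t *ptr, *end; /* next unread byte and end of data */
    uint64_t state;           /* loaded bits, left-aligned at bit 63 */
    int bits_left;            /* number of valid bits in state */
    int error;                /* set once a read runs past the end */
} BitReaderSketch;

static void br_init(BitReaderSketch *br, const uint8_t *data, size_t size)
{
    br->ptr = data;
    br->end = data + size;
    br->state = 0;
    br->bits_left = 0;
    br->error = 0;
}

/* Read n bits (1..32) as an unsigned integer, most significant bit first. */
static unsigned br_get_bits(BitReaderSketch *br, int n)
{
    assert(n >= 1 && n <= 32);
    while (br->bits_left < n) {
        uint64_t byte = 0;
        if (br->ptr < br->end)
            byte = *br->ptr++;
        else
            br->error = 1; /* out of data: pad with zero bits */
        /* Append the new byte directly below the bits already loaded. */
        br->state |= byte << (56 - br->bits_left);
        br->bits_left += 8;
    }
    unsigned value = (unsigned)(br->state >> (64 - n));
    br->state <<= n;
    br->bits_left -= n;
    return value;
}
```

Keeping the unread bits left-aligned means a field is always taken from the top of the buffer, so a field that spans a byte boundary needs no special case.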

Step 3: Parsing Key Syntax Elements

The sequence header parser follows the order of the sequence_header_obu() syntax in the AV1 specification. The order is fixed because the presence and width of later fields depend on the values of earlier ones:

Profile and Level: It first reads seq_profile, followed by the still_picture and reduced_still_picture_header flags, and then a seq_level_idx for each operating point. These values indicate whether the stream's complexity is within what the decoding environment supports.
Frame Dimensions: The decoder reads frame_width_bits_minus_1 and frame_height_bits_minus_1, which give the bit widths of the next two fields, and then the maximum frame width and height (max_frame_width_minus_1 and max_frame_height_minus_1). These values bound the size of every frame in the sequence, and therefore the largest frame buffer the decoder will need.
Color Configuration: The parser reads the fields of the spec’s color_config() syntax: bit depth (8, 10, or 12-bit), the monochrome flag, color primaries, transfer characteristics, matrix coefficients, color range, chroma subsampling, and chroma sample position.
Coding Tool Enables: It reads flags that enable or disable specific coding features for the whole sequence, such as enable_filter_intra, enable_intra_edge_filter, enable_interintra_compound, enable_masked_compound, enable_warped_motion, and enable_order_hint. This allows libdav1d to skip the corresponding code paths during frame decoding when a tool is disabled for the sequence.
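The field order described above can be illustrated with a deliberately narrow sketch. It parses only the start of a sequence header on the reduced_still_picture_header path (the shortest path through the spec's syntax) and stops before color_config(). The Bits reader, the SeqHdrSketch type, and parse_reduced_seq_hdr are hypothetical names for this example, not dav1d code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal MSB-first reader; pos counts bits consumed so far. */
typedef struct { const uint8_t *data; size_t size, pos; } Bits;

/* Equivalent of the spec's f(n): read n bits as an unsigned value. */
static unsigned f(Bits *b, int n)
{
    unsigned v = 0;
    assert(n >= 0 && n <= 32);
    while (n--) {
        unsigned bit = 0;
        if (b->pos < b->size * 8)
            bit = (b->data[b->pos >> 3] >> (7 - (b->pos & 7))) & 1;
        b->pos++; /* keeps counting past the end so overruns are detectable */
        v = (v << 1) | bit;
    }
    return v;
}

/* Hypothetical subset of the sequence header fields. */
typedef struct {
    int profile, still_picture, reduced_still_picture_header, level_idx;
    int max_width, max_height;
    int sb128, filter_intra, intra_edge_filter;
} SeqHdrSketch;

/* Returns 0 on success, -1 on unsupported or truncated input. */
static int parse_reduced_seq_hdr(const uint8_t *data, size_t size,
                                 SeqHdrSketch *h)
{
    Bits b = { data, size, 0 };
    h->profile = (int)f(&b, 3);                  /* seq_profile */
    if (h->profile > 2)
        return -1;
    h->still_picture = (int)f(&b, 1);
    h->reduced_still_picture_header = (int)f(&b, 1);
    if (!h->reduced_still_picture_header)
        return -1; /* full path (operating points etc.) not covered here */
    if (!h->still_picture)
        return -1; /* reduced header requires still_picture = 1 */
    h->level_idx = (int)f(&b, 5);                /* seq_level_idx[0] */

    /* The two *_bits fields give the widths of the dimension fields. */
    int width_bits  = (int)f(&b, 4) + 1;         /* frame_width_bits_minus_1 */
    int height_bits = (int)f(&b, 4) + 1;         /* frame_height_bits_minus_1 */
    h->max_width  = (int)f(&b, width_bits) + 1;  /* max_frame_width_minus_1 */
    h->max_height = (int)f(&b, height_bits) + 1; /* max_frame_height_minus_1 */

    h->sb128 = (int)f(&b, 1);                    /* use_128x128_superblock */
    h->filter_intra = (int)f(&b, 1);             /* enable_filter_intra */
    h->intra_edge_filter = (int)f(&b, 1);        /* enable_intra_edge_filter */
    return b.pos > size * 8 ? -1 : 0;
}
```

Note how the dimension fields cannot be read without first knowing their bit widths; this is why the parser has to consume fields strictly in specification order.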

Step 4: State Initialization and Memory Allocation

Once the sequence header is fully parsed, libdav1d holds the result in a Dav1dSequenceHeader structure, a type declared in the public header include/dav1d/headers.h.

This structure is then attached to the decoder’s global context (Dav1dContext). If this is the first sequence header parsed, or if it differs from the previous one in anything other than the operating parameter info, libdav1d treats it as the start of a new video sequence: state carried over from the old sequence, such as the reference frame slots, can no longer be used and is released. In current dav1d sources the worker threads are created earlier, when the decoder is opened, and picture buffers are requested frame by frame through the picture allocator as Frame OBUs arrive, within the limits the sequence header sets. Later frames are therefore decoded against a consistent set of sequence parameters.
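The "has the sequence changed?" decision can be sketched as a comparison of the new header against the stored one. The types and the install_seq_hdr function below are hypothetical and much smaller than dav1d's real structures. The sketch assumes a struct made only of int fields (so there is no padding for memcmp to trip over) and that everything before operating_parameter_info defines the sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily reduced sequence header. */
typedef struct {
    int profile, max_width, max_height;
    int bit_depth, subsampling_x, subsampling_y;
    /* Fields from here on may change without starting a new sequence. */
    int operating_parameter_info;
} SeqHdrSketch;

enum { NUM_REF_SLOTS = 8 };

/* Hypothetical decoder state: current header plus reference slot flags. */
typedef struct {
    SeqHdrSketch seq_hdr;
    int have_seq_hdr;
    int ref_valid[NUM_REF_SLOTS];
} DecoderSketch;

/* Store hdr in the context; returns 1 if it starts a new video sequence. */
static int install_seq_hdr(DecoderSketch *c, const SeqHdrSketch *hdr)
{
    assert(c && hdr);
    /* Compare only the part of the header that defines the sequence. */
    const size_t cmp_len = offsetof(SeqHdrSketch, operating_parameter_info);
    int new_sequence = !c->have_seq_hdr ||
                       memcmp(hdr, &c->seq_hdr, cmp_len) != 0;
    if (new_sequence) {
        /* References from the old sequence must not be used any more. */
        memset(c->ref_valid, 0, sizeof c->ref_valid);
    }
    c->seq_hdr = *hdr;
    c->have_seq_hdr = 1;
    return new_sequence;
}
```

Repeated, identical sequence headers are common in real streams (for example at every keyframe), so treating an unchanged header as a no-op avoids discarding valid reference frames.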