How libdav1d Parses AV1 Sequence Header OBUs
This article explores how the open-source AV1 decoder
libdav1d parses and processes Sequence Header Open
Bitstream Units (OBUs). We will examine the role of the Sequence Header
in the AV1 bitstream, detail the step-by-step parsing mechanism within
the libdav1d codebase, and explain how the decoded
configuration metadata is initialized to prepare the decoder for
subsequent video frames.
What is a Sequence Header OBU?
In the AV1 video coding format, the bitstream is structured into
self-contained packets called Open Bitstream Units (OBUs). The Sequence
Header OBU (OBU_SEQ_HDR) is the most critical metadata
packet in the stream. It contains global configuration parameters
required to decode the entire video sequence, such as the profile,
level, image resolution limits, color space, chroma subsampling, and
which coding tools (like warped motion or film grain) are enabled. A
decoder cannot decode any pixel data until it has successfully parsed
this header.
Step 1: OBU Identification and Routing
When libdav1d receives AV1 data, its OBU parser walks
the buffer one OBU at a time.
- OBU Header Parsing: The decoder reads the OBU header to determine the OBU type and, when a size field is present, the payload length.
- Routing: When the OBU type is identified as OBU_SEQ_HDR (value 1), libdav1d routes the payload to the internal sequence header parsing function, parse_seq_hdr in src/obu.c.
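The identification step can be sketched in a few lines of C. This is a minimal illustration that follows the OBU header layout in the AV1 specification (forbidden bit, 4-bit type, extension flag, size flag, then an optional LEB128 size); the ObuHeader type and both function names are hypothetical and are not taken from the dav1d sources.

```c
#include <stddef.h>
#include <stdint.h>

/* OBU type values from the AV1 specification (subset). */
enum { OBU_SEQ_HDR = 1, OBU_TD = 2, OBU_FRAME_HDR = 3, OBU_FRAME = 6 };

/* Hypothetical result type for this sketch; not a dav1d structure. */
typedef struct {
    int type;            /* obu_type, 4 bits */
    int has_extension;   /* obu_extension_flag */
    int has_size_field;  /* obu_has_size_field */
    size_t header_len;   /* bytes used by header, extension and size field */
    size_t payload_len;  /* obu_size, or the rest of the buffer if absent */
} ObuHeader;

/* Reads the OBU header at buf. Returns 0 on success, -1 on malformed input. */
int read_obu_header(const uint8_t *buf, size_t sz, ObuHeader *out) {
    if (sz < 1) return -1;
    if (buf[0] & 0x80) return -1;                /* obu_forbidden_bit must be 0 */
    out->type           = (buf[0] >> 3) & 0xF;
    out->has_extension  = (buf[0] >> 2) & 1;
    out->has_size_field = (buf[0] >> 1) & 1;
    size_t pos = 1 + (size_t)out->has_extension; /* skip the extension byte */
    if (pos > sz) return -1;
    if (!out->has_size_field) {
        out->header_len = pos;
        out->payload_len = sz - pos;
        return 0;
    }
    /* obu_size is LEB128: 7 value bits per byte, high bit means "more". */
    uint64_t size = 0;
    for (int i = 0; i < 8; i++) {
        if (pos >= sz) return -1;
        const uint8_t b = buf[pos++];
        size |= (uint64_t)(b & 0x7F) << (7 * i);
        if (!(b & 0x80)) {
            if (size > sz - pos) return -1;      /* payload overruns the buffer */
            out->header_len = pos;
            out->payload_len = (size_t)size;
            return 0;
        }
    }
    return -1; /* LEB128 values longer than 8 bytes are not allowed */
}

/* Routing decision: does this OBU carry a sequence header? */
int is_sequence_header(const ObuHeader *h) {
    return h->type == OBU_SEQ_HDR;
}
```

A caller would pass buf + header_len and payload_len to the sequence header parser when is_sequence_header() returns nonzero, and otherwise skip ahead by header_len + payload_len bytes.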
Step 2: Bitstream Reading with GetBits
To extract the tightly packed binary data from the stream,
libdav1d uses a small bit reader abstraction, the GetBits
structure declared in src/getbits.h. It tracks the current
byte pointer, a bit buffer, and the number of bits left in that buffer.
The parser uses functions like dav1d_get_bits, dav1d_get_uleb128, and
dav1d_get_vlc to read the fixed-length and variable-length integers
that represent the syntax elements defined by the Alliance for Open
Media (AOMedia) AV1 specification.
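To make the mechanism concrete, here is a minimal sketch of a bit reader of the kind described above. It is an illustration only: the BitReader type and the br_* functions are hypothetical names, and dav1d's own reader is more elaborate. The fixed-length read corresponds to the f(n) descriptor in the AV1 specification and the variable-length read to uvlc().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal bit reader; not the structure used by dav1d. */
typedef struct {
    const uint8_t *ptr, *end; /* next unread byte and end of the buffer */
    uint64_t state;           /* unread bits, left-aligned at bit 63 */
    int bits_left;            /* number of valid bits in state */
    int error;                /* set once a read runs past the end */
} BitReader;

void br_init(BitReader *br, const uint8_t *data, size_t sz) {
    br->ptr = data;
    br->end = data + sz;
    br->state = 0;
    br->bits_left = 0;
    br->error = 0;
}

/* Reads n bits (1..32), most significant bit first. */
unsigned br_get_bits(BitReader *br, int n) {
    while (br->bits_left < n) {
        uint64_t byte = 0;
        if (br->ptr < br->end) byte = *br->ptr++;
        else br->error = 1;   /* past the end: feed zeros and flag it */
        br->state |= byte << (56 - br->bits_left);
        br->bits_left += 8;
    }
    const unsigned v = (unsigned)(br->state >> (64 - n));
    br->state <<= n;
    br->bits_left -= n;
    return v;
}

/* Reads a variable-length unsigned value: a run of zero bits, a one bit,
 * then as many value bits as there were zeros. This sketch stops after
 * 32 zeros instead of scanning on for the terminating one bit. */
unsigned br_get_uvlc(BitReader *br) {
    int leading_zeros = 0;
    while (!br_get_bits(br, 1)) {
        if (++leading_zeros == 32) return 0xFFFFFFFFu;
    }
    if (leading_zeros == 0) return 0;
    return br_get_bits(br, leading_zeros) + (1u << leading_zeros) - 1;
}
```

Keeping the unread bits left-aligned in a 64-bit word means a read is a single shift, and the buffer is only refilled when it runs low. Returning zeros past the end and recording an error flag lets the caller check for truncation once, after parsing, instead of after every field.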
Step 3: Parsing Key Syntax Elements
The sequence header parser follows the syntax order that the AV1
specification defines for sequence_header_obu(), reading each field with
exactly the bit width the syntax prescribes. The key groups of syntax
elements are:
- Profile and Level: It reads seq_profile and seq_level_idx to determine if the hardware or software context supports the stream’s complexity.
- Frame Dimensions: The decoder parses the maximum frame width and height (max_frame_width_minus_1 and max_frame_height_minus_1). These values dictate the maximum memory allocation size for frame buffers.
- Color Configuration: The parser calls a helper function (equivalent to the spec’s color_config()) to extract bit depth (8, 10, or 12-bit), color primaries, transfer characteristics, matrix coefficients, and chroma sample position.
- Coding Tool Enables: It reads flags that enable or disable specific coding features globally, such as enable_filter_intra, enable_intra_edge_filter, enable_interintra_compound, enable_masked_compound, enable_warped_motion, and enable_order_hint. This allows libdav1d to bypass complex logic branches during frame decoding if these tools are not used in the sequence.
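The first of these groups can be sketched as follows. This is a simplified illustration of the opening fields of sequence_header_obu() as laid out in the AV1 specification, covering profile, level, and maximum frame dimensions. It is not dav1d's code: SeqHdrSketch, Bits, read_bits, and parse_seq_hdr_sketch are hypothetical names, and streams that carry timing information are rejected to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the parsed fields; not dav1d's Dav1dSequenceHeader. */
typedef struct {
    int profile, still_picture, reduced_still_picture_header;
    int level_idx;              /* seq_level_idx of operating point 0 */
    int max_width, max_height;  /* converted from the _minus_1 form */
} SeqHdrSketch;

typedef struct { const uint8_t *buf; size_t sz, bitpos; int error; } Bits;

/* Reads n bits, most significant bit first; flags reads past the end. */
static unsigned read_bits(Bits *b, int n) {
    unsigned v = 0;
    while (n--) {
        unsigned bit = 0;
        if (b->bitpos < 8 * b->sz)
            bit = (b->buf[b->bitpos >> 3] >> (7 - (b->bitpos & 7))) & 1;
        else
            b->error = 1;
        b->bitpos++;
        v = (v << 1) | bit;
    }
    return v;
}

/* Parses the start of a sequence header payload. Returns 0 on success,
 * -1 on truncated or unsupported input. */
int parse_seq_hdr_sketch(const uint8_t *buf, size_t sz, SeqHdrSketch *h) {
    Bits b = { buf, sz, 0, 0 };
    h->profile = (int)read_bits(&b, 3);                      /* seq_profile */
    h->still_picture = (int)read_bits(&b, 1);
    h->reduced_still_picture_header = (int)read_bits(&b, 1);
    h->level_idx = 0;
    if (h->reduced_still_picture_header) {
        h->level_idx = (int)read_bits(&b, 5);                /* seq_level_idx[0] */
    } else {
        if (read_bits(&b, 1)) return -1; /* timing_info_present_flag: not handled */
        const int delay_present = (int)read_bits(&b, 1);
        const int op_cnt = (int)read_bits(&b, 5) + 1; /* operating_points_cnt_minus_1 */
        for (int i = 0; i < op_cnt; i++) {
            read_bits(&b, 12);                        /* operating_point_idc[i] */
            const int lvl = (int)read_bits(&b, 5);    /* seq_level_idx[i] */
            if (lvl > 7) read_bits(&b, 1);            /* seq_tier[i] */
            if (delay_present && read_bits(&b, 1))    /* delay present for this op */
                read_bits(&b, 4);                     /* initial_display_delay_minus_1[i] */
            if (i == 0) h->level_idx = lvl;
        }
    }
    const int width_bits  = (int)read_bits(&b, 4) + 1; /* frame_width_bits_minus_1 */
    const int height_bits = (int)read_bits(&b, 4) + 1; /* frame_height_bits_minus_1 */
    h->max_width  = (int)read_bits(&b, width_bits) + 1;
    h->max_height = (int)read_bits(&b, height_bits) + 1;
    return b.error ? -1 : 0;
}
```

Note that the width and height fields have no fixed size: the header first states how many bits each one occupies, which is why the fields must be read strictly in order. The color configuration and tool-enable flags follow later in the same payload and are omitted here.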
Step 4: State Initialization and Memory Allocation
Once the sequence header data is fully parsed, libdav1d
stores the structured metadata in a
Dav1dSequenceHeader structure.
This structure is then bound to the decoder’s global context
(Dav1dContext). If this is the first sequence header
parsed, or if critical parameters (like resolution, bit depth, or chroma
format) have changed from a previous sequence header,
libdav1d treats the header as the start of a new sequence. It drops
the pending frame header and releases the reference frame slots, which
were decoded under the old parameters. This ensures that when the
subsequent Frame OBUs arrive, picture buffers and reference state are
set up against the current sequence limits.
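The change-detection logic can be sketched as below. This is a deliberately simplified model under stated assumptions, not dav1d's implementation: SeqParams, DecoderSketch, and install_seq_hdr are hypothetical names, and a real decoder context tracks far more state than one validity flag per reference slot.

```c
#include <stddef.h>

#define NUM_REF_SLOTS 8  /* AV1 keeps 8 reference frame slots */

/* Hypothetical, simplified types; the real decoder context is far larger. */
typedef struct {
    int max_width, max_height, bitdepth, chroma_layout;
} SeqParams;

typedef struct {
    int have_seq;                  /* has any sequence header been seen? */
    SeqParams seq;                 /* currently active parameters */
    int ref_valid[NUM_REF_SLOTS];  /* 1 if the slot holds a decoded frame */
    int new_sequence;              /* set when the last header started a new sequence */
} DecoderSketch;

static int same_params(const SeqParams *a, const SeqParams *b) {
    return a->max_width == b->max_width && a->max_height == b->max_height &&
           a->bitdepth == b->bitdepth && a->chroma_layout == b->chroma_layout;
}

/* Installs a newly parsed header. Returns 1 if it starts a new sequence. */
int install_seq_hdr(DecoderSketch *d, const SeqParams *s) {
    const int changed = !d->have_seq || !same_params(&d->seq, s);
    if (changed && d->have_seq) {
        /* References decoded under the old parameters cannot be used. */
        for (int i = 0; i < NUM_REF_SLOTS; i++)
            d->ref_valid[i] = 0;
    }
    d->seq = *s;
    d->have_seq = 1;
    d->new_sequence = changed;
    return changed;
}
```

A repeated, identical sequence header (encoders commonly resend it at random access points) leaves the reference slots untouched, so decoding continues without interruption.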