How libdav1d Uses State Machines for AV1 Parsing
This article explores how libdav1d, the popular
open-source AV1 video decoder, employs highly optimized state machines
to parse complex AV1 bitstreams. We will examine how these state
machines manage syntax element decoding, handle Context-Adaptive Binary
Arithmetic Coding (CABAC) transitions, and enable efficient
multi-threaded video processing.
The Role of State Machines in AV1 Parsing
AV1 bitstreams are structured hierarchically, consisting of sequence
headers, frame headers, tile groups, and blocks. To parse these elements
sequentially without losing context, libdav1d uses finite
state machines (FSMs). These state machines keep track of the current
decoding context—such as the block size, prediction mode, and transform
type—ensuring that the decoder correctly interprets subsequent bits
based on the exact state of the previously decoded syntax elements.
Managing CABAC Probability States
The core of AV1’s high compression efficiency relies on
Context-Adaptive Binary Arithmetic Coding (CABAC). In
libdav1d, the entropy decoder operates as a specialized,
high-performance state machine.
As bits (or “bins”) are read from the stream, the probability models
(contexts) for different syntax elements must be dynamically updated.
The decoder maintains an internal state representing the probability of
a zero or a one for each symbol type. After decoding a bin, the state
machine transitions to a new probability state. libdav1d
optimizes this process by using lookup tables and specialized assembly
instructions to transition between these probability states with minimal
CPU overhead.
Parallelism and State Machine Isolation
One of libdav1d’s defining features is its exceptional
multi-threaded performance. To parse bitstreams concurrently without
race conditions, the parser state machines are strictly isolated.
AV1 allows frames to be divided into independent processing units
called “Tiles.” libdav1d leverages this structure by
instantiating separate state machines for individual tiles. This
decoupling allows multiple CPU threads to parse and decode different
portions of the bitstream simultaneously. Because the state of one
tile’s entropy coder does not depend on another, threads can run in
parallel without the need for constant synchronization.
Instruction-Level Efficiency
By formalizing the parsing logic into structured state machines, the
developers of libdav1d have written code that compiler
optimizers can easily translate into highly efficient machine
instructions.
The predictable transitions of these state machines reduce CPU branch
mispredictions. Furthermore, critical sections of the state machine
transitions—particularly the CABAC engine and symbol decoding loops—are
handwritten in assembly language (such as x86 AVX2/AVX-512 and ARM
Neon). This level of optimization ensures that libdav1d
remains one of the fastest software AV1 decoders available.