How libdav1d Uses State Machines for AV1 Parsing

This article explores how libdav1d, the popular open-source AV1 video decoder, employs highly optimized state machines to parse complex AV1 bitstreams. We will examine how these state machines manage syntax element decoding, handle Context-Adaptive Binary Arithmetic Coding (CABAC) transitions, and enable efficient multi-threaded video processing.

The Role of State Machines in AV1 Parsing

AV1 bitstreams are structured hierarchically, consisting of sequence headers, frame headers, tile groups, and blocks. To parse these elements sequentially without losing context, libdav1d uses finite state machines (FSMs). These state machines keep track of the current decoding context—such as the block size, prediction mode, and transform type—ensuring that the decoder correctly interprets subsequent bits based on the exact state of the previously decoded syntax elements.

Managing CABAC Probability States

The core of AV1’s high compression efficiency relies on Context-Adaptive Binary Arithmetic Coding (CABAC). In libdav1d, the entropy decoder operates as a specialized, high-performance state machine.

As bits (or “bins”) are read from the stream, the probability models (contexts) for different syntax elements must be dynamically updated. The decoder maintains an internal state representing the probability of a zero or a one for each symbol type. After decoding a bin, the state machine transitions to a new probability state. libdav1d optimizes this process by using lookup tables and specialized assembly instructions to transition between these probability states with minimal CPU overhead.

Parallelism and State Machine Isolation

One of libdav1d’s defining features is its exceptional multi-threaded performance. To parse bitstreams concurrently without race conditions, the parser state machines are strictly isolated.

AV1 allows frames to be divided into independent processing units called “Tiles.” libdav1d leverages this structure by instantiating separate state machines for individual tiles. This decoupling allows multiple CPU threads to parse and decode different portions of the bitstream simultaneously. Because the state of one tile’s entropy coder does not depend on another, threads can run in parallel without the need for constant synchronization.

Instruction-Level Efficiency

By formalizing the parsing logic into structured state machines, the developers of libdav1d have written code that compiler optimizers can easily translate into highly efficient machine instructions.

The predictable transitions of these state machines reduce CPU branch mispredictions. Furthermore, critical sections of the state machine transitions—particularly the CABAC engine and symbol decoding loops—are handwritten in assembly language (such as x86 AVX2/AVX-512 and ARM Neon). This level of optimization ensures that libdav1d remains one of the fastest software AV1 decoders available.