Arithmetic Coding

DEV Community

and verifies lossless round-trip on the full 1000-symbol sequence.

The Adaptive Variant

The encoder above uses a fixed probability model. Adaptive arithmetic coding generalizes to unknown or changing distributions by updating the cumulative-frequency table after each symbol. Both encoder and decoder apply identical updates in the same order, so they stay synchronized without any transmitted model.

The update is simple: after encoding or decoding symbol $s,ドル increment its frequency count by 1, update the cumulative totals, and rescale if the total exceeds a threshold. This is the semi-adaptive (online) model used in practice.

The structural advantage over adaptive Huffman is real. Adaptive Huffman requires rebalancing a tree after each symbol: $O(\log n)$ with non-trivial bookkeeping. Adaptive arithmetic coding only increments a count and recomputes cumulative sums: $O(\text{alphabet size}),ドル cache-friendly.

Production arithmetic coders in JPEG XL and AV1 use more sophisticated context models. The arithmetic stage itself is unchanged. The probability estimate feeding it becomes a function of recent history. The coder only needs a cumulative-frequency interval; how the model produces $p$ is not its concern.

The Theoretical Endpoint

Shannon's source-coding theorem says that for a memoryless source with distribution $p,ドル no prefix-free code can achieve expected length below $H(p)$ bits per symbol. Arithmetic coding achieves $H(p)$ in the limit. The bound is tight, and arithmetic coding reaches it.

That settles compression for memoryless sources. Sources with memory are a different problem. A Markov source of order $k$ can be handled by conditioning on the last $k$ symbols: one arithmetic coder per context. Generalizing further, context-mixing predictors maintain an ensemble of models and blend their probability estimates. The arithmetic coding stage stays unchanged throughout: it only needs a good $p$.

The best general-purpose lossless compressors in use (PAQ, ZPAQ, and derivatives) are context-mixing predictors driving an arithmetic coder. The predictor is where the innovation happens. The coder is the fixed, information-theoretically optimal back end.

What arithmetic coding cannot do is compress below $H(p)$. That is not an implementation limitation. It is a theorem. The next tool in the series, Succinct Bit Vectors and Rank/Select, shifts from compressing data to indexing it space-efficiently: a different kind of optimality.

Cross-references

Back: Huffman Coding (post 9) showed that integer codeword lengths bound compression at one bit above entropy. Arithmetic coding removes that bound.

Universal Codes as Priors (post 3) introduced entropy and redundancy measurements; the priors::entropy() function used in the convergence tests comes from priors.hpp in that post's directory.

Forward: Succinct Bit Vectors and Rank/Select (post 11) shifts from entropy coding to space-efficient indexing, where the goal is not compression but constant-time rank and select queries on compressed representations.

Cross-series: In the Bits Follow Types framing, arithmetic coding is the entropy-optimal realization of the Either combinator's tag bit. The Either codec tags a choice with 1 bit regardless of probability; arithmetic coding replaces the tag with a fractional contribution proportional to the symbol's true information content.

Footnote: The production implementation lives at include/pfc/arithmetic_coding.hpp in the PFC library. It includes both the integer range coder developed here and a higher-level adaptive variant with configurable context models.