The CompCert verified compiler
Commented Coq development
Version 3.16, 2025年09月01日
Introduction
CompCert is a compiler that generates ARM, PowerPC, RISC-V and x86 assembly
code from CompCert C, a large subset of the C programming language.
The particularity of this compiler is that it is written mostly within
the specification language of the Coq proof assistant, and its
correctness --- the fact that the generated assembly code is
semantically equivalent to its source program --- was entirely proved
within the Coq proof assistant.
High-level descriptions of the CompCert compiler and its proof of
correctness can be found in the following papers (in increasing order of technical details):
This Web site gives a commented listing of the underlying Coq
specifications and proofs. Proof scripts are folded by default, but
can be viewed by clicking on "Proof". Some modules (written in italics below) differ between the five target architectures. The
PowerPC versions of these modules are shown below; the AArch64, ARM,
x86 and RISC-V versions can be found in the source distribution.
This development is a work in progress; some parts have
substantially changed since the overview papers above were
written.
The complete sources for CompCert can be downloaded from
the Git repository
or the CompCert Web site.
This document and the CompCert sources are copyright Institut
National de Recherche en Informatique et en Automatique (INRIA) and
AbsInt Angewandte Informatik GmbH, and are distributed under the terms of the
following license.
Table of contents
General-purpose libraries, data structures and algorithms
- Coqlib: addendum to the Coq standard library.
- Maps: finite maps.
- Integers: machine integers.
- Floats: machine floating-point numbers.
- Iteration: various forms of "while" loops.
- Ordered: construction of
ordered types.
- Lattice: construction of
semi-lattices.
- Kildall: resolution of dataflow
inequations by fixpoint iteration.
- UnionFind: a persistent union-find data structure.
- Postorder: postorder numbering of a directed graph.
Definitions and theorems used in many parts of the development
- Errors: the Error monad.
- AST: identifiers, whole programs and other
common elements of abstract syntaxes.
- Linking: generic framework to define syntactic linking over the CompCert languages.
- Values: run-time values.
- Events: observable events and traces.
- Memory: memory model.
See also: Memdata (in-memory representation of data).
- Globalenvs: global execution environments.
- Smallstep: tools for small-step semantics.
- Behaviors: from small-step semantics to observable behaviors of programs.
- Determinism: determinism properties of small-step semantics.
- Op: operators, addressing modes and their
semantics.
- Builtins: semantics of built-in functions.
See also: Builtins0 (target-independent part), Builtins1 (target-dependent part).
- Unityping: a solver for atomic unification constraints.
Source, intermediate and target languages: syntax and semantics
- The CompCert C source language:
syntax and
semantics and
determinized semantics and
type system.
See also: type expressions and
operators (syntax and semantics).
See also: reference interpreter.
- Clight: a simpler version of CompCert C where expressions contain no side-effects.
- Csharpminor: low-level
structured language.
- Cminor: low-level structured
language, with explicit stack allocation of certain local variables.
- CminorSel: like Cminor,
with machine-specific operators and addressing modes.
- RTL: register transfer language (3-address
code, control-flow graph, infinitely many pseudo-registers).
See also: Registers (representation of
pseudo-registers).
- LTL: location transfer language (3-address
code, control-flow graph of basic blocks, finitely many physical registers, infinitely
many stack slots).
See also: Locations (representation of
locations) and Machregs (description of processor registers).
- Linear: like LTL, but the CFG is
replaced by a linear list of instructions with explicit branches and labels.
- Mach: like Linear, with a more concrete
view of the activation record.
- Asm: abstract syntax for PowerPC assembly
code.
Compiler passes
Pass
Source & target
Compiler code
Correctness proof
Simplification of control structures;
explication of type-dependent computations
Clight to Csharpminor
Cshmgen
Cshmgenproof
Stack allocation of local variables
whose address is taken;
simplification of switch statements
Csharpminor to Cminor
Cminorgen
Cminorgenproof
All together
- Compiler: composing the passes together;
whole-compiler semantic preservation theorems.
- Complements: interesting consequences of the semantic preservation theorems.
Static analyses
The following static analyses are performed over the RTL intermediate
representation to support optimizations such as constant propagation,
CSE, and dead code elimination.
- Liveness: liveness analysis.
- ValueAnalysis: value and alias analysis
See also: ValueDomain: the abstract domain for value analysis.
See also: ValueAOp: processor-dependent parts of value analysis.
- Deadcode: neededness analysis
See also: NeedDomain: the abstract domain for neededness analysis.
See also: NeedOp: processor-dependent parts of neededness analysis.
Type systems
The
type system of CompCert C is fully formalized. For some intermediate languages of the back-end, simpler type systems are used to statically capture well-formedness conditions.
xavier.leroy@college-de-france.fr