The Chum Project



The Chum project is an exploration of formalizing the semantics of the x86-64 instruction set architecture (ISA) and using those semantics to describe the behavior of an x86-64 binary in a theorem prover. This involves formalization of the semantics of individual x86-64 instructions (“small step semantics”) in a theorem prover, which yields a formal model of the x86-64 ISA. Additionally, it involves lifting the semantics of an x86-64 binary program into the prover, which enables reasoning over that binary’s properties as guided by the machine model.

Visual depiction of the methodology to lift abstract specifications of x86-64 binaries

Past efforts have extracted small-step semantics for the x86-64 ISA into a non-formal representation, such as Heule et al.’s Strata semantics that were machine-learned on live x86-64 hardware. The Chum project lifts those Strata semantics into the Isabelle/HOL theorem prover through a syntactical translation via an intermediate language called Chum, the project’s namesake. Chum is a bitvector language with formal semantics, allowing formal reasoning over it.

To lift an x86-64 binary into a prover, the binary is disassembled to Chum, which is followed by reassembly (for symbolization) and a subsequent syntactic translation into Isabelle/HOL.

Big-step semantics

Verification of binary programs requires reasoning at a higher level of abstraction than at the individual instruction level. Chum explores this at the block level: sequences of instructions that are devoid of loops but include conditionals (“big step semantics”). Those big step semantics are extracted out of an x86-64 binary through formal symbolic execution, which is guided by a set of rewrite rules that are proven correct within Isabelle/HOL. This automatically yields a formal equivalence between decompiled x86-64 binary code and the generated big step semantics.

Trusted computing base

Chum’s TCB is entirely restricted to machine model extraction and formalization: that is, the Strata-to-Chum and Chum-to-Isabelle/HOL parsers, disassemblers, and reassemblers. The TCB is verified by testing, aided in part by the Chum language’s formal semantics.


Code and proofs

The associated code and proofs for the CPP 2019 paper can be found here and are also available in archived form.


Chum is an open-source project from the Systems Software Research Group (SSRG) at Virginia Tech. It is supported by the Office of Naval Research (ONR) under grant N00014-17-1-2297 and the Naval Sea Systems Command (NAVSEA)/the Naval Engineering Education Consortium (NEEC) under grant N00174-16-C-0018.