News

Our refinement relation paper was accepted at MEMOCODE 2019!
Our symbolic execution paper was accepted at SpISA 2019!
Our Chum paper was accepted at CPP 2019!

Overview

The Chum project is an exploration of formalizing the semantics of the x86-64 instruction set architecture (ISA) and using those semantics to describe the behavior of an x86-64 binary in a theorem prover. This involves formalization of the semantics of individual x86-64 instructions (“small step semantics”) in a theorem prover, which yields a formal model of the x86-64 ISA. Additionally, it involves lifting the semantics of an x86-64 binary program into the prover, which enables reasoning over that binary’s properties as guided by the machine model.

Visual depiction of the methodology to lift abstract specifications of x86-64 binaries

Past efforts have extracted small-step semantics for the x86-64 ISA into a non-formal representation, such as Heule et al.’s Strata semantics that were machine-learned on live x86-64 hardware. The Chum project lifts those Strata semantics into the Isabelle/HOL theorem prover through a syntactical translation via an intermediate language called Chum, the project’s namesake. Chum is a bitvector language with formal semantics, allowing formal reasoning over it.

To lift an x86-64 binary into a prover, the binary is disassembled to Chum, which is followed by reassembly (for symbolization) and a subsequent syntactic translation into Isabelle/HOL.

Big-step semantics

Verification of binary programs requires reasoning at a higher level of abstraction than at the individual instruction level. Chum explores this at the block level: sequences of instructions that are devoid of loops but include conditionals (“big step semantics”). Those big step semantics are extracted out of an x86-64 binary through formal symbolic execution, which is guided by a set of rewrite rules that are proven correct within Isabelle/HOL. This automatically yields a formal equivalence between decompiled x86-64 binary code and the generated big step semantics.

Trusted computing base

Chum’s TCB is entirely restricted to machine model extraction and formalization: that is, the Strata-to-Chum and Chum-to-Isabelle/HOL parsers, disassemblers, and reassemblers. The TCB is verified by testing, aided in part by the Chum language’s formal semantics.

Papers

x86 Instruction Semantics and Basic Block Symbolic Execution. Freek Verbeek, Abhijith Bharadwaj, Joshua A. Bockenek, Ian Roessle, Timmy Weerwag, and Binoy Ravindran, Archive of Formal Proofs, October 2021, Formal proof development, ISSN: 2150-914x.
Establishing a Refinement Relation between Binaries and Abstract Code, Freek Verbeek, Joshua A. Bockenek, Abhijith Bharadwaj, Ian Roessle, and Binoy Ravindran, ACM-IEEE International Conference on Formal Methods and Models for System Design (MEMOCODE 2019), October 9-11, 2019, San Diego, USA.
Symbolic Execution of x86 Assembly in Isabelle/HOL, Freek Verbeek, Abhijith Bharadwaj, Joshua A. Bockenek, Ian Roessle, and Binoy Ravindran, Workshop on Instruction Set Architecture Specification (SpISA 2019), September 13, 2019, Portland, Oregon, USA.
Formally Verified Big Step Semantics out of x86-64 Binaries, Ian Roessle, Freek Verbeek, and Binoy Ravindran, the 8th ACM SIGPLAN International Conference on Certified Programs and Proofs (CPP’19), January 14-15, 2019, Cascais/Lisbon, Portugal.

Code and proofs

The associated code and proofs for the CPP 2019 paper can be found here and are also available in archived form.

Contacts

Ian Roessle, Virginia Tech, iroessle@vt.edu
Freek Verbeek, Virginia Tech, freek@vt.edu
Binoy Ravindran, Virginia Tech, binoy@vt.edu

Chum is an open-source project from the Systems Software Research Group (SSRG) at Virginia Tech. It is supported by the Office of Naval Research (ONR) under grant N00014-17-1-2297 and the Naval Sea Systems Command (NAVSEA)/the Naval Engineering Education Consortium (NEEC) under grant N00174-16-C-0018.