Disassembly Soundness Validation
The DSV project is an investigation of how to automatically verify the correctness of disassemblers, especially in contexts where source codes are not available.
Trustworthiness of disassembled code is central to many reverse engineering and related subdisciplines such as decompilation, binary (malware) analysis, and binary rewriting. In these subdisciplines, binary code is first disassembled to assembly code (e.g., x86-64, ARM64), which is usually followed by steps such as control flow reconstruction, type inference and analysis, variable analysis, pointer analysis, and basic block (or function) reconstruction, among others. Ensuring the trustworthiness of disassembled code essentially allows disassemblers to be excluded from the trusted computing base of these subdisciplines.
Verifying the correctness of disassemblers requires a notion of soundness. For such a verification technique to be applicable to a broad class of disassemblers, the soundness notion must be independent of how disassemblers internally lift assembly code from a binary (e.g., recursive traversal, linear sweep). In settings where only binary is available (e.g., legacy, proprietary code), often there is no ground truth as to what consitutes a “correct” assembly instruction.
The project is investigating a soundness concept using a transition relation over concrete machine states: a binary is sound if, for all addresses in the binary that can be reached from the binary’s entry point, the bytes of the (disassembled) instruction located at an address are the same as the actual bytes read from the binary. Thus, a disassembler is unsound if there is a reachable (disassembled) instruction whose bytes are not the same as those in the binary.
Validating soundness requires implementation of low-level functions such as that for retrieving a byte sequence from an address and for mapping an instruction to the corresponding byte sequence representation. Existing compiler subsystems and utilities can be leveraged for this purpose (e.g., gcc/clang’s assembler, readelf).
Validation also requires implementation of the transition relation over machine states, i.e., a transition relation that, given an instruction pointer representing the current machine state, produces the valid set of next instruction addresses. This is an undecidable problem (e.g., distinguishing instructions from data) and involves computing jump targets of indirect branch jumps – a challenge for state-of-the-art disassemblers. To address this, the project is investigating an inexact transition relation that leverages over-approximation: prevent false positives, and allow, but minimize, false negatives. A false positive is the existence of an (incorrectly disassembled) reachable instruction but deemed unreachable. A false negative is the existence of an (incorrectly disassembled) unreachable instruction but deemed reachable. The project is exploring the implementation of such an inexact transition relation using techniques such as bounded model checking.
DSV is an open-source project from the Systems Software Research Group (SSRG) at Virginia Tech. It is based on work supported by the Defense Advanced Research Projects Agency (DARPA) under Agreement No. HR00112090028 and contract N6600121C4028, and the US Office of Naval Research (ONR) under grants N00014-17-1-2297 and N00014-18-1-2665.