MESA Mathematical Execution for Safety Assurance of Numerical Applications

Introduction

High‑assurance software development has long relied on static program analysis techniques such as theorem proving, exhaustive model checking, and symbolic execution. These techniques deliver mathematical proofs of correctness, but when applied to floating-point programs—characterized by complex floating-point semantics, non-linear control logic, and hardware variations—they face severe scalability barriers due to state-space explosion and brittle constraint solving, making them impractical at scale. At the other extreme, dynamic program analysis techniques such as random or heuristic search‑based testing and fuzzing scale but lack rigorous soundness (or completeness) guarantees and often fail to explore deep states.

Lens of mathematical optimization

MESA’s key insight is to treat safety and reachability problems as continuous optimization tasks. MESA transforms safety verification and testing into a minimization problem over a representing function that measures how far a given input is from violating a safety property. The representing function is non-negative and vanishes exactly when the input triggers a violation, so minimizing it corresponds to finding errors.

Given a program PROG and a safety property expressed as a relation over runtime values, MESA automatically derives a representing function W(x) whose minimum corresponds exactly to violating inputs. For example, bounds-checking problems become the minimization of a function that captures the distance between the computed index and its bounds, and equality conditions become squared residuals.

Once safety properties are encoded as numerical objective functions, MESA leverages off‑the‑shelf optimization to search the input space. Since the representing function can be evaluated by executing a transformed program and minimized using optimization algorithms (instead of solving symbolic constraints), MESA sidesteps the challenges of state-space explosion and thereby scales. Additionally, MESA unifies rigorous semantics with numerical optimization, provides soundness assurances for a class of optimization functions, and incorporates realistic failure modes.

Overcoming optimization pitfalls

Since the optimization landscape may contain flat regions or discontinuities that may mislead search algorithms—and program branching logic can introduce spurious minima that stall convergence—MESA uses concepts called Monotonic Convergence Condition (MCC) and path–input affinity to ensure that the objective function decreases monotonically as execution follows the desired path and branch satisfaction improves. Per‑path optimization further decomposes discontinuous landscapes into smooth subproblems, eliminating false minima and enabling scalable minimization. MESA enforces MCC by augmenting W(x) with path‑input affinity: a large offset penalizes shallow execution depth and rewards deeper traversal, guaranteeing a strictly decreasing objective as the target path is approached. Per‑path analysis partitions execution into independent subproblems, each admitting a smooth optimization landscape and eliminating spurious local minima.

MESA integrates local (gradient-based) and global (Monte Carlo Markov Chain, MCMC) optimization methods. Local methods exploit smoothness in augmented landscapes to quickly converge to nearby minima, while MCMC sampling escapes local optima and explores multiple execution paths.

Tool features

The MESA toolchain automatically instruments a program at the level of the LLVM compiler’s intermediate representation (LLVM IR) to generate representing functions and path–input affinity metrics. The tool includes a library of property templates for tasks such as bounds checking and range analysis. The toolchain integrates with existing verification pipelines through an API that returns counterexamples, proofs of infeasibility, or coverage metrics.

Prior use cases and demonstrated effectiveness

Case studies conducted with MESA demonstrate that off-the-shelf optimizers can solve large families of numerical verification problems within milliseconds, thereby opening the door to interactive assurance workflows. MESA was demonstrated on the Software Verification Competition (SVCOMP) 2024 bounds-checking benchmarks and showed that it achieves 100% accuracy, running 170 times faster than the state-of-the-art model checker, CBMC. The following figure illustrates these results (MESA tool is represented as AWD).

Accuracy and runtime of MESA/AWD on SV-COMP 2024 benchmarks.

MESA was also shown to obtain near-optimal branch coverage on production math libraries, including the GNU Scientific Library, orders of magnitude faster than Austin, the state-of-the-art floating-point testing tool. MESA’s potential for satisfiability solving was also demonstrated, achieving accurate results with over a 700× performance improvement compared to state-of-the-art SMT solvers, including Microsoft’s Z3.

Limitations

MESA’s current limitations include the following:

History and perspective

Using mathematical optimization for program analysis tasks has a long history; our line of research crystallized as Mathematical Execution (ME)—a term coined by Prof. Zhendong Su—first presented around OOPSLA’15 and published as a technical report. ME treats testing/verification as minimizing a non-negative representing function that hits zero when the target behavior occurs. An array of successor efforts built on that result:

Mathematical execution has parallels to several other lines of research in program analysis and related fields. For instance, stochastic superoptimization formulates efficient machine code generation as a stochastic optimization problem. Optimization methods are also used in systems analysis, such as sum-of-squares programming for stability and reachability analysis in hybrid systems. Furthermore, stochastic search enhances constraint solving via SMT tools and aligns with search-based paradigms in program synthesis for generating correct code from specifications.