A zebra with sunglasses and laser eyes spelling Z.


FMCAD 2022 Formal Methods in Computer-Aided Design (FMCAD) 2022

ECOOP 2022 European Conference on Object-Oriented Programming (ECOOP) 2022


Verified compilers such as CompCert and CakeML have become increasingly realistic over the last few years, but their support for floating-point arithmetic has thus far been limited. In particular, they lack the “fast-math-style” optimizations that unverified mainstream compilers perform. Supporting such optimizations in the setting of verified compilers is challenging because these optimizations, for the most part, do not preserve the IEEE-754 floating-point semantics. However, IEEE-754 floating-point numbers are finite approximations of the real numbers, and we argue that any compiler correctness result for fast-math optimizations should appeal to a real-valued semantics rather than the rigid IEEE-754 floating-point numbers.

This paper presents RealCake, an extension of CakeML that achieves end-to-end correctness results for fast-math-style optimized compilation of floating-point arithmetic. This result is achieved by giving CakeML a flexible floating-point semantics and integrating an external proof-producing accuracy analysis. RealCake’s end-to-end theorems relate the I/O behavior of the original source program under real-number semantics to the observable I/O behavior of the compiler generated and fast-math-optimized machine code.

TOG 2022 ACM Transactions on Graphics (TOG) 2022


Past work on optimizing fabrication plans given a carpentry design can provide Pareto-optimal plans trading off between material waste, fabrication time, precision, and other considerations. However, when developing fabrication plans, experts rarely restrict to a single design, instead considering families of design variations, sometimes adjusting designs to simplify fabrication. Jointly exploring the design and fabrication plan spaces for each design is intractable using current techniques. We present a new approach to jointly optimize design and fabrication plans for carpentered objects. To make this bi-level optimization tractable, we adapt recent work from program synthesis based on equality graphs (e-graphs), which encode sets of equivalent programs. Our insight is that subproblems within our bi-level problem share significant substructures. By representing both designs and fabrication plans in a new bag of parts (BOP) e-graph, we amortize the cost of optimizing design components shared among multiple candidates. Even using BOP e-graphs, the optimization space grows quickly in practice. Hence, we also show how a feedback-guided search strategy dubbed Iterative Contraction and Expansion on E-graphs (ICEE) can keep the size of the e-graph manageable and direct the search towards promising candidates. We illustrate the advantages of our pipeline through examples from the carpentry domain.

POPL 2022 Principles of Programming Languages (POPL) 2022


We present a new approach to e-matching based on relational join; in particular, we apply recent database query execution techniques to guarantee worst-case optimal run time. Compared to the conventional backtracking approach that always searches the e-graph “top down”, our new relational e-matching approach can better exploit pattern structure by searching the e-graph according to an optimized query plan. We also establish the first data complexity result for e-matching, bounding run time as a function of the e-graph size and output size. We prototyped and evaluated our technique in the state-of-the-art egg e-graph framework. Compared to a conventional baseline, relational e-matching is simpler to implement and orders of magnitude faster in practice.

CORRECTNESS 2021 Software Correctness for HPC Applications (CORRECTNESS) 2021

Guarding Numerics Amidst Rising Heterogeneity


New heterogeneous computing platforms-especially GPUs and other accelerators-are being introduced at a brisk pace, motivated by the goals of exploiting parallelism and reducing data movement. Unfortunately, their sheer variety as well as the optimization options supported by them have been observed to alter the computed numerical results to the extent that reproducible results are no longer possible to obtain without extra effort. Our main contribution in this paper is to document the scope and magnitude of this problem which we classify under the heading of numerics. We propose a taxonomy to classify specific problems to be addressed by the community, a few immediately actionable topics as the next steps, and also forums within which to continue discussions.

OOPSLA 2021 Object-Oriented Programming, Systems, Languages & Applications (OOPSLA) 2021

★ Distinguished Paper

Rewrite Rule Inference Using Equality Saturation


Many compilers, synthesizers, and theorem provers rely on rewrite rules to simplify expressions or prove equivalences. Developing rewrite rules can be difficult: rules may be subtly incorrect, profitable rules are easy to miss, and rulesets must be rechecked or extended whenever semantics are tweaked. Large rulesets can also be challenging to apply: redundant rules slow down rule-based search and frustrate debugging. This paper explores how equality saturation, a promising technique that uses e-graphs to apply rewrite rules, can also be used to infer rewrite rules. E-graphs can compactly represent the exponentially large sets of enumerated terms and potential rewrite rules. We show that equality saturation efficiently shrinks both sets, leading to faster synthesis of smaller, more general rulesets. We prototyped these strategies in a tool dubbed Ruler. Compared to a similar tool built on CVC4, Ruler synthesizes 5.8x smaller rulesets 25x faster without compromising on proving power. In an end-to-end case study, we show Ruler-synthesized rules which perform as well as those crafted by domain experts, and addressed a longstanding issue in a popular open source tool.

SFF 2021 Solid Freeform Fabrication Symposium (SFF) 2021

A Roadmap Towards Parallel Printing for Desktop 3D Printers


3D printers with multiple extruders (or multi-headed printers) are common in the desktop fabrication community, but are primarily used for multi-color or multi-material printing, using only one extruder at a time. What if these multi-headed desktop printers could also be used for simultaneous parallel printing? While this is a relatively unexplored direction, we argue that it deserves further investigation: a flexible, robust, and affordable parallel printing ecosystem could significantly reduce fabrication time for many applications and further enhance the value of desktop rapid prototyping.

We propose a research agenda to explore the development of a parallel printing pipeline, and summarize our observations from a preliminary investigation of simultaneous extrusion. We hope this vision will encourage and guide future research in developing hardware, firmware, and slicers to facilitate parallel 3D printing.

ARITH 2021 IEEE International Symposium on Computer Arithmetic (ARITH) 2021

Combining Precision Tuning and Rewriting


Precision tuning and rewriting can improve both the accuracy and speed of floating point expressions, yet these techniques are typically applied separately. This paper explores how finer-grained interleaving of precision tuning and rewriting can help automatically generate a richer set of Pareto-optimal accuracy versus speed trade-offs.

We introduce Pherbie (Pareto Herbie), a tool providing both precision tuning and rewriting, and evaluate interleaving these two strategies at different granularities. Our results demonstrate that finer-grained interleavings improve both the Pareto curve of candidate implementations and overall optimization time. On a popular set of tests from the FPBench suite, Pherbie finds both implementations that are significantly more accurate for a given cost and significantly faster for a given accuracy bound compared to baselines using precision tuning and rewriting alone or in sequence.

MAPS 2021 Symposium on Machine Programming (MAPS) 2021

Pure Tensor Program Rewriting via Access Patterns


Tensor kernels in machine learning (ML) often correspond to pure mathematical expressions, making term rewriting an attractive strategy for optimization and mapping to specialized hardware accelerators. However, existing ML intermediate representations (IRs) tend to either be pure but high-level, making low-level rewrites to hardware targets inexpressible, or low-level but impure, hampering the use of term rewriting altogether.

This paper introduces Glenside, a pure IR whose core abstraction — the access pattern — enables low-level, layout-aware, hardware-centric program rewrites. We demonstrate how term rewriting in Glenside can be used to map program fragments to hardware accelerator invocations and automatically discover classic data layout transformations like im2col. Glenside establishes a new foundation for exploring further term rewriting techniques in optimizing low-level tensor programs.

ICLR 2021 International Conference on Learning Representations (ICLR) 2021

★ Spotlight Paper

Dynamic Tensor Rematerialization


Checkpointing enables the training of deep learning models under restricted memory budgets by freeing intermediate activations from memory and recomputing them on demand. Current checkpointing techniques statically plan these recomputations offline and assume static computation graphs. We demonstrate that a simple online algorithm can achieve comparable performance by introducing Dynamic Tensor Rematerialization (DTR), a greedy online algorithm for checkpointing that is extensible and general, is parameterized by eviction policy, and supports dynamic models. We prove that DTR can train an N-layer linear feedforward network on an Ω(√ N) memory budget with only O(N) tensor operations. DTR closely matches the performance of optimal static checkpointing in simulated experiments. We incorporate a DTR prototype into PyTorch merely by interposing on tensor allocations and operator calls and collecting lightweight metadata on tensors.

MLSys 2021 Conference on Machine Learning and Systems (MLSys) 2021

Nimble: Efficiently Compiling Dynamic Neural Networks for Model Inference


Modern deep neural networks increasingly make use of features such as control flow, dynamic data structures, and dynamic tensor shapes. Existing deep learning systems focus on optimizing and executing static neural networks which assume a pre-determined model architecture and input data shapes—assumptions that are violated by dynamic neural networks. Therefore, executing dynamic models with deep learning systems is currently both inflexible and sub-optimal, if not impossible. Optimizing dynamic neural networks is more challenging than static neural networks; optimizations must consider all possible execution paths and tensor shapes. This paper proposes Nimble, a high-performance and flexible system to optimize, compile, and execute dynamic neural networks on multiple platforms. Nimble handles model dynamism by introducing a dynamic type system, a set of dynamism-oriented optimizations, and a light-weight virtual machine runtime. Our evaluation demonstrates that Nimble outperforms existing solutions for dynamic neural networks by up to 20x on hardware platforms including Intel CPUs, ARM CPUs, and Nvidia GPUs.

LATTE 2021 Workshop on Languages, Tools, and Techniques for Accelerator Design (LATTE) 2021

From DSLs to Accelerator-Rich Platform Implementations: Addressing the Mapping Gap


In deep learning (DL), the hardware parallelism of accelerators like the GPU and the TPU has been used to efficiently implement tensor kernels, and further specialized devices continue to be developed to support this domain. However, while DL frameworks like TensorFlow and TVM have built-in support for specific accelerators like the TPU, adding support for a new custom device requires bespoke compiler extensions, which demand great effort and expertise in both the device and the compilation stack.

We propose to address these challenges by using the Instruction Level Abstraction (ILA) to model both the semantics of hardware accelerators and compiler IR intrinsics. In particular, our goal is to enable the classic approach to building portable compilers, but extended to verifiable lowering to custom accelerators.

The proposed 3LA flow provides for application mapping from the TVM Relay IR to heterogeneous hardware including accelerators, such as the FlexNLP custom accelerator, using the ILA interface. 3LA is a model flow designed to be generalized across a broader range of domains.

POPL 2021 Principles of Programming Languages (POPL) 2021

★ Distinguished Paper

egg: Fast and Extensible Equality Saturation


An e-graph efficiently represents a congruence relation over many expressions. Although they were originally developed in the late 1970s for use in automated theorem provers, a more recent technique known as equality saturation repurposes e-graphs to implement state-of-the-art, rewrite-driven compiler optimizations and program synthesizers. However, e-graphs remain unspecialized for this newer use case. Equality saturation workloads exhibit distinct characteristics and often require ad-hoc e-graph extensions to incorporate transformations beyond purely syntactic rewrites.

This work contributes two techniques that make e-graphs fast and extensible, specializing them to equality saturation. A new amortized invariant restoration technique called rebuilding takes advantage of equality saturation’s distinct workload, providing asymptotic speedups over current techniques in practice. A general mechanism called e-class analyses integrates domain-specific analyses into the e-graph, reducing the need for ad hoc manipulation.

We implemented these techniques in a new open-source library called egg. Our case studies on three previously published applications of equality saturation highlight how egg’s performance and flexibility enable state-of-the-art results across diverse domains.

PLATEAU 2020 Workshop on Evaluation and Usability of Programming Languages and Tools (PLATEAU) 2020

The Essence of Program Semantics Visualizers: A Three-Axis Model


A program semantics visualizer (PSV) helps illuminate a language’s semantics by explaining the runtime execution of programs. PSVs are often used in introductory programming (CS1) courses to help introduce a notional machine, an abstraction of the computer that executes the language. But what information should PSVs present to fully explain such notional machines?

In this paper we propose a three-axis model to assess the design of PSVs that visualize execution traces. PSVs should help users by clearly answering three questions: What is the machine’s configuration at each execution step? Why did an execution step take place? How did an execution step change the machine’s configuration? We demonstrate our model’s utility for assessing PSVs by explaining why, in actual classroom use, instructors have resorted to manually extending Python Tutor’s visualizations.

NSV 2020 Numerical Software Verification (NSV) 2020

Towards Numerical Assistants


The last few years have seen an explosion of work on tools that address numerical error in scientific, mathematical, and engineering software. The resulting tools can provide essential guidance to expert non-experts: scientists, mathematicians, and engineers for whom mathematical computation is essential but who may have little formal training in numerical methods. It is now time for these tools to move into practice.

Practitioners need a “numerical workbench” that not only succeeds as a research artifact but as a daily tool. We describe our experience adapting Herbie, a tool for numerical error repair, from a research prototype to a reliable workhorse for daily use. In particular, we focus on how we worked to increase user trust and use internal measurement to polish the tool. Looking more broadly, we show that community development and an investment in the generality of our tools, such as through the FPBench project, will better support users and strengthen our research community.

PLDI 2020 Programming Language Design and Implementation (PLDI) 2020

Synthesizing Structured CAD Models with Equality Saturation and Inverse Transformations


Recent program synthesis techniques help users customize CAD models(e.g., for 3D printing) by decompiling low-level triangle meshes to Constructive Solid Geometry (CSG) expressions. Without loops or functions, editing CSG can require many coordinated changes, and existing mesh decompilers use heuristics that can obfuscate high-level structure.

This paper proposes a second decompilation stage to robustly “shrink” unstructured CSG expressions into more editable programs with map and fold operators. We present Szalinski, a tool that uses Equality Saturation with semantics-preserving CAD rewrites to efficiently search for smaller equivalent programs. Szalinski relies on inverse transformations, a novel way for solvers to speculatively add equivalences to an E-graph. We qualitatively evaluate Szalinski in case studies, show how it composes with an existing mesh decompiler, and demonstrate that Szalinski can shrink large models in seconds.

SIGA 2019 SIGGRAPH Asia, ACM Transactions on Graphics (SIGA) 2019

Carpentry Compiler


Traditional manufacturing workflows strongly decouple design and fabrication phases. As a result, fabrication-related objectives such as manufacturing time and precision are difficult to optimize in the design space, and vice versa. This paper presents HL-HELM, a high-level, domain-specific language for expressing abstract, parametric fabrication plans; it also introduces LL-HELM, a low-level language for expressing concrete fabrication plans that take into account the physical constraints of available manufacturing processes. We present a new compiler that supports the real-time, unoptimized translation of high-level, geometric fabrication operations into concrete, tool-specific fabrication instructions; this gives users immediate feedback on the physical feasibility of plans as they design them. HELM offers novel optimizations to improve accuracy and reduce fabrication time as well as material costs. Finally, optimized low-level plans can be interpreted as step-by-step instructions for users to actually fabricate a physical product. We provide a variety of example fabrication plans in the carpentry domain that are designed using our high-level language, show how the compiler translates and optimizes these plans to generate concrete low-level instructions, and present the final physical products fabricated in wood.

CORRECTNESS 2019 Software Correctness for HPC Applications (CORRECTNESS) 2019

Toward Multi-Precision, Multi-Format Numerics


Recent research has provided new, domain-specific number systems that accelerate modern workloads. Using these number systems effectively requires analyzing subtle multi-precision, multi-format (MPMF) code. Ideally, recent programming tools that automate numerical analysis tasks could help make MPMF programs both accurate and fast. However, three key challenges must be addressed: existing automated tools are difficult to compose due to subtle incompatibilities; there is no “gold standard” for correct MPMF execution; and no methodology exists for generalizing existing, IEEE-754-specialized tools to support MPMF. In this paper we report on recent work towards mitigating these related challenges. First, we extend the FPBench standard to support multi-precision, multi-format (MPMF) applications. Second, we present Titanic, a tool which provides reference results for arbitrary MPMF computations. Third, we describe our experience adapting an existing numerical tool to support MPMF programs.

OOPSLA 2019 Object-Oriented Programming, Systems, Languages & Applications (OOPSLA) 2019

Modular Verification of Web Page Layout


Automated verification can ensure that a web page satisfies accessibility, usability, and design properties regardless of the end user’s device, preferences, and assistive technologies. However, state-of-the-art verification tools for layout properties do not scale to large pages because they rely on whole-page analyses and must reason about the entire page using the complex semantics of the browser layout algorithm.

This paper introduces and formalizes modular layout proofs. A modular layout proof splits a monolithic verification problem into smaller verification problems, one for each component of a web page. Each component specification can use rely/guarantee-style preconditions to make it verifiable independently of the rest of the page and enabling reuse across multiple pages. Modular layout proofs scale verification to pages an order of magnitude larger than those supported by previous approaches.

We prototyped these techniques in a new proof assistant, Troika. In Troika, a proof author partitions a page into components and writes specifications for them. Troika then verifies the specifications, and uses those specifications to verify whole-page properties. Troika also enables the proof author to verify different component specifications with different verification tools, leveraging the strengths of each. In a case study, we use Troika to verify a large web page and demonstrate a speed-up of 13–1469x over existing tools, taking verification time from hours to seconds. We develop a systematic approach to writing Troika proofs and demonstrate it on 8 proofs of properties from prior work to show that modular layout proofs are short, easy to write, and provide benefits over existing tools.

SPLASH-E 2019 ACM SIGPLAN International Symposium on SPLASH-E (SPLASH-E) 2019

Theia: Automatically Generating Correct Program State Visualizations


Program state visualizations (PSVs) help programmers understand hidden program state like objects, references, and closures. Unfortunately, existing PSV tools do not support custom language semantics, which educators often use to introduce programming languages gradually. They also fail to visualize key pieces of program state, which can lead to incorrect and confusing visualizations.

Theia, a generic PSV framework, uses formal abstract machine definitions to produce complete, continuous, and consistent (CCC) PSVs. To produce CCC visualizations with Theia, an educator only needs to specify an abstract machine and optionally customize the resulting web page, allowing her to visualize custom language semantics without developing a language-specific tool.

FTPL 2019 Foundations and Trends in Programming Languages (FTPL) 2019

QED at Large: A Survey of Engineering of Formally Verified Software


Development of formal proofs of correctness of programs can increase actual and perceived reliability and facilitate better understanding of program specifications and their underlying assumptions. Tools supporting such development have been available for over 40 years, but have only recently seen wide practical use. Projects based on construction of machine-checked formal proofs are now reaching an unprecedented scale, comparable to large software projects, which leads to new challenges in proof development and maintenance. Despite its increasing importance, the field of proof engineering is seldom considered in its own right; related theories, techniques, and tools span many fields and venues. This survey of the literature presents a holistic understanding of proof engineering for program correctness, covering impact in practice, foundations, proof automation, proof organization, and practical proof development.

CAV 2019 Computer-Aided Verification (CAV) 2019

Icing: Supporting Fast-Math Style Optimizations in a Verified Compiler


Verified compilers like CompCert and CakeML offer increasingly sophisticated optimizations. However, their deterministic source semantics and strict IEEE 754 compliance prevent the verification of “fast-math” style floating-point optimizations. Developers often selectively use these optimizations in mainstream compilers like GCC and LLVM to improve the performance of computations over noisy inputs or for heuristics by allowing the compiler to perform intuitive but IEEE 754-unsound rewrites.

We designed, formalized, implemented, and verified a compiler for Icing, a new language which supports selectively applying fast-math style optimizations in a verified compiler. Icing’s semantics provides the first formalization of fast-math in a verified compiler. We show how the Icing compiler can be connected to the existing verified CakeML compiler and verify the end-to-end translation by a sequence of refinement proofs from Icing to the translated CakeML. We evaluated Icing by incorporating several of GCC’s fast-math rewrites. While Icing targets CakeML’s source language, the techniques we developed are general and could also be incorporated in lower-level intermediate representations.

EuroSys 2019 European Conference on Computer Systems (EuroSys) 2019

Teaching Rigorous Distributed Systems With Efficient Model Checking


Writing correct distributed systems code is difficult, especially for novice programmers. The inherent asynchrony and need for fault-tolerance make errors almost inevitable. Industrial-strength testing and model checking have been shown to be effective at uncovering bugs, but they come at a cost — in both time and effort — that is far beyond what students can afford. To address this, we have developed an efficient model checking framework and visual debugger for distributed systems, with the goal of helping students find and fix bugs in near real-time. We identify two novel techniques for reducing the search state space to more efficiently find bugs in student implementations. We report our experiences using these tools to help over two hundred students build a correct, linearizable, fault-tolerant, dynamically-sharded key–value store.

CoNGA 2019 Conference for Next Generation Arithmetic (CoNGA) 2019

Sinking Point: Dynamic Precision Tracking for Floating-Point


We present sinking-point, a floating-point-like number system that tracks precision dynamically though computations. With existing floating-point number systems, such as the venerable IEEE 754 standard, numerical results do not inherently contain any information about their precision or accuracy; to determine if a result is numerically accurate, a separate analysis must be performed. By contrast, sinking-point records the precision of each intermediate value and result computed, so highly imprecise results can be identified immediately. Compared to IEEE 754 floating-point, sinking-point’s representation requires only a few additional bits of storage, and computations require only a few additional bitwise operations. Sinking-point is fully generalizable, and can be extended to provide dynamic error tracking for nearly any digital number system, including posits.

ICFP 2018 International Conference on Functional Programming (ICFP) 2018

Functional Programming for Compiling and Decompiling Computer-aided Design


Desktop-manufacturing techniques like 3D printing are increasingly popular because they reduce the cost and complexity of producing customized objects on demand. Unfortunately, the vibrant communities of early adopters, often referred to as “makers,” are not well-served by currently available software pipelines. Users today must compose idiosyncratic sequences of tools which are typically repurposed variants of proprietary software originally designed for expert specialists. This paper proposes fundamental programming-languages techniques to bring improved rigor, reduced complexity, and new functionality to the computer-aided design (CAD) software pipeline for applications like 3D-printing. Compositionality, denotational semantics, compiler correctness, and program synthesis all play key roles in our approach, starting from the perspective that solid geometry is a programming language.

Specifically, we define a purely functional language for CAD called LambdaCAD and a polygon surface-mesh intermediate representation. We then define denotational semantics of both languages to 3D solids and a compiler from CAD to mesh accompanied by a proof of semantics preservation. We illustrate the utility of this foundation by developing a novel synthesis algorithm based on evaluation contexts to “reverse compile” difficult-to-edit meshes downloaded from online maker communities back to more-editable CAD programs. All our prototypes have been implemented in OCaml to enable further exploration of functional programming for desktop manufacturing.

FM 2018 Formal Methods (FM) 2018

Combining Tools for Optimization and Analysis of Floating-Point Computations


Recent renewed interest in optimizing and analyzing floating-point programs has lead to a diverse array of new tools for numerical programs. These tools are often complementary, each focusing on a distinct aspect of numerical programming. Building reliable floating point applications typically requires addressing several of these aspects, which makes easy composition essential. This paper describes the composition of two recent floating-point tools: Herbie, which performs accuracy optimization, and Daisy, which performs accuracy verification. We find that the combination provides numerous benefits to users, such as being able to use Daisy to check whether Herbie’s unsound optimizations improved the worst-case roundoff error, as well as benefits to tool authors, including uncovering a number of bugs in both tools. The combination also allowed us to compare the different program rewriting techniques implemented by these tools for the first time. The paper lays out a road map for combining other floating-point tools and for surmounting common challenges.


Check out the FPBench, Daisy, and Herbie projects this paper is building on!

ITP 2018 Interactive Theorem Proving (ITP) 2018

Software Verification with ITPs Should Use Binary Code Extraction to Reduce the TCB


LCF-style provers emphasise that all results are secured by logical inference, and yet their current facilities for code extraction or code generation fall short of this high standard. This paper argues that extraction mechanisms with a small trusted computing base (TCB) ought to be used instead, pointing out that the recent CakeML and Oeuf projects show that this is possible in HOL and within reach in Coq.


Check out the CakeML and Oeuf projects this paper is based on!

MAPL 2018 Machine Learning and Programming Languages (MAPL) 2018

Relay: a New IR for Machine Learning Frameworks


Machine learning powers diverse services in industry including search, translation, recommendation systems, and security. The scale and importance of these models require that they be efficient, expressive, and portable across an array of heterogeneous hardware devices. These constraints are often at odds; in order to better accommodate them we propose a new high-level intermediate representation (IR) called Relay. Relay is being designed as a purely-functional, statically-typed language with the goal of balancing efficient compilation, expressiveness, and portability. We discuss the goals of Relay and highlight its important design constraints. Our prototype is part of the open source NNVM compiler framework, which powers Amazon’s deep learning framework MxNet.

PLDI 2018 Programming Language Design and Implementation (PLDI) 2018

Finding Root Causes of Floating Point Error


Floating-point arithmetic plays a central role in science, engineering, and finance by enabling developers to approximate real arithmetic. To address numerical issues in large floating-point applications, developers must identify root causes, which is difficult because floating-point errors are generally non-local, non-compositional, and non-uniform.

This paper presents Herbgrind, a tool to help developers identify and address root causes in numerical code written in low-level languages like C/C++ and Fortran. Herbgrind dynamically tracks dependencies between operations and program outputs to avoid false positives and abstracts erroneous computations to simplified program fragments whose improvement can reduce output error. We perform several case studies applying Herbgrind to large, expert-crafted numerical programs and show that it scales to applications spanning hundreds of thousands of lines, correctly handling the low-level details of modern floating point hardware and mathematical libraries and tracking error across function boundaries and through the heap.

PLDI 2018 Programming Language Design and Implementation (PLDI) 2018

Verifying that Web Pages have Accessible Layout


Usability and accessibility guidelines aim to make graphical user interfaces accessible to all users, by, say, requiring that text is sufficiently large, interactive controls are visible, and heading size corresponds to importance. These guidelines must hold on the infinitely many possible renderings of a web page generated by differing screen sizes, fonts, and other user preferences. Today, these guidelines are tested by manual inspection of a few renderings, because 1) the guidelines are not expressed in a formal language, 2) the semantics of browser rendering are not well understood, and 3) no tools exist to check all possible renderings of a web page.

VizAssert solves these problems. First, it introduces visual logic to precisely specify accessibility properties. Second, it formalizes a large fragment of the browser rendering algorithm using novel finitization reductions. Third, it provides a sound, automated tool for verifying assertions in visual logic.

We encoded 14 assertions drawn from best-practice accessibility and mobile-usability guidelines in visual logic. VizAssert checked them on on 62 professionally designed web pages. It found 64 distinct errors in the web pages, while reporting only 13 false positive warnings.

POPL 2018 Principles of Programming Languages (POPL) 2018

Programming and Proving with Distributed Protocols


Distributed systems play a crucial role in modern infrastructure, but are notoriously difficult to implement correctly. This difficulty arises from two main challenges: (a) correctly implementing core system components (e.g., two-phase commit), so all their internal invariants hold, and (b) correctly composing standalone system components into functioning trustworthy applications (e.g., persistent storage built on top of a two-phase commit instance). Recent work has developed several approaches for addressing (a) by means of mechanically verifying implementations of core distributed components, but no methodology exists to address (b) by composing such verified components into larger verified applications. As a result, expensive verification efforts for key system components are not easily reusable, which hinders further verification efforts.

In this paper, we present Disel, the first framework for implementation and compositional verification of distributed systems and their clients, all within the mechanized, foundational context of the Coq proof assistant. In Disel, users implement distributed systems using a domain specific language shallowly embedded in Coq and providing both high-level programming constructs as well as low-level communication primitives. Components of composite systems are specified in Disel as protocols, which capture system-specific logic and disentangle system definitions from implementation details. By virtue of Disel’s dependent type system, well-typed implementations always satisfy their protocols’ invariants and never go wrong, allowing users to verify system implementations interactively using Disel’s Hoare-style program logic, which extends state-of-the-art techniques for concurrency verification to the distributed setting. By virtue of the substitution principle and frame rule provided by Disel’s logic, system components can be composed leading to modular, reusable verified distributed systems.

We describe Disel, illustrate its use with a series of examples, outline its logic and metatheory, and report on our experience using it as a framework for implementing, specifying, and verifying distributed systems.

CPP 2018 Certified Programs and Proofs (CPP) 2018

Œuf: Minimizing the Coq Extraction TCB


Verifying systems by implementing them in the programming language of a proof assistant (e.g., Gallina for Coq) lets us directly leverage the full power of the proof assistant for verifying the system. But, to execute such an implementation requires extraction, a large complicated process that is in the trusted computing base (TCB).

This paper presents Oeuf, a verified compiler from a subset of Gallina to assembly. Oeuf’s correctness theorem ensures that compilation preserves the semantics of the source Gallina program. We describe how Oeuf’s specification can be used as a foreign function interface to reason about the interaction between compiled Gallina programs and surrounding shim code. Additionally, Oeuf maintains a small TCB for its front-end by reflecting Gallina programs to Oeuf source and automatically ensuring equivalence using computational denotation. This design enabled us to implement some early compiler passes (e.g., lambda lifting) in the untrusted reflection and ensure their correctness via translation validation. To evaluate Oeuf, we compile Appel’s SHA256 specification from Gallina to x86 and write a shim for the generated code, yielding a verified sha256sum implementation with a small TCB.

ICALEPCS 2017 International Conference on Accelerator and Large Experimental Control Systems (ICALEPCS) 2017

Automatic Formal Verification for EPICS


We built an EPICS-based radiation therapy machine control program and are using it to treat patients at our hospital. To help ensure safety, the control program uses a restricted subset of EPICS constructs and programming techniques, and we developed several new automated formal verification tools for this subset.

To check our control program, we built a Symbolic Interpreter that finds errors in EPICS database programs, using symbolic execution and satisfiability checking. It found serious errors in our control program that were missed by reviews and testing.

To check the EPICS runtime (EPICS Core) itself, we first developed a Formal Semantics for EPICS database programs, based on the EPICS Record Reference Manual (RRM) and expressed in the specification language of an automated theorem prover. We built a formally-verified Trace Validator and used it to check the EPICS runtime against our semantics by differential testing with millions of randomly generated programs. The testing process generally corroborated that the EPICS runtime conforms to its specification in the RRM, but it did find several omissions and ambiguities in the RRM that might mislead users. Our formal semantics for EPICS enables valuable future developments: a full proof of correctness for our EPICS program, verified analyses for arbitrary EPICS programs, and a Verified Compiler that could compile an EPICS database to a verified standalone program, while dispensing with much of the unverified EPICS toolchain and runtime.

ICFP 2017 International Conference on Functional Programming (ICFP) 2017

SpaceSearch: A Library for Building and Verifying Solver-aided Tools


Many verification tools build on automated solvers. These tools reduce problems in a specific application domain (e.g., compiler optimization validation) to queries that can be discharged with a highly optimized solver. But the correctness of the reductions themselves is rarely verified in practice, limiting the confidence that the solver’s output establishes the desired domain-level property.

This paper presents SpaceSearch, a new library for developing solver-aided tools within a proof assistant. A user builds their solver-aided tool in Coq against the SpaceSearch interface, and the user then verifies that the results provided by the interface are sufficient to establish the tool’s desired high-level properties. Once verified, the tool can be extracted to an implementation in a solver-aided language (e.g., Rosette), where SpaceSearch provides an efficient instantiation of the SpaceSearch interface with calls to an underlying SMT solver. This combines the strong correctness guarantees of developing a tool in a proof assistant with the high performance of modern SMT solvers. This paper also introduces new optimizations for such verified solver-aided tools, including parallelization and incrementalization.

We evaluate SpaceSearch by building and verifying two solver-aided tools. The first, SaltShaker, checks that RockSalt’s x86 semantics for a given instruction agrees with STOKE’s x86 semantics. SaltShaker identified 7 bugs in RockSalt and 1 bug in STOKE. After these systems were patched by their developers, SaltShaker verified the semantics’ agreement on 15,255 instruction instantiations in under 2h. The second tool, BGProof, is a verified version of an existing Border Gateway Protocol (BGP) router configuration checker. Like the existing checker, BGProof scales to checking industrial configurations spanning over 240 KLOC, identifying 19 configuration inconsistencies with no false positives. However, the correctness of BGProof has been formally proven, and we found 2 bugs in the unverified implementation. These results demonstrate that SpaceSearch is a practical approach to developing efficient, verified solver-aided tools.

SNAPL 2017 Summit oN Advances in Programming Languages (SNAPL) 2017

Programming Language Abstractions for Modularly Verified Distributed Systems


Distributed systems are rarely developed as monolithic programs. Instead, like any software, these systems may consist of multiple program components, which are then compiled separately and linked together. Modern systems also incorporate various services interacting with each other and with client applications. However, state-of-the-art verification tools focus predominantly on verifying standalone, closed-world protocols or systems, thus failing to account for the compositional nature of distributed systems. For example, standalone verification has the drawback that when protocols and their optimized implementations evolve, one must re-verify the entire system from scratch, instead of leveraging compositionality to contain the reverification effort.

In this paper, we focus on the challenge of modular verification of distributed systems with respect to high-level protocol invariants as well as for low-level implementation safety properties. We argue that the missing link between the two is a programming paradigm that would allow one to reason about both high-level distributed protocols and low-level implementation primitives in a single verification-friendly framework. Such a link would make it possible to reap the benefits from both the vast body of research in distributed computing, focused on modular protocol decomposition and consistency properties, as well as from the recent advances in program verification, enabling construction of provably correct systems implementations. To showcase the modular verification challenges, we present some typical scenarios of decomposition between a distributed protocol and its implementations. We then describe our ongoing research agenda, in which we are attempting to address the outlined problems by providing a typing discipline and a set of domain-specific primitives for specifying, implementing and verifying distributed systems. Our approach, mechanized within a proof assistant, provides the means of decomposition necessary for modular proofs about distributed protocols and systems.

SNAPL 2017 Summit oN Advances in Programming Languages (SNAPL) 2017

Programming Language Tools and Techniques for 3D Printing


We propose a research agenda to investigate programming language techniques for improving affordable, end-user desktop manufacturing processes such as 3D printing. Our goal is to adapt programming languages tools and extend the decades of research in industrial, high-end CAD/CAM in order to help make affordable desktop manufacturing processes more accurate, fast, reliable, and accessible to end-users. We focus on three major areas where 3D printing can benefit from programming language tools: design synthesis, optimizing compilation, and runtime monitoring. We present preliminary results on synthesizing editable CAD models from difficult-to-edit surface meshes, discuss potential new compilation strategies, and propose runtime monitoring techniques. We conclude by discussing additional near-future directions we intend to pursue.

CoqPL 2017 Workshop on Coq for Programming Languages (CoqPL) 2017

Verification of Implementations of Distributed Systems Under Churn


In order to provide high availability and operate in unreliable environments, many critical applications are implemented as distributed systems. Unfortunately, the need to handle packet drops, machine crashes, and churn—the spontaneous arrival and departure of nodes to and from the network—has made these systems difficult to implement correctly in practice. Recent work provides mechanisms to formally verify implementations that tolerate packet drops and machine crashes; however, no extant verification framework supports reasoning about churn. To address this challenge, we introduce support for reasoning about churn to the Coq-based Verdi framework.

OOPSLA 2016 Object-Oriented Programming, Systems, Languages & Applications (OOPSLA) 2016

Scalable Verification of Border Gateway Protocol Configurations with an SMT Solver


Internet Service Providers (ISPs) use the Border Gateway Protocol (BGP) to announce and exchange routes for de- livering packets through the internet. ISPs must carefully configure their BGP routers to ensure traffic is routed reli- ably and securely. Correctly configuring BGP routers has proven challenging in practice, and misconfiguration has led to worldwide outages and traffic hijacks. This paper presents Bagpipe, a system that enables ISPs to declaratively express BGP policies and that automatically verifies that router configurations implement such policies. The novel initial network reduction soundly reduces policy verification to a search for counterexamples in a finite space. An SMT-based symbolic execution engine performs this search efficiently. Bagpipe reduces the size of its search space using predicate abstraction and parallelizes its search using symbolic variable hoisting. Bagpipe’s policy specification language is expressive: we expressed policies inferred from real AS configurations, policies from the literature, and policies for 10 Juniper TechLibrary configuration scenarios. Bagpipe is efficient: we ran it on three ASes with a total of over 240,000 lines of Cisco and Juniper BGP configuration. Bagpipe is effective: it revealed 19 policy violations without issuing any false positives.

NetPL 2016 ACM SIGCOMM Workshop on Networking and Programming Languages (NetPL) 2016

Formal Semantics and Automated Verification for the Border Gateway Protocol


Traffic is routed across the Internet by Autonomous Systems, or ASes, such as ISPs, corporations, and universities. To route traffic reliably and securely, ASes must configure their Border Gateway Protocol (BGP) routers to implement policies restricting how routing announcements can be used and exchanged with other ASes.

It is challenging to correctly implement BGP policies in lowlevel configuration languages. Large ASes maintain millions of lines of frequently-changing configurations that run distributed across hundreds of routers. Router misconfigurations are common and have led to highly visible failures affecting ASes and their billions of users. For example, in 2009 YouTube was inaccessible worldwide for several hours due to a misconfiguration in Pakistan, and in 2010 and 2014 China Telecom hijacked significant but unknown fractions of international traffic for extended periods. Goldberg surveys several additional major outages and their causes.

We present the first mechanized formal semantics of BGP based on the BGP specification RFC 4271, and we show how to use this semantics to develop reliable tools and guidelines that help BGP administrators avoid router misconfiguration. In contrast to previous semantics, our semantics is fully formal (it is implemented in the Coq proof assistant), and it models all required features of the BGP specification modulo low-level details such as bit representation of update messages and TCP.

To provide evidence for the correctness and usefulness of our semantics: 1) we have built the Bagpipe tool which automatically checks that BGP configurations adhere to given policy specifications, revealing 19 apparent errors in three ASes with over 240,000 lines of BGP configuration; 2) we have tested the BGP simulator C-BGP, revealing one bug; and 3) we are currently extending and formalizing the pen-and-paper proof by Gao & Rexford on the convergence of BGP, revealing necessary extensions to Gao & Rexford’s original configuration guidelines.

CAV 2016 Computer-Aided Verification (CAV) 2016

Investigating Safety of a Radiotherapy Machine Using System Models with Pluggable Checkers


Formal techniques for guaranteeing software correctness have made tremendous progress in recent decades. However, applying these techniques to real-world safety-critical systems remains challenging in practice. Inspired by goals set out in prior work, we report on a large-scale case study that applies modern verification techniques to check safety properties of a radiotherapy system in current clinical use. Because of the diversity and complexity of the system’s components (software, hardware, and physical), no single tool was suitable for both checking critical component properties and ensuring that their composition implies critical system properties. This paper describes how we used state-of-the-art approaches to develop specialized tools for verifying safety properties of individual components, as well as an extensible tool for composing those properties to check the safety of the system as a whole. We describe the key design decisions that diverged from previous approaches and that enabled us to practically apply our approach to provide machine-checked guarantees. Our case study uncovered subtle safety-critical flaws in a pre-release of the latest version of the radiotherapy system’s control software.

NSV 2016 Numerical Software Verification (NSV) 2016

Toward a Standard Benchmark Format and Suite for Floating-Point Analysis


We introduce FPBench, a standard benchmark format for validation and optimization of numerical accuracy in floating-point computations. FPBench is a first step toward addressing an increasing need in our community for comparisons and combinations of tools from different application domains. To this end, FPBench provides a basic floating-point benchmark format and accuracy measures for comparing different tools. The FPBench format and measures allow comparing and composing different floating-point tools. We describe the FPBench format and measures and show that FPBench expresses benchmarks from recent papers in the literature, by building an initial benchmark suite drawn from these papers. We intend for FPBench to grow into a standard benchmark suite for the members of the floating-point tools research community.

PLDI 2016 Programming Language Design and Implementation (PLDI) 2016

Verified Peephole Optimizations for CompCert


Transformations over assembly code are common in many compilers. These transformations are also some of the most bug-dense compiler components. Such bugs could be eliminated by formally verifying the compiler, but state-of-the- art formally verified compilers like CompCert do not sup- port assembly-level program transformations. This paper presents Peek, a framework for expressing, verifying, and running meaning-preserving assembly-level program trans- formations in CompCert. Peek contributes four new components: a lower level semantics for CompCert x86 syntax, a liveness analysis, a library for expressing and verifying peephole optimizations, and a verified peephole optimization pass built into CompCert. Each of these is accompanied by a correctness proof in Coq against realistic assumptions about the calling convention and the system memory allocator. Verifying peephole optimizations in Peek requires proving only a set of local properties, which we have proved are sufficient to ensure global transformation correctness. We have proven these local properties for 28 peephole transformations from the literature. We discuss the development of our new assembly semantics, liveness analysis, representation of program transformations, and execution engine; describe the verification challenges of each component; and detail techniques we applied to mitigate the proof burden.

CPP 2016 Certified Programs and Proofs (CPP) 2016

Planning for Change in a Formal Verification of the Raft Consensus Protocol


We present the first formal verification of state machine safety for the Raft consensus protocol, a critical component of many distributed systems. We connected our proof to previous work to establish an end-to-end guarantee that our implementation provides linearizable state machine replication. This proof required iteratively discovering and proving 90 system invariants. Our verified implementation is extracted to OCaml and runs on real networks.

The primary challenge we faced during the verification process was proof maintenance, since proving one invariant often required strengthening and updating other parts of our proof. To address this challenge, we propose a methodology of planning for change during verification. Our methodology adapts classical information hiding techniques to the context of proof assistants, factors out common invariant-strengthening patterns into custom induction principles, proves higher-order lemmas that show any property proved about a particular component implies analogous properties about related components, and makes proofs robust to change using structural tactics. We also discuss how our methodology may be applied to systems verification more broadly.


Also check out the Proof Engineering project!

PLDI 2015 Programming Language Design and Implementation (PLDI) 2015

Verdi: A Framework for Implementing and Formally Verifying Distributed Systems


Distributed systems are difficult to implement correctly because they must handle both concurrency and failures: machines may crash at arbitrary points and networks may reorder, drop, or duplicate packets. Further, their behavior is often too complex to permit exhaustive testing. Bugs in these systems have led to the loss of critical data and unacceptable service outages. We present Verdi, a framework for implementing and formally verifying distributed systems in Coq. Verdi formalizes various network semantics with different faults, and the developer chooses the most appropriate fault model when verifying their implementation. Furthermore, Verdi eases the verification burden by enabling the developer to first verify their system under an idealized fault model, then transfer the resulting correctness guarantees to a more realistic fault model without any additional proof burden. To demonstrate Verdi’s utility, we present the first mechanically checked proof of linearizability of the Raft state machine replication algorithm, as well as verified implementations of a primary-backup replication system and a key-value store. These verified systems provide similar performance to unverified equivalents.

PLDI 2015 Programming Language Design and Implementation (PLDI) 2015

★ Distinguished Paper

Automatically Improving Accuracy for Floating Point Expressions


Scientific and engineering applications depend on floating point arithmetic to approximate real arithmetic. This approximation introduces rounding error, which can accumulate to produce unacceptable results. While the numerical methods literature provides techniques to mitigate rounding error, applying these techniques requires manually rearranging expressions and understanding the finer details of floating point arithmetic. We introduce Herbie, a tool which automatically discovers the rewrites experts perform to improve accuracy. Herbie’s heuristic search estimates and localizes rounding error using sampled points (rather than static error analysis), applies a database of rules to generate improvements, takes series expansions, and combines improvements for different input regions. We evaluated Herbie on examples from a classic numerical methods textbook, and found that Herbie was able to improve accuracy on each example, some by up to 60 bits, while imposing a median performance overhead of 40%. Colleagues in machine learning have used Herbie to significantly improve the results of a clustering algorithm, and a mathematical library has accepted two patches generated using Herbie.

ICRA 2015 International Conference on Robotics and Automation (ICRA) 2015

RoboFlow: A Flow-based Visual Programming Language for Mobile Manipulation Tasks


General-purpose robots can perform a range of useful tasks in human environments; however, programming them to robustly function in all possible environments that they might encounter is unfeasible. Instead, our research aims to develop robots that can be programmed by its end-users in their context of use, so that the robot needs to robustly function in only one particular environment. This requires intuitive ways in which end-users can program their robot. To that end, this paper contributes a flow-based visual programming language, called RoboFlow, that allows programming of generalizable mobile manipulation tasks. RoboFlow is designed to (i) ensure a robust low-level implementation of program procedures on a mobile manipulator, and (ii) restrict the high-level programming as much as possible to avoid user errors while enabling expressive programs that involve branching, looping, and nesting. We present an implementation of RoboFlow on a PR2 mobile manipulator and demonstrate the generalizability and error handling properties of RoboFlow programs on everyday mobile manipulation tasks in human environments.


Check out the interactive RoboFlow Web GUI!

SNAPL 2015 Summit oN Advances in Programming Languages (SNAPL) 2015

Toward a Dependability Case Language and Workflow for a Radiation Therapy System


We present a near-future research agenda for bringing a suite of modern programming-languages verification tools – specifically interactive theorem proving, solver-aided languages, and formally defined domain-specific languages – to the development of a specific safety-critical system, a radiotherapy medical device. We sketch how we believe recent programming-languages research advances can merge with existing best practices for safety-critical systems to increase system assurance and developer productivity. We motivate hypotheses central to our agenda: That we should start with a single specific system and that we need to integrate a variety of complementary verification and synthesis tools into system development.

HRI 2015 Human Robot Interaction (HRI) 2015

Visual Robot Programming for Generalizable Mobile Manipulation Tasks


General-purpose robots present the opportunity to be programmed for a specific purpose {em after} deployment. This requires tools for end-users to quickly and intuitively program robots to perform useful tasks in new environments. In this paper, we present a flow-based visual programming language (VPL) for mobile manipulation tasks, demonstrate the generalizability of tasks programmed in this VPL, and present a preliminary user study of a development tool for this VPL.


Check out the interactive RoboFlow Web GUI and the full ICRA paper on this project for more!

CoqPL 2015 Workshop on Coq for Programming Languages (CoqPL) 2015

Peek: A Formally Verified Peephole Optimization Framework for x86


Peek is a first step toward adding support for assembly-level program analyses, transformations, and optimizations in CompCert. Currently, Peek focuses on x86 peephole transformations implemented and verified in Coq. Peek is designed to provide a modular interface requiring that each peephole optimization satisfy only local correctness properties. Our primary result establishes that, assuming the C calling convention, any peephole optimization satisfying these local properties preserves global program meaning.


See the Peek PLDI 2016 paper for more!

OSDI 2014 Operating Systems Design and Implementation (OSDI) 2014

Jitk: A Trustworthy In-Kernel Interpreter Infrastructure


Modern operating systems run multiple interpreters in the kernel, which enable user-space applications to add new functionality or specialize system policies. The correctness of such interpreters is critical to the overall system security: bugs in interpreters could allow adversaries to compromise user-space applications and even the kernel.

Jitk is a new infrastructure for building in-kernel interpreters that guarantee functional correctness as they compile user-space policies down to native instructions for execution in the kernel. To demonstrate Jitk, we implement two interpreters in the Linux kernel, BPF and INET-DIAG, which are used for network and system call filtering and socket monitoring, respectively. To help application developers write correct filters, we introduce a high-level rule language, along with a proof that Jitk correctly translates high-level rules all the way to native machine code, and demonstrate that this language can be integrated into OpenSSH with tens of lines of code. We built a prototype of Jitk on top of the CompCert verified compiler and integrated it into the Linux kernel. Experimental results show that Jitk is practical, fast, and trustworthy.

PLDI 2014 Programming Language Design and Implementation (PLDI) 2014

Automating Formal Proofs for Reactive Systems


Implementing systems in proof assistants like Coq and proving their correctness in full formal detail has consistently demonstrated promise for making extremely strong guarantees about critical software, ranging from compilers and operating systems to databases and web browsers. Unfortunately, these verifications demand such heroic manual proof effort, even for a single system, that the approach has not been widely adopted.We demonstrate a technique to eliminate the manual proof burden for verifying many properties within an entire class of applications, in our case reactive systems, while only expending effort comparable to the manual verification of a single system. A crucial insight of our approach is simultaneously designing both (1) a domain-specific language (DSL) for expressing reactive systems and their correctness properties and (2) proof automation which exploits the constrained language of both programs and properties to enable fully automatic, pushbutton verification. We apply this insight in a deeply embedded Coq DSL, dubbed Reflex, and illustrate Reflex’s expressiveness by implementing and automatically verifying realistic systems including a modern web browser, an SSH server, and a web server. Using Reflex radically reduced the proof burden: in previous, similar versions of our benchmarks written in Coq by experts, proofs accounted for over 80% of the code base; our versions require no manual proofs.

NDSS 2014 Network and Distributed System Security (NDSS) 2014

SafeDispatch: Securing C++ Virtual Calls from Memory Corruption Attacks


Several defenses have increased the cost of traditional, low-level attacks that corrupt control data, e.g. return addresses saved on the stack, to compromise program execution. In response, creative adversaries have begun circumventing these defenses by exploiting programming errors to manipulate pointers to virtual tables, or vtables, of C++ objects. These attacks can hijack program control flow whenever a virtual method of a corrupted object is called, potentially allowing the attacker to gain complete control of the underlying system. In this paper we present SAFEDISPATCH, a novel defense to prevent such vtable hijacking by statically analyzing C++ programs and inserting sufficient runtime checks to ensure that control flow at virtual method call sites cannot be arbitrarily influenced by an attacker. We implemented SAFEDISPATCH as a Clang++/LLVM extension, used our enhanced compiler to build a vtable-safe version of the Google Chromium browser, and measured the performance overhead of our approach on popular browser benchmark suites. By carefully crafting a handful of optimizations, we were able to reduce average runtime overhead to just 2.1%.

SECURITY 2012 USENIX Security Symposium (SECURITY) 2012

Establishing Browser Security Guarantees through Formal Shim Verification


Web browsers mediate access to valuable private data in domains ranging from health care to banking. Despite this critical role, attackers routinely exploit browser vulnerabilities to exfiltrate private data and take over the underlying system. We present QUARK, a browser whose kernel has been implemented and verified in Coq. We give a specification of our kernel, show that the implementation satisfies the specification, and finally show that the specification implies several security properties, including tab non-interference, cookie integrity and confidentiality, and address bar integrity.

LMCS 2011 Logical Methods in Computer Science (LMCS) 2011

Equality Saturation: A New Approach to Optimization


Optimizations in a traditional compiler are applied sequentially, with each optimization destructively modifying the program to produce a transformed program that is then passed to the next optimization. We present a new approach for structuring the optimization phase of a compiler. In our approach, optimizations take the form of equality analyses that add equality information to a common intermediate representation. The optimizer works by repeatedly applying these analyses to infer equivalences between program fragments, thus saturating the intermediate representation with equalities. Once saturated, the intermediate representation encodes multiple optimized versions of the input program. At this point, a profitability heuristic picks the final optimized program from the various programs represented in the saturated representation. Our proposed way of structuring optimizers has a variety of benefits over previous approaches: our approach obviates the need to worry about optimization ordering, enables the use of a global optimization heuristic that selects among fully optimized programs, and can be used to perform translation validation, even on compilers other than our own. We present our approach, formalize it, and describe our choice of intermediate representation. We also present experimental results showing that our approach is practical in terms of time and space overhead, is effective at discovering intricate optimization opportunities, and is effective at performing translation validation for a realistic optimizer.


This is the full journal version of the original EqSat paper.

PLDI 2010 Programming Language Design and Implementation (PLDI) 2010

Bringing Extensibility to Verified Compilers


Verified compilers, such as Leroy’s CompCert, are accompanied by a fully checked correctness proof. Both the compiler and proof are often constructed with an interactive proof assistant. This technique provides a strong, end-to-end correctness guarantee on top of a small trusted computing base. Unfortunately, these compilers are also challenging to extend since each additional transformation must be proven correct in full formal detail.

At the other end of the spectrum, techniques for compiler correctness based on a domain-specific language for writing optimizations, such as Lerner’s Rhodium and Cobalt, make the compiler easy to extend: the correctness of additional transformations can be checked completely automatically. Unfortunately, these systems provide a weaker guarantee since their end-to-end correctness has not been proven fully formally.

We present an approach for compiler correctness that provides the best of both worlds by bridging the gap between compiler verification and compiler extensibility. In particular, we have extended Leroy’s CompCert compiler with an execution engine for optimizations written in a domain specific and proved that this execution engine preserves program semantics, using the Coq proof assistant. We present our CompCert extension, XCert, including the details of its execution engine and proof of correctness in Coq. Furthermore, we report on the important lessons learned for making the proof development manageable.

PLDI 2009 Programming Language Design and Implementation (PLDI) 2009

Proving Optimizations Correct using Parameterized Program Equivalence


Translation validation is a technique for checking that, after an optimization has run, the input and output of the optimization are equivalent. Traditionally, translation validation has been used to prove concrete, fully specified programs equivalent. In this paper we present Parameterized Equivalence Checking (PEC), a generalization of translation validation that can prove the equivalence of parameterized programs. A parameterized program is a partially specified program that can represent multiple concrete programs. For example, a parameterized program may contain a section of code whose only known property is that it does not modify certain variables. By proving parameterized programs equivalent, PEC can prove the correctness of transformation rules that represent complex optimizations once and for all, before they are ever run. We implemented our PEC technique in a tool that can establish the equivalence of two parameterized programs. To highlight the power of PEC, we designed a language for implementing complex optimizations using many-to-many rewrite rules, and used this language to implement a variety of optimizations including software pipelining, loop unrolling, loop unswitching, loop interchange, and loop fusion. Finally, to demonstrate the effectiveness of PEC, we used our PEC implementation to verify that all the optimizations we implemented in our language preserve program behavior.

POPL 2009 Principles of Programming Languages (POPL) 2009

Equality Saturation: A New Approach to Optimization


Optimizations in a traditional compiler are applied sequentially, with each optimization destructively modifying the program to produce a transformed program that is then passed to the next optimization. We present a new approach for structuring the optimization phase of a compiler. In our approach, optimizations take the form of equality analyses that add equality information to a common intermediate representation. The optimizer works by repeatedly applying these analyses to infer equivalences between program fragments, thus saturating the intermediate representation with equalities. Once saturated, the intermediate representation encodes multiple optimized versions of the input program. At this point, a profitability heuristic picks the final optimized program from the various programs represented in the saturated representation. Our proposed way of structuring optimizers has a variety of benefits over previous approaches: our approach obviates the need to worry about optimization ordering, enables the use of a global optimization heuristic that selects among fully optimized programs, and can be used to perform translation validation, even on compilers other than our own. We present our approach, formalize it, and describe our choice of intermediate representation. We also present experimental results showing that our approach is practical in terms of time and space overhead, is effective at discovering intricate optimization opportunities, and is effective at performing translation validation for a realistic optimizer.

OOPSLA 2008 Object-Oriented Programming, Systems, Languages & Applications (OOPSLA) 2008

Deep Typechecking and Refactoring


Large software systems are typically composed of multiple layers, written in different languages and loosely coupled using a string-based interface. For example, in modern web-applications, a server written in Java communicates with a database back-end by passing in query strings. This widely prevalent approach is unsafe as the analyses developed for the individual layers are oblivious to the semantics of the dynamically constructed strings, making it impossible to statically reason about the correctness of the interaction. Further, even simple refactoring in such systems is daunting and error prone as the changes must also be applied to isolated string fragments scattered across the code base.

We present techniques for deep typechecking and refactoring for systems that combine Java code with a database back-end using the Java Persistence API. Deep typechecking ensures that the queries that are constructed dynamically are type safe and that the values returned from the queries are used safely by the program. Deep refactoring builds upon typechecking to allow programmers to safely and automatically propagate code refactorings through the query string fragments.

Our algorithms are implemented in a tool called QUAIL. We present experiments evaluating the effectiveness of QUAIL on several benchmarks ranging from 3,369 to 82,907 lines of code. We show that QUAIL is able to verify that 84% of query strings in our benchmarks are type safe. Finally, we show that QUAIL reduces the number of places in the code that a programmer must look at in order to perform a refactoring by several orders of magnitude.