Jan Midtgaard, Norman Ramsey, and Bradford Larsen. Engineering Definitional Interpreters. PPDP '13.
A definitional interpreter should be clear and easy to write, but it may run 4–10 times slower than a well-crafted bytecode interpreter. In a case study focused on implementation choices, we explore ways of making definitional interpreters faster without expending much programming effort. We implement, in OCaml, interpreters based on three semantics for a simple subset of Lua. We compile the OCaml to x86 native code, and we systematically investigate hundreds of combinations of algorithms and data structures. In this experimental context, our fastest interpreters are based on natural semantics; good algorithms and data structures make them 2–3 times faster than naïve interpreters. Our best interpreter, created using only modest effort, runs only 1.5 times slower than a mature bytecode interpreter implemented in C.
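To make the idea concrete, here is a minimal definitional interpreter for a toy expression language, in the style the paper studies. The AST encoding and environment representation are illustrative choices for this sketch, not the paper's Lua subset or its OCaml implementation.

```python
def interp(expr, env):
    """Evaluate an expression tree against an environment (a dict)."""
    tag = expr[0]
    if tag == "lit":            # ("lit", value)
        return expr[1]
    if tag == "var":            # ("var", name)
        return env[expr[1]]
    if tag == "add":            # ("add", e1, e2)
        return interp(expr[1], env) + interp(expr[2], env)
    if tag == "let":            # ("let", name, bound, body)
        value = interp(expr[2], env)
        return interp(expr[3], {**env, expr[1]: value})
    raise ValueError(f"unknown node: {tag!r}")

# let x = 2 + 3 in x + x
program = ("let", "x", ("add", ("lit", 2), ("lit", 3)),
           ("add", ("var", "x"), ("var", "x")))
print(interp(program, {}))  # 10
```

The interpreter's structure mirrors the language's semantics one case per construct, which is what makes this style clear to write; the paper's experiments vary exactly the kinds of representation choices (environments, values) sketched arbitrarily here.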
Bradford Larsen. Simple Optimizations for an Applicative Array Language for Graphics Processors. DAMP '11.
Array programming allows high-level use of fast graphics processors for general-purpose computing. This paper shows a few optimizations, such as array fusion and automatic use of GPU shared memory, that can yield order-of-magnitude reductions in constant factors. These powerful optimizations are especially simple to implement given a functional source language.
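Array fusion can be sketched at the source level: two elementwise maps are combined into a single pass, eliminating the intermediate array. This plain-Python illustration shows only the transformation itself; the paper's GPU code generation and shared-memory optimizations are elided.

```python
def fuse(f, g):
    """Fuse map(f) after map(g) into a single map of the composed function."""
    return lambda x: f(g(x))

xs = [1, 2, 3, 4]

# Unfused: two traversals and an intermediate list.
unfused = [x + 1 for x in [x * 2 for x in xs]]

# Fused: one traversal, no intermediate list allocated.
step = fuse(lambda x: x + 1, lambda x: x * 2)
fused = [step(x) for x in xs]

assert fused == unfused  # both are [3, 5, 7, 9]
```

In a functional source language this rewrite is especially easy to justify, since the mapped functions are pure and the fused program is observably equivalent.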
Bradford Larsen, Ethan Burns, Wheeler Ruml, and Robert C. Holte. Searching Without a Heuristic: Efficient Use of Abstraction. AAAI-10.
The A* algorithm is the grandfather of many heuristic search algorithms that find cheapest paths through a graph. A* must be supplied with an admissible (i.e., non-overestimating) heuristic to find optimal solutions, and in many domains such heuristics are hard to invent. This paper presents Switchback, a novel hierarchical heuristic search algorithm that derives an admissible heuristic from a domain abstraction function as it executes, avoiding inefficiencies of its predecessors, Hierarchical A* and Hierarchical IDA*.
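The core idea behind hierarchical heuristic search can be sketched simply: exact distances in an abstract (simplified) space are admissible estimates in the concrete space. The grid domain and coordinate-projection abstraction below are illustrative examples, not the paper's benchmark domains, and this plain A* with a precomputable abstract heuristic is not the Switchback algorithm itself.

```python
import heapq

WALLS = {(1, 0), (1, 1)}          # blocked cells in a 3x3 grid
GOAL = (2, 0)

def neighbors(s):
    x, y = s
    for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if 0 <= nx < 3 and 0 <= ny < 3 and (nx, ny) not in WALLS:
            yield (nx, ny)

def abstract(s):
    """Abstraction: project a state onto its x-coordinate, ignoring walls."""
    return s[0]

def h(s):
    """Admissible heuristic: exact distance in the abstract (1-D) space,
    which can never exceed the true distance in the walled grid."""
    return abs(abstract(s) - abstract(GOAL))

def astar(start):
    frontier = [(h(start), 0, start)]
    best = {start: 0}
    while frontier:
        f, g, s = heapq.heappop(frontier)
        if s == GOAL:
            return g
        for n in neighbors(s):
            if g + 1 < best.get(n, float("inf")):
                best[n] = g + 1
                heapq.heappush(frontier, (g + 1 + h(n), g + 1, n))
    return None

print(astar((0, 0)))  # 6: the shortest path must go around the walls
```

Dropping detail (here, the y-coordinate and the walls) only removes constraints, so abstract distances underestimate concrete ones; hierarchical algorithms differ in when and how these abstract distances are computed.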
Ronald Veldema, Bradford Larsen, and Michael Philippsen. A DSM Protocol Aware of both Thread Migration and Memory Constraints. PDCS '08.
A distributed shared memory system allows a computer cluster to appear to the programmer as a single multicore computer. Such a system must provide access to remote data, typically either by replicating shared data (which costs memory) or by migrating the relevant data or computation (which can take time). This paper shows how data replication and migration of computation can be effectively combined in a distributed shared memory system designed for out-of-core computation in memory-intensive applications.
This work was awarded Best Paper in the “Distributed Information and Systems” category.