David G. Sullivan
|
David Gerard Sullivan, Ph.D. Thesis, Harvard University, September 2003.
Abstract
Complex software systems typically include a set of parameters that
can be adjusted to improve the system’s performance. System designers
expose these parameters, which are often referred to as knobs, because
they realize that no single configuration of the system can adequately
handle every possible workload. Therefore, users are allowed to tune
the system, reconfiguring it to perform well on a specific
workload. However, manually tuning a software system can be an
extremely difficult task, and it is often necessary to dynamically
retune the system as workload characteristics change over time. As a
result, many systems are run using either the default knob settings or
non-optimal alternate settings, and potential performance improvements
go unrealized. Ideally, the process of software tuning should be
automated, allowing software systems to determine their own optimal
knob settings and to reconfigure themselves as needed in response to
changing conditions.
This thesis demonstrates that probabilistic reasoning and decision-making techniques can be used as the foundation of an effective, automated approach to software tuning. In particular, the thesis presents a methodology for automated software tuning that employs the influence diagram formalism and related learning and inference algorithms, and it uses this methodology to tune four knobs from the Berkeley DB embedded database system. Empirical results show that an influence diagram can effectively generalize from training data for this domain, producing considerable performance improvements on a varied set of workloads and outperforming an alternative approach based on linear regression.
The thesis provides a detailed roadmap for applying the proposed software-tuning methodology to an arbitrary software system. It also proposes novel methods of addressing three challenges associated with using an influence diagram for software tuning: modeling the performance of the software system in a tractable manner, determining effective discretizations of continuous variables in the model, and estimating parameters of the model for which no training data is available. In addition, the thesis discusses the design of workload generators that can produce the necessary training data, and it proposes a technique for ensuring that a workload generator captures the steady-state performance of the system being tuned.
Last updated on February 4, 2023.
Photo by Jackie Ricciardi, BU Photography.