Distributed Program Execution on Shared, Heterogeneous Systems

Introduction

The goal of the this project is the effective utilization of existing networked computers as a general purpose computational resource for executing parallel programs. Specifically, we are focusing on synchronization mechanisms and scheduling algorithms that manage distributed execution on networks of workstations that are not dedicated to the problem of interest. Here, the communications resources are those provided either by a local area network (for physically co-located computers) or a wide area network (for geographically remote computers). In addition, the computational resources are shared, typically with a primary user that is allowing spare cycles to be used as long as the distributed computation does not significantly interfere with the primary user's utilization of the machine.

To ensure that our work is relevant to the real world, we have four parallel applications we are using to test our ideas. The first is a simple toy problem designed to simplify the analysis of performance and ease the understanding of how the system operates. The remaining three applications are production codes used by academic research groups, one in transportation network design optimization (work directed by James Campbell at UMSL), a Barnes-Hut solver for the N-Body problem, and discrete-event simulation of VLSI systems (joint work with Bradley Noble at SIUE).

Historical Context

Previous work we have done in this area includes the development of an analytic performance model of synchronous iterative algorithms executing on shared, heterogeneous resources. Publications describing this work include:

Recent Activities

We have recently extended the models described above to incorporate the effects of speculative computation. This work is reported in:

Follow-on Activities

We are currently implementing an experimental testbed to investigate the properties of a whole host of scheduling algorithms and synchronization performance issues. This includes investigations into appropriate figures of merit for parallel application performance in a shared, heterogeneous execution environment. Publications describing this work include:

Finally, we have investigated scheduling algorithms for clusters than enable background cycles to be effectively used without impacting the performance of the primary tasks on the system. This work is reported in:

Last modified 12 August 2006. Return to Roger's home page .

Roger Chamberlain <roger AT wustl.edu>