Achieving High Performance on Extremely Large Parallel Machines: Performance Prediction and Load Balancing
Thesis 2005
Publication Type: PhD Thesis
Repository URL: gengbin-thesis
Parallel machines with an extremely large number of processors (at least tens of thousands processors) are now in operation. For example, the IBM BlueGene/L machine with 128K processors is currently being deployed. It is going to be a significant challenge for application developers to write parallel programs in order to exploit the enormous compute power available and manually scale their applications on such machines. Solving these problems involves finding suitable parallel programming models for such machines and addressing issues like load imbalance. In this thesis, we explore Charm++ programming model and its migratable objects for programming such machines and dynamic load balancing techniques to help parallel applications to easily scale on a large number of processors. We also present a parallel simulator that is capable of predicting parallel performance to help analysis and tuning of the parallel performance and facilitate the development of new load balancing techniques, even before such machines are built.

We evaluate the idea of virtualization and its usefulness in helping a programmer to write applications with high degree of parallelism. We demonstrate it by developing several mini-applications with million-way parallelism. We show that Charm++ and AMPI (an extension to MPI) with migratable objects and support for load balancing are suitable programming model for programming very large machines.

It is important to understand the performance of parallel applications on very large parallel machines. This thesis explores Parallel Discrete Event Simulation (PDES) techniques with an optimistic synchronization protocol to simulate parallel applications running on a very large number of processors. We optimize the synchronization protocol by exploiting the inherent determinacy that is normally found in parallel applications to reduce the synchronization overhead significantly.

Load balance problems present significant challenges to applications to achieve scalability on very large machines. We study load balancing techniques and develop a spectrum of load balancing strategies based on studies of the characteristics of applications. These load balancing strategies are motivated by applications such as LeanMD, NAMD (both are classical molecular dynamics applications) and Fractography3D (a dynamic 3D crack propagation simulation program). We optimize our load balancing strategies in multiple dimensions of criteria such as load balancing for improving communication locality, sub-step load balancing, and computation phase-aware load balancing.

We further study the performance of existing load balancing strategies in the context of very large parallel machines using the parallel simulator we developed. We demonstrate the weaknesses of the centralized and fully distributed load balancing schemes via large scale simulation, and design a new scalable hierarchical load balancing scheme suitable for such large machines. This hierarchial load balancing scheme builds load data from instrumenting an application at run-time on both computation and communication pattern in a fully automatic way. The hierarchical load balancing takes application communication pattern into account explicitly in decision making. It also incorporates an explicit memory cost control function to make it easy to adapt to extremely large number of processors with small memory footprint.

Gengbin Zheng, "Achieving High Performance on Extremely Large Parallel Machines: Performance Prediction and Load Balancing", Department of Computer Science, University of Illinois at Urbana-Champaign, 2005.
Research Areas