Adaptive Load Balancing for HPC Applications
Thesis 2016
Publication Type: PhD Thesis
Load imbalance is one of the critical factors affecting the performance of many applications. Applications are becoming increasingly sophisticated, using irregular structures and adaptive refinement techniques that result in load imbalance. Systems are also becoming more complex: the number of cores per node is increasing substantially, and nodes are becoming heterogeneous. High variability in the performance of hardware components introduces further imbalance. Load imbalance lowers system utilization and degrades performance. To address this problem, many HPC applications employ dynamic load balancing algorithms that redistribute work to balance the load.
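The link between imbalance and utilization can be made concrete with a small sketch. It assumes a bulk-synchronous iteration in which every processor waits at a barrier for the slowest one, so the iteration time is set by the maximum load and utilization is the average load divided by the maximum. The function name `imbalance_stats` is hypothetical and is not taken from the thesis.

```python
def imbalance_stats(loads):
    """Return (max/avg imbalance ratio, utilization) for per-processor loads.

    Hypothetical helper. Assumes a bulk-synchronous step: every processor
    waits for the slowest one, so the step takes max(loads) time and the
    fraction of useful work is avg(loads) / max(loads).
    """
    if not loads:
        raise ValueError("need at least one processor load")
    max_load = max(loads)
    avg_load = sum(loads) / len(loads)
    if max_load <= 0:
        # No work at all: treat as perfectly balanced.
        return 1.0, 1.0
    return max_load / avg_load, avg_load / max_load


# One overloaded processor out of four doubles the step time
# and leaves the machine half idle.
ratio, util = imbalance_stats([10, 10, 10, 30])
print(ratio, util)  # 2.0 0.5
```

Under this model, a single processor carrying three times the work of its peers halves utilization, which is the kind of loss dynamic load balancing aims to recover.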

Different application characteristics warrant different load balancing strategies, so a variety of high-quality, scalable load balancing algorithms is needed to cater to different applications. However, choosing an appropriate load balancer alone is not sufficient for good performance, because load balancing itself incurs a cost. Moreover, due to the dynamic nature of applications, it is hard to decide when to perform load balancing. Consequently, it may not be possible to decide a priori when to load balance and which strategy to use.
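The trade-off between the cost of load balancing and the cost of running imbalanced can be illustrated with a simple cost-benefit trigger. This is a generic heuristic, not the decision model used in the thesis: it assumes the time lost per iteration is (max load - average load), that one load balancing step costs a fixed, known `lb_cost`, and that balancing fully restores balance. The class `LBTrigger` and its methods are hypothetical names.

```python
class LBTrigger:
    """Hypothetical cost-benefit trigger for deciding when to load balance.

    Accumulates the time lost to imbalance since the last load balancing
    step and signals that balancing should run once that loss reaches the
    (assumed fixed) cost of a load balancing step.
    """

    def __init__(self, lb_cost):
        self.lb_cost = lb_cost        # assumed cost of one LB step (seconds)
        self.accumulated_loss = 0.0   # time lost to imbalance since last LB

    def record_iteration(self, loads):
        """Record one iteration's per-processor loads; return True if LB should run now."""
        max_load = max(loads)
        avg_load = sum(loads) / len(loads)
        # Time wasted this iteration by processors waiting for the slowest one.
        self.accumulated_loss += max_load - avg_load
        if self.accumulated_loss >= self.lb_cost:
            # Assumption: load balancing restores balance, so the loss resets.
            self.accumulated_loss = 0.0
            return True
        return False


trigger = LBTrigger(lb_cost=3.0)
# Each iteration loses 1.5 s to imbalance, so LB fires every second iteration.
print([trigger.record_iteration([2, 2, 2, 4]) for _ in range(4)])
# [False, True, False, True]
```

In practice both the imbalance and the load balancing cost change as the application evolves, so a runtime system would have to estimate them online rather than fix them in advance, which is exactly why the decision is hard to make a priori.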

With ever-increasing core counts per node, there will be a vast amount of on-node parallelism. Because of this, load imbalance that arises at the node level can be mitigated within the node rather than through global load balancing. However, having the application developer manage resources and handle dynamic imbalances is both inefficient and a burden on the programmer.

The focus of this dissertation is on developing scalable and adaptive techniques for handling load imbalance. The dissertation presents load balancing algorithms for handling both inter- and intra-node load imbalance. It also presents an introspective runtime system that monitors application and system characteristics and makes load balancing decisions automatically.

