Periodic Hierarchical Load Balancing for Large Supercomputers
International Journal for High Performance Computing Applications (IJHPCA) 2010
Publication Type: Paper
Repository URL: 201006_HierLdbIJHPCA
Abstract
Large parallel machines with hundreds of thousands of processors
are being built. Ensuring good load balance is critical for scaling
certain classes of parallel applications on even thousands of
processors. Centralized load balancing algorithms suffer from
scalability problems, especially on machines with a relatively small
amount of memory. Fully distributed load balancing algorithms, on
the other hand, tend to yield poor load balance on very large
machines. In this paper, we present an automatic dynamic
hierarchical load balancing method that overcomes the scalability
challenges of centralized schemes and the poor solutions of traditional
distributed schemes. This is done by creating multiple levels of
load balancing domains which form a tree. This hierarchical method
is demonstrated within a measurement-based load balancing framework
in Charm++. We present techniques to deal with scalability
challenges of load balancing at very large scale. We show
performance data of the hierarchical load balancing method on up to
16,384 cores of the Ranger cluster (at TACC) and 65,536 cores of a Blue
Gene/P at Argonne National Laboratory for a synthetic benchmark. We
also demonstrate the successful deployment of the method in a
scientific application, NAMD, with results on the Blue Gene/P
machine at ANL.
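
The idea of load balancing domains that form a tree can be illustrated with a small sketch. This is not the paper's implementation or the Charm++ API; it only assumes that processors are numbered contiguously and that each domain is split into at most `group_size` sub-domains. The names `Domain`, `build_tree`, `leaves`, `depth`, and `group_size` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Domain:
    """One load balancing domain: a contiguous range of processor ranks."""
    first: int                       # first processor rank in the domain
    last: int                        # last processor rank (inclusive)
    children: List["Domain"] = field(default_factory=list)


def build_tree(first: int, last: int, group_size: int) -> Domain:
    """Recursively split [first, last] into at most `group_size` sub-domains.

    Assumption: a domain with `group_size` or fewer processors is a leaf,
    small enough to be balanced by a single centralized step.
    """
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    node = Domain(first, last)
    count = last - first + 1
    if count <= group_size:
        return node                  # leaf domain
    chunk = -(-count // group_size)  # ceiling division
    start = first
    while start <= last:
        end = min(start + chunk - 1, last)
        node.children.append(build_tree(start, end, group_size))
        start = end + 1
    return node


def leaves(node: Domain) -> List[Domain]:
    """Return the leaf domains under `node`, in rank order."""
    if not node.children:
        return [node]
    result: List[Domain] = []
    for child in node.children:
        result.extend(leaves(child))
    return result


def depth(node: Domain) -> int:
    """Number of levels in the tree rooted at `node`."""
    return 1 + max((depth(child) for child in node.children), default=0)


if __name__ == "__main__":
    root = build_tree(0, 63, 4)
    print(depth(root), len(leaves(root)))
```

In a scheme of this shape, each leaf domain would balance its own processors independently, and higher levels would only need to exchange aggregated load between sub-domains, so no single processor has to hold load data for the whole machine. How the load is actually moved at each level is left out of the sketch.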
TextRef
Gengbin Zheng, Abhinav Bhatele, Esteban Meneses and Laxmikant V. Kale, "Periodic Hierarchical Load Balancing for Large Supercomputers", accepted for publication in International Journal for High Performance Computing Applications (IJHPCA), 2010