Work Stealing and Persistence-based Load Balancers for Iterative Overdecomposed Applications
ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC) 2012
Publication Type: Paper
Abstract
Applications often involve iterative execution of identical or
slowly evolving calculations. Such applications require incremental
rebalancing to improve load balance across iterations. In this
paper, we consider the design and evaluation of two distinct
approaches to addressing this challenge: persistence-based load
balancing and work stealing. The work to be performed is
overdecomposed into tasks, enabling automatic rebalancing by
the middleware. We present a hierarchical persistence-based
rebalancing algorithm that performs localized incremental
rebalancing. We also present an active-message-based
retentive work stealing algorithm optimized for iterative
applications on distributed memory machines. We demonstrate low
overheads and high efficiencies on the full NERSC Hopper (146,400
cores) and ALCF Intrepid systems (163,840 cores), and on up to
128,000 cores on OLCF Titan.
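
As a rough illustration of the persistence idea, the sketch below uses task durations measured in the previous iteration to shift work from overloaded to underloaded processors while leaving most tasks where they were. It is a flat greedy heuristic written for this page, not the hierarchical algorithm from the paper; the function name `rebalance`, its parameters, and the 5% tolerance are all assumptions made for the example.

```python
def rebalance(assignment, durations, tolerance=0.05):
    """Incrementally rebalance tasks using last iteration's timings.

    assignment: dict mapping processor id -> list of task ids (hypothetical layout)
    durations:  dict mapping task id -> time measured in the previous iteration
    tolerance:  allowed overshoot above the average load (assumed value)
    Returns a new assignment; the input is not modified.
    """
    new = {p: list(tasks) for p, tasks in assignment.items()}
    load = {p: sum(durations[t] for t in tasks) for p, tasks in new.items()}
    limit = (sum(load.values()) / len(load)) * (1 + tolerance)

    # Shed the smallest tasks from each overloaded processor into a pool,
    # so that only the excess work moves and the rest stays in place.
    pool = []
    for p in new:
        new[p].sort(key=lambda t: durations[t], reverse=True)
        while load[p] > limit and len(new[p]) > 1:
            t = new[p].pop()  # smallest remaining task on p
            load[p] -= durations[t]
            pool.append(t)

    # Place pooled tasks, largest first, on the currently least-loaded processor.
    pool.sort(key=lambda t: durations[t], reverse=True)
    for t in pool:
        p = min(load, key=load.get)
        new[p].append(t)
        load[p] += durations[t]
    return new


if __name__ == "__main__":
    previous = {0: [0, 1, 2, 3], 1: []}
    measured = {t: 1.0 for t in range(4)}
    print(rebalance(previous, measured))
```

Because the previous assignment is the starting point, repeated calls across iterations move only the work needed to track slowly changing task costs, which is the property the abstract refers to as incremental rebalancing.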