Handling Transient and Persistent Imbalance Together in Distributed and Shared Memory
PPL Technical Report 2016
Publication Type: Paper
Repository URL: http://charm.cs.illinois.edu/newPapers/16-19/techreport.pdf
Abstract
The recent trend of rapid growth in the number
of cores per chip has resulted in a vast amount of on-node
parallelism. Not only is the number of cores per node increasing
substantially, but the cores are also becoming heterogeneous. The
high variability in the performance of hardware components
introduces imbalance due to heterogeneity. Applications
are also becoming more complex, resulting in dynamic load
imbalance. Load imbalance can result in a loss of performance
and a decrease in system utilization. We address the challenge
of handling both transient and persistent load imbalance while
maintaining locality and incurring low overhead. In this paper,
we propose a new integrated runtime system that combines the
Charm++ distributed programming model with concurrent tasks
to handle the load imbalance problem. It utilizes an infrequent
periodic assignment of work to cores based on load measurement,
in combination with user-created tasks to handle load imbalance.
We integrate OpenMP with Charm++ so as to enable creation of
potential tasks via OpenMP’s parallel loop construct. This capability
is not specific to Charm++; it is also available to MPI applications
through the Adaptive MPI implementation. We show the
benefit of using this integrated runtime system on three different
applications. We show improvements of 2X for ChaNGa on 128K
cores and more than 3X for NAMD on 2K cores. We also show
the benefit on an MPI application, Kripke, using Adaptive MPI.
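
The two mechanisms described in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the runtime system from the report and not any Charm++ or OpenMP API: the function names (`greedy_assign`, `core_loads`, `finish_time`), the greedy heuristic, and the load numbers are all hypothetical. It models persistent imbalance as a periodic assignment of measured object loads to cores, and transient imbalance as a sudden spike of extra work that is either kept on its owner core or split into tasks that the least-loaded cores pick up.

```python
import heapq


def greedy_assign(loads, num_cores):
    """Periodic step (hypothetical heuristic): place the heaviest
    objects first, each on the currently least-loaded core."""
    heap = [(0.0, core) for core in range(num_cores)]
    heapq.heapify(heap)
    assignment = {}
    for obj in sorted(range(len(loads)), key=lambda i: -loads[i]):
        core_load, core = heapq.heappop(heap)
        assignment[obj] = core
        heapq.heappush(heap, (core_load + loads[obj], core))
    return assignment


def core_loads(loads, assignment, num_cores):
    """Total measured load per core for a given assignment."""
    totals = [0.0] * num_cores
    for obj, core in assignment.items():
        totals[core] += loads[obj]
    return totals


def finish_time(totals, spike_core, spike, num_chunks=1):
    """Time at which the last core finishes when `spike` units of
    extra work appear on `spike_core`.

    num_chunks == 1 models no task creation: the owner core does it all.
    num_chunks > 1 models splitting the work into equal tasks, each
    handed to whichever core is least loaded at that moment.
    """
    totals = list(totals)
    if num_chunks == 1:
        totals[spike_core] += spike
        return max(totals)
    heap = [(load, core) for core, load in enumerate(totals)]
    heapq.heapify(heap)
    chunk = spike / num_chunks
    for _ in range(num_chunks):
        load, core = heapq.heappop(heap)
        heapq.heappush(heap, (load + chunk, core))
    return max(load for load, _ in heap)


if __name__ == "__main__":
    loads = [4, 3, 3, 2, 2, 1]           # made-up measured object loads
    assignment = greedy_assign(loads, 3)
    totals = core_loads(loads, assignment, 3)
    print(totals)                         # balanced after the periodic step
    print(finish_time(totals, 0, 6.0, num_chunks=1))
    print(finish_time(totals, 0, 6.0, num_chunks=6))
```

In this toy setting the periodic step alone leaves the cores balanced, but a transient spike on one core still delays the whole node unless the extra work is exposed as tasks that other cores can share. The model ignores task-creation overhead and locality, both of which the abstract names as concerns.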