Optimizing Data Locality for Fork/Join Programs Using Constrained Work Stealing
International Conference for High Performance Computing, Networking, Storage and Analysis (SC) 2014
Publication Type: Talk
Summary
We present an approach to improving data locality across different
phases of fork/join programs scheduled using work stealing. The
approach consists of: (1) user-specified and automated methods for
constructing a steal tree, the schedule of steal operations,
and (2) constrained work-stealing algorithms that restrict
the actions of the scheduler to mirror a given steal tree. These
are combined to construct work-stealing schedules that maximize data
locality across computation phases while ensuring load balance
within each phase. These algorithms are also used to demonstrate
dynamic coarsening, an optimization that improves spatial
locality and reduces sequential overheads by combining many finer-grained
tasks into coarser tasks while ensuring sufficient concurrency for
locality-optimized load balance. Implementation and evaluation in
Cilk demonstrate performance improvements of up to 2.5x on 80 cores.
We also demonstrate that dynamic coarsening can combine the
performance benefits of coarse task specification with the
adaptability of finer tasks.
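To make the two ingredients concrete, the following is a minimal sketch of recording a steal tree in one phase and replaying it in the next. It is a toy round-based simulator written for illustration, not the Cilk implementation evaluated in the talk. The names `Steal`, `run_phase`, and `stop`, the "steal the upper half of the victim's range" policy, and the random victim selection are all assumptions introduced here.

```python
import random
from collections import deque, namedtuple

# One steal: `thief` took tasks [lo, hi) from `victim`. `parent` is the sid of
# the steal that gave the victim its range (-1 for worker 0's initial range),
# so the records form a tree. All names here are hypothetical.
Steal = namedtuple("Steal", "sid parent thief victim lo hi")

def run_phase(num_workers, num_tasks, rng=None, steal_tree=None):
    """Simulate one phase over tasks 0..num_tasks-1; return (owner, steals).

    owner[t] is the worker that ran task t. With steal_tree=None, idle workers
    steal the upper half of a random victim's range. Otherwise only the
    recorded steals are permitted.
    """
    constrained = steal_tree is not None
    if rng is None:
        rng = random.Random()
    ranges = [[0, 0] for _ in range(num_workers)]    # remaining [lo, hi) per worker
    holding = [None] * num_workers                   # sid that produced each range
    ranges[0], holding[0] = [0, num_tasks], -1
    todo = {w: deque() for w in range(num_workers)}  # recorded steals per thief
    for s in steal_tree or []:
        todo[s.thief].append(s)
    owner, steals = {}, []

    def stop(w):
        # A constrained worker must not run tasks reserved for recorded thieves.
        hi = ranges[w][1]
        if not constrained:
            return hi
        return min([s.lo for s in steal_tree if s.parent == holding[w]] + [hi])

    def transfer(w, s):
        ranges[s.victim][1] = s.lo                   # victim gives up [s.lo, s.hi)
        ranges[w], holding[w] = [s.lo, s.hi], s.sid
        steals.append(s)

    while len(owner) < num_tasks:
        for w in range(num_workers):
            lo, hi = ranges[w]
            if lo < stop(w):
                owner[lo] = w                        # run one task from the front
                ranges[w][0] = lo + 1
            elif lo < hi:
                continue                             # wait for thieves to take the reserved tail
            elif constrained:
                if todo[w]:
                    s = todo[w][0]
                    # Steal only once the victim holds exactly the recorded range.
                    if holding[s.victim] == s.parent and ranges[s.victim][1] == s.hi:
                        transfer(w, todo[w].popleft())
            else:
                v = rng.randrange(num_workers)
                vlo, vhi = ranges[v]
                if v != w and vhi - vlo >= 2:
                    mid = (vlo + vhi + 1) // 2
                    transfer(w, Steal(len(steals), holding[v], w, v, mid, vhi))
    return owner, steals

if __name__ == "__main__":
    owner1, tree = run_phase(4, 64, rng=random.Random(0))  # phase 1: record
    owner2, _ = run_phase(4, 64, steal_tree=tree)          # phase 2: replay
    print("steals recorded:", len(tree))
    print("same task-to-worker mapping:", owner1 == owner2)
```

In this sketch, replaying the recorded tree places every task on the same worker in both phases, which is the cross-phase locality property described above. A real scheduler would additionally have to balance load within a phase whose task costs differ from the recorded one, which this toy model does not attempt.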