Improving Communication Performance in Dense Linear Algebra via Topology Aware Collectives
International Conference for High Performance Computing, Networking, Storage and Analysis (SC) 2011
Publication Type: Paper
Abstract
Recent results have shown that topology-aware mapping reduces network contention
in communication-intensive kernels on massively parallel machines.
We demonstrate that on mesh interconnects, topology-aware mapping enables
the use of highly efficient topology-aware collectives. We map novel 2.5D
dense linear algebra algorithms to
cuboid partitions allocated by a Blue Gene/P supercomputer.
Our mappings allow the algorithms to exploit optimized line multicasts
and reductions.
Commonly used 2D algorithms cannot be mapped in this fashion.
On 65,536 cores of Blue Gene/P, 2.5D algorithms with rectangular collectives
are 2.6x and 2.7x faster for matrix multiplication and LU factorization,
respectively. For LU, communication time drops by up to 92%. We derive a
novel LogP-based performance model for rectangular broadcasts and
reductions. We also model performance on a hypothetical exascale
architecture. Our study evaluates the benefits of topology-aware collectives
for high-performance algorithms.
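The abstract's central geometric claim is that a 2.5D processor grid can be mapped onto a cuboid partition so that every row, column, and depth fiber lies on a single line of the torus, which is what line multicasts and reductions require. The sketch below checks that property. It assumes a hypothetical 64-node 4x4x4 partition, a simple rank-order placement, and c = 4 replicas. The helper names (`grid_2p5d`, `all_fibers_are_lines`, etc.) are illustrative, and the code checks only the geometry. It is not the paper's actual mapping or the Blue Gene/P collective implementation.

```python
from itertools import product
from math import isqrt, prod

# Hypothetical helpers for illustration only; not the paper's mapping code.

def grid_2p5d(p, c):
    """Shape (q, q, c) of a 2.5D processor grid with c replicas, q = sqrt(p / c)."""
    q = isqrt(p // c)
    if q * q * c != p:
        raise ValueError("p / c must be a perfect square")
    return (q, q, c)

def grid_to_rank(idx, grid):
    """Linear rank of a grid index, first grid axis varying fastest."""
    rank, stride = 0, 1
    for i, n in zip(idx, grid):
        rank += i * stride
        stride *= n
    return rank

def rank_to_coords(rank, torus):
    """Torus coordinates of a linear rank, first torus axis varying fastest."""
    coords = []
    for n in torus:
        coords.append(rank % n)
        rank //= n
    return tuple(coords)

def is_line(points):
    """True if all points share every coordinate except at most one."""
    varying = [a for a in range(len(points[0])) if len({pt[a] for pt in points}) > 1]
    return len(varying) <= 1

def fibers(grid):
    """Yield every 1D fiber of the grid (rows, columns, depth fibers):
    the grid points that agree on all indices except along one axis."""
    for axis, n in enumerate(grid):
        others = [range(m) for a, m in enumerate(grid) if a != axis]
        for fixed in product(*others):
            yield [fixed[:axis] + (t,) + fixed[axis:] for t in range(n)]

def all_fibers_are_lines(grid, torus):
    """Place grid points on the torus in rank order and check that every fiber
    (the group a row/column broadcast or reduction would use) is a torus line."""
    if prod(grid) != prod(torus):
        raise ValueError("grid and torus must have the same number of nodes")
    return all(
        is_line([rank_to_coords(grid_to_rank(pt, grid), torus) for pt in fiber])
        for fiber in fibers(grid)
    )

# 64 nodes on a hypothetical 4x4x4 cuboid partition.
torus = (4, 4, 4)
print(all_fibers_are_lines(grid_2p5d(64, 4), torus))  # 2.5D grid (4, 4, 4): True
print(all_fibers_are_lines((8, 8), torus))            # 2D grid (8, 8): False
```

The second result does not depend on the particular placement. A torus line in a 4x4x4 partition holds at most 4 nodes, so no mapping can put a row of 8 processors from the 8x8 2D grid on a single line. The 4x4x4 2.5D grid, by contrast, matches the partition dimension for dimension.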
TextRef
Edgar Solomonik, Abhinav Bhatele, James Demmel, Improving communication performance in dense linear algebra via topology aware collectives, International Conference for High Performance Computing, Networking, Storage and Analysis (Supercomputing) 2011
People
- Edgar Solomonik
- Abhinav Bhatele
- James Demmel
Research Areas