Optimizing Communication for Charm++ Applications by Reducing Network Contention
Concurrency and Computation: Practice and Experience 2010
Publication Type: Paper
Repository URL: 200909_LeanCPCCJ
Abstract
Optimal network performance is critical to efficient parallel
scaling for communication-bound applications on large machines.
With wormhole routing, no-load latencies do not increase
significantly with the number of hops traveled. Yet we and others
have recently shown that, in the presence of contention, message
latencies can grow substantially. Hence, task mapping
strategies should take the topology of the machine into account on
large machines. In this paper, we present topology-aware mapping as
a technique to optimize communication on 3-dimensional mesh
interconnects and hence improve performance.
Our methodology is facilitated by the idea of object-based decomposition used in Charm++, which separates the process of decomposition from the mapping of computation to processors and allows a more flexible mapping based on communication patterns between objects. Exploiting this and the topology of the allocated job partition, we present mapping strategies for a production code, OpenAtom, to improve overall performance and scaling. OpenAtom presents complex communication scenarios involving interactions among multiple groups of objects, which makes the mapping task a challenge. Results are presented for OpenAtom on up to 16,384 processors of Blue Gene/L, 8,192 processors of Blue Gene/P, and 2,048 processors of Cray XT3.
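To make the idea of topology-aware mapping concrete, the following is a minimal sketch, not the mapping scheme used in the paper. It assumes a 3-dimensional mesh without wraparound links and scores a placement with hop-bytes (bytes sent multiplied by hops traveled), a metric commonly used for this purpose but not named in the abstract. The function names, the four-object ring, and the message sizes are all hypothetical.

```python
def hops(a, b):
    """Hop count between two nodes of a 3D mesh (Manhattan distance, no wraparound)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def hop_bytes(mapping, traffic):
    """Total hop-bytes of a placement.

    mapping: object id -> (x, y, z) coordinates of the node hosting it
    traffic: (source id, destination id) -> bytes sent
    """
    return sum(nbytes * hops(mapping[src], mapping[dst])
               for (src, dst), nbytes in traffic.items())


# Hypothetical workload: four objects exchanging 1000 bytes around a ring.
traffic = {(0, 1): 1000, (1, 2): 1000, (2, 3): 1000, (3, 0): 1000}

# Two placements on a 2 x 2 x 1 mesh.
# Topology-aware: ring neighbors sit on adjacent nodes.
aware = {0: (0, 0, 0), 1: (1, 0, 0), 2: (1, 1, 0), 3: (0, 1, 0)}
# Topology-oblivious: two of the ring neighbors are placed diagonally apart.
oblivious = {0: (0, 0, 0), 1: (1, 1, 0), 2: (1, 0, 0), 3: (0, 1, 0)}

print(hop_bytes(aware, traffic))      # 4000
print(hop_bytes(oblivious, traffic))  # 6000
```

Under these assumptions, the placement that keeps communicating objects on neighboring nodes puts fewer bytes on fewer links, which is the effect a topology-aware mapping aims for when contention dominates message latency.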
TextRef
Abhinav Bhatele, Eric Bohm, Laxmikant V. Kale, "Optimizing communication for Charm++ applications by reducing network contention", Concurrency and Computation: Practice and Experience, Volume 23, Issue 2, pages 211–222, February 2011