Scaling Collective Multicast on High Performance Clusters
PPL Technical Report 2003
Publication Type: Paper
Repository URL: comlibmulticast
Abstract
Collective communication operations often involve massive data
movement across the entire network. A poor implementation of these
operations can limit the scalability of an application to a large
number of processors. In this paper we study the collective
multicast operation. The extreme case of collective multicast is
all-to-all multicast (MPI_Allgather), in which each processor
multicasts a message to all the other processors. We present
optimizations and performance studies for all-to-all multicast.
These optimizations must differ for small and large messages. For
small messages, the major issue is minimizing software overhead,
which can be achieved by message combining. For large messages, the
dominant cost is network contention, which can be reduced by
intelligent message sequencing. Modern NICs have a communication
co-processor that performs message management through zero-copy
remote DMA operations. We present an asynchronous, non-blocking
collective multicast framework that allows the processor to do
other computation while the collective operation is in progress. We
also present performance comparisons of the various algorithms
implemented by our framework on several relevant applications and
benchmarks.
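As a rough illustration of why message combining helps for small messages, consider the standard alpha-beta cost model (per-message startup overhead alpha, per-byte cost beta). The sketch below compares a naive direct all-to-all against a two-phase combining scheme over a virtual 2D grid; the parameter values and the specific combining topology are illustrative assumptions, not figures from the paper.

```python
import math

# Illustrative alpha-beta cost model for all-to-all multicast on P processors.
# alpha: per-message software/startup overhead (seconds)
# beta:  per-byte transmission cost (seconds/byte)
# Both values below are hypothetical, chosen only to show the trend.

def naive_cost(P, m, alpha=1e-5, beta=1e-9):
    # Each processor sends its m-byte message directly to the other P-1
    # processors: P-1 separate sends, each paying the full startup overhead.
    return (P - 1) * (alpha + m * beta)

def combined_cost(P, m, alpha=1e-5, beta=1e-9):
    # Two-phase combining over a sqrt(P) x sqrt(P) virtual grid (rows, then
    # columns): roughly 2*(sqrt(P)-1) sends per processor, but each send
    # carries the combined data of sqrt(P) processors. Startup overhead is
    # paid far fewer times, at the price of larger (combined) messages.
    s = math.isqrt(P)
    return 2 * (s - 1) * (alpha + s * m * beta)

P, m = 1024, 64  # e.g. 1024 processors exchanging 64-byte messages
print(f"naive:    {naive_cost(P, m):.6f} s")
print(f"combined: {combined_cost(P, m):.6f} s")
```

For small m the startup term dominates, so the combined scheme wins despite moving the same total data; as m grows, the per-byte term takes over and contention-aware sequencing of large direct messages becomes the better strategy.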
TextRef
L. V. Kale and Sameer Kumar, "Scaling Collective Multicast on High Performance
Clusters", Parallel Programming Laboratory, Department of Computer Science,
University of Illinois at Urbana-Champaign, 2003.