Parallel Languages/Paradigms: Automatic Communication Optimizations

Communication Optimization Framework

The Communication Framework optimizes communication operations in Charm++. The communication cost of a parallel application can greatly affect its scalability. Although communication bandwidth increases have kept pace with increases in processor speed over the past decade, the communication latencies (including the software overhead) for each message have not decreased proportionately.

The framework currently has three major motivations:
  • Optimize collective communication operations such as AlltoAll personalized communication, AlltoAll multicast, and AllReduce. Collective communication operations often involve most processors in a system. They are also time consuming and can involve massive data movement. These operations can be optimized by using message combining for small messages and smart message sequencing for large messages. Message combining is achieved by imposing a virtual topology on the processors and routing messages along that topology. Messages destined to a group of processors are combined into one message. The combined message is then sent to a representative processor, which forwards each message to its correct destination. For example, if the virtual topology is a hypercube, dimensional exchange can be used to combine messages: with p processors there are log2(p) stages, and in stage i each processor exchanges messages with its neighbor along the ith dimension. We have also implemented two other virtual topologies, 2D Mesh and 3D Grid. For large messages, smart message sequencing such as prefix send can be used to reduce network contention.
  • Optimize implementations of the Charm++ machine layers to exploit the special features provided by the lower-level APIs.
  • Develop a learning framework which will learn the communication patterns of an application and use known strategies to optimize those patterns.
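The hypercube dimensional exchange mentioned in the first item can be illustrated with a small simulation. This is a minimal sketch in plain Python, not Charm++ code and not the framework's implementation: the function name `hypercube_all_to_all`, the `(source, destination)` item representation, and the assumption that the processor count is a power of two are all chosen here for illustration. In stage i, each simulated processor bundles every item whose destination differs from its own rank in bit i into one combined message and hands it to its neighbor along dimension i.

```python
def hypercube_all_to_all(p):
    """Simulate all-to-all personalized communication on p = 2**d ranks
    using dimensional exchange with message combining.

    Returns (buffers, sends): buffers[r] lists the (source, destination)
    items held by rank r at the end, and sends[r] counts the combined
    messages rank r sent. Illustrative only; assumes p is a power of two.
    """
    d = p.bit_length() - 1
    if p < 1 or (1 << d) != p:
        raise ValueError("p must be a power of two")

    # Initially every rank holds one item for every destination.
    buffers = [[(r, dest) for dest in range(p)] for r in range(p)]
    sends = [0] * p

    for i in range(d):  # one stage per hypercube dimension
        bit = 1 << i
        outgoing = []
        for r in range(p):
            # Items whose destination differs from r in bit i must cross
            # dimension i; they are combined into a single message.
            forward = [m for m in buffers[r] if ((m[1] ^ r) & bit)]
            buffers[r] = [m for m in buffers[r] if not ((m[1] ^ r) & bit)]
            outgoing.append(forward)
            if forward:
                sends[r] += 1
        # Deliver each combined message to the neighbor along dimension i.
        for r in range(p):
            buffers[r ^ bit].extend(outgoing[r])

    return buffers, sends


if __name__ == "__main__":
    buffers, sends = hypercube_all_to_all(8)
    print("combined messages sent per rank:", sends)
    print("items delivered to rank 5:", sorted(buffers[5]))
```

With 8 simulated processors, each rank sends 3 combined messages instead of the 7 it would send directly. The trade-off is that an item may be forwarded up to log2(p) times, so the total volume of data moved is higher, which is why combining pays off mainly for small messages.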
Publications

CkDirect: Unsynchronized One-Sided Communication in a Message-Driven Paradigm [P2S2 2009]
Architecture for supporting Hardware Collectives in Output-Queued High-Radix Routers [HiPC 2005] (Sameer Kumar, Laxmikant Kale, Craig Stunkel)
Opportunities and Challenges of Modern Communication Architectures: Case Study with QsNet [CAC Workshop at IPDPS 2003]
Scaling Collective Multicast on Fat-tree Networks [ICPADS 2003]
Scaling Collective Multicast on High Performance Clusters [PPL Technical Report 2003]
A Framework for Collective Personalized Communication [IPDPS 2003]
[MS Thesis]
Communication Library for Parallel Architectures [Thesis 1999]
[MS Thesis]
Process Group and Collective Communication in a Message Driven Environment [Thesis 1994]