Parallel Languages/Paradigms:
Automatic Communication Optimizations

Communication Optimizations

The communication cost of a parallel application can greatly affect its scalability. Although communication bandwidth increases have kept pace with increases in processor speed over the past decade, the communication latencies (including the software overhead) for each message have not decreased proportionately.

The framework currently has three major goals:
  • Optimize collective communication operations such as all-to-all personalized communication, all-to-all multicast, and all-reduce. Collective operations often involve most processors in a system; they are time-consuming and can require massive data movement. They can be optimized with message combining for small messages and smart message sequencing for large messages. Message combining is achieved by imposing a virtual topology on the processors and routing messages along it: messages destined for a group of processors are agglomerated into a single message, which is sent to a representative processor that forwards its contents to the correct destinations. For example, if the virtual topology is a hypercube, dimensional exchange can combine messages: there are log(p) stages, and in stage i each processor exchanges messages with its neighbor along dimension i. We have also implemented two other virtual topologies, a 2D mesh and a 3D grid. For large messages, smart message sequencing such as prefix send can be used to reduce network contention.
  • Optimize implementations of the Charm++ machine layers to exploit the special features provided by the lower-level APIs.
  • Develop a learning framework that learns an application's communication patterns and applies known strategies to optimize them.
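The two small-message and large-message techniques above can be illustrated with a short simulation. This is a minimal sketch of hypercube dimensional exchange and of prefix-send ordering; the function names and message representation are illustrative and are not the Charm++ API.

```python
import math

def dimensional_exchange_alltoall(p):
    """Simulate all-to-all personalized communication on p processors
    (p a power of two) using hypercube dimensional exchange."""
    # held[r] = list of (dest, payload) messages currently at processor r
    held = [[(d, f"msg {r}->{d}") for d in range(p)] for r in range(p)]
    for i in range(int(math.log2(p))):
        outbox = [[] for _ in range(p)]
        for r in range(p):
            neighbor = r ^ (1 << i)  # neighbor along dimension i
            keep = []
            for dest, payload in held[r]:
                # All messages whose destination differs from r in bit i
                # are combined into one message to the dimension-i neighbor.
                if (dest >> i) & 1 != (r >> i) & 1:
                    outbox[neighbor].append((dest, payload))
                else:
                    keep.append((dest, payload))
            held[r] = keep
        for r in range(p):
            held[r].extend(outbox[r])
    # Each processor sent only log2(p) combined messages instead of p-1,
    # and now holds exactly the messages addressed to it.
    return held

def prefix_send_order(r, p):
    """Prefix-send sequencing for large messages: processor r sends to
    r+1, r+2, ... (mod p), so in each round all p senders target
    distinct destinations, avoiding hot spots."""
    return [(r + k) % p for k in range(1, p)]

held = dimensional_exchange_alltoall(8)
assert all(all(dest == r for dest, _ in held[r]) for r in range(8))
```

With message combining, each processor injects log(p) messages per all-to-all rather than p-1, at the cost of each payload traveling up to log(p) hops; this trade is profitable when per-message software overhead dominates, i.e. for small messages.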
People
Papers/Talks
22-10
2022
[PhD Thesis]
Runtime Techniques for Efficient Execution of Virtualized, Migratable MPI Ranks [Thesis 2022]
22-08
2022
[Paper]
Improving Communication Asynchrony and Concurrency for Adaptive MPI Endpoints [ExaMPI 2022]
22-05
2022
[Paper]
Improving Scalability with GPU-Aware Asynchronous Tasks [HIPS 2022]
| Jaemin Choi | David Richards | Laxmikant Kale
22-03
2022
[Paper]
Optimizing Non-Commutative Allreduce over Virtualized, Migratable MPI Ranks [APDCM 2022]
18-01
2017
[Talk]
Optimizing Point-to-Point Communication between Adaptive MPI Endpoints in Shared Memory [ExaMPI 2017]
17-10
2017
[Paper]
Optimizing Point-to-Point Communication between Adaptive MPI Endpoints in Shared Memory [ExaMPI 2017]
15-04
2015
[PhD Thesis]
Software Topological Message Aggregation Techniques For Large-scale Parallel Systems [Thesis 2015]
14-18
2014
[Paper]
TRAM: Optimizing Fine-grained Communication with Topological Routing and Aggregation of Messages [ICPP 2014]
14-12
2014
[Paper]
PICS: A Performance-Analysis-Based Introspective Control System to Steer Parallel Applications [ROSS 2014]
08-11
2009
[Paper]
CkDirect: Unsynchronized One-Sided Communication in a Message-Driven Paradigm [P2S2 2009]
05-02
2005
[Paper]
Architecture for supporting Hardware Collectives in Output-Queued High-Radix Routers [HiPC 2005]
| Sameer Kumar | Laxmikant Kale | Craig Stunkel
03-15
2003
[Paper]
Opportunities and Challenges of Modern Communication Architectures: Case Study with QsNet [CAC Workshop at IPDPS 2003]
03-11
2003
[Paper]
Scaling Collective Multicast on Fat-tree Networks [ICPADS 2003]
03-04
2003
[Paper]
Scaling Collective Multicast on High Performance Clusters [PPL Technical Report 2003]
02-10
2003
[Paper]
A Framework for Collective Personalized Communication [IPDPS 2003]
99-08
1999
[MS Thesis]
Communication Library for Parallel Architectures [Thesis 1999]
94-07
1994
[MS Thesis]
Process Group and Collective Communication in a Message-Driven Environment [Thesis 1994]