Accelerating messages by avoiding copies using RDMA in an asynchronous parallel runtime system
Thesis 2017
Publication Type: Paper
With the advent of Exascale computing, the number and size of messages are expected to increase greatly. One-sided communication on hardware that supports Remote Direct Memory Access (RDMA) is the natural choice for large messages, as it has been shown to reduce latency and increase bandwidth for large payloads in High Performance Computing (HPC) networks. RDMA enables the network to bypass the operating system and perform data transfers without involving the Central Processing Unit (CPU). In addition to not consuming CPU cycles, RDMA also benefits from zero-copy networking, where the data being transferred is not copied between the layers of the network stack. Since memory performance is significantly lower than CPU performance, memory-intensive operations have been observed to reduce application performance and increase energy consumption. For this reason, reducing memory pressure by avoiding the cost of allocation and copying helps improve application performance significantly. The asynchronous message sending paradigm in Charm++ makes a copy of the payload on the sender side. It also requires copying the data from the message into the user’s data structure on the receiver side. As the payload gets larger, the cost of these allocations and copies increases proportionally. In this thesis, we show the benefits of avoiding the copies on both the sender and receiver sides using RDMA in different applications. We also discuss the design of the zero-copy user-level Application Programming Interface (API) in Charm++, along with the underlying RDMA implementations for different networks in today’s supercomputers.