Feature #1234

Avoid sender-side copy for large contiguous messages. API for charm and converse layers

Added by Vipul Harsh almost 3 years ago. Updated 10 months ago.

Status: Merged
Priority: Low
Assignee: -
Category: -
Target version: -
Start date: 06/22/2016
Due date: -
% Done: 86%
Spent time: 15.00 h (total)

Subtasks

Feature #1111: Avoid sender-side copy in AMPI for large contiguous messages (Merged, Sam White)

Feature #1235: Onesided sender side implementation for PAMI layer (Merged, Vipul Harsh)

Feature #1237: Onesided sender side implementation for GNI layer (Merged, Nitin Bhat)

Feature #1238: Onesided sender side implementation for Verbs layer (Merged, Jaemin Choi)

Feature #1458: Zero-copy send support for the MPI machine layer (Merged, Nitin Bhat)

Feature #1459: Zero-copy send support for the netlrts machine layer (New, Nitin Bhat)

Feature #1481: RDMA zero copy send implementation for multicore builds (Merged, Sam White)


Related issues

Related to Charm++ - Feature #1236: Avoid receiver-side copy for large contiguous messages. API for charm and converse layers (Merged, 09/28/2016)
Related to Charm++ - Feature #68: LRTS support for setting up a message to send and transmitting a GET handle (Closed, 03/01/2013)

History

#1 Updated by Vipul Harsh almost 3 years ago

  • Tracker changed from Bug to Feature

#2 Updated by Phil Miller over 2 years ago

  • Target version set to 6.8.0-beta1

#3 Updated by Phil Miller over 2 years ago

  • Status changed from Implemented to In Progress

Still waiting on the SMP fix

#4 Updated by Phil Miller over 2 years ago

From minutes of the last core meeting:

  • Callback fix implemented: the callback is now invoked on the sending PE rather than on the comm thread.

  • One more crash remains to be fixed.

  • The GNI implementation needs testing against the revised infrastructure.

  • GNI still has an alignment issue: RDMA GET requires 4-byte alignment, while RDMA PUT does not. Nitin intends to get a full solution implemented quickly (an alignment check of this kind is sketched after this list).

Putting this here so status is clearly visible and easily located.
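
For illustration, the alignment check involved is roughly the following (hypothetical names, not the actual GNI machine-layer code):

    #include <cstddef>
    #include <cstdint>

    // Hypothetical sketch: choosing between RDMA GET and PUT based on the
    // 4-byte alignment constraint noted above. Not the real GNI layer code.
    static const std::uintptr_t GNI_GET_ALIGNMENT = 4;

    bool isGetAligned(const void *buf, std::size_t len) {
        // RDMA GET requires the address (and length) to be 4-byte aligned;
        // RDMA PUT has no such restriction.
        return (reinterpret_cast<std::uintptr_t>(buf) % GNI_GET_ALIGNMENT == 0) &&
               (len % GNI_GET_ALIGNMENT == 0);
    }

    void transferLargeBuffer(const void *buf, std::size_t len) {
        if (isGetAligned(buf, len)) {
            // rdmaGet(buf, len);   // receiver pulls the data directly
        } else {
            // rdmaPut(buf, len);   // fall back to a PUT from the sender,
            // or copy into an aligned bounce buffer and then GET
        }
    }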

#5 Updated by Jim Phillips over 2 years ago

The initial implementation of this doesn't need to be based on RDMA. The problem is not that the data is copied, but that additional memory equal to the full data size is used. Flow-controlled message streaming would allow the data to be transferred without blowing up memory usage.
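
As a sketch of the idea (all names below are hypothetical, not existing Charm++ code): the sender carves the buffer into fixed-size chunks and keeps only a bounded window of them in flight, so the extra memory stays at WINDOW * CHUNK_SIZE instead of the full message size.

    #include <cstddef>

    // Hypothetical sketch of flow-controlled streaming: at most WINDOW
    // chunks are buffered at any time, bounding the extra memory used.
    static const std::size_t CHUNK_SIZE = 1 << 20;  // 1 MB chunks
    static const std::size_t WINDOW     = 4;        // max chunks in flight

    struct Stream {
        std::size_t inFlight = 0;
        void sendChunk(const char *p, std::size_t n) {
            (void)p; (void)n;  // enqueue one chunk as a message
        }
        void waitForAck() {
            // block (or yield) until an ack arrives; the ack handler
            // decrements inFlight as the receiver consumes chunks
        }
    };

    void streamSend(Stream &s, const char *data, std::size_t total) {
        for (std::size_t off = 0; off < total; off += CHUNK_SIZE) {
            while (s.inFlight >= WINDOW)  // flow control: cap buffered chunks
                s.waitForAck();
            std::size_t n = (total - off < CHUNK_SIZE) ? total - off : CHUNK_SIZE;
            s.sendChunk(data + off, n);
            ++s.inFlight;
        }
    }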

#6 Updated by Jim Phillips over 2 years ago

Assuming pup methods are available for the data, it should be possible to run the pup in a Cthread with the data going into a flow-controlled message stream.
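
A rough sketch of that shape, with hypothetical names (the real Charm++ PUP::er interface differs in its details): a pupper whose byte sink is a flow-controlled stream, driven from a user-level thread so serialization can block when the window is full.

    #include <cstddef>

    // Hypothetical sketch: a pup-style serializer whose sink is a
    // flow-controlled stream. Run it inside a Cthread so it can block
    // on flow control. This mimics, but is not, the real PUP::er API.
    struct StreamSink {
        void write(const void *p, std::size_t n) {
            (void)p; (void)n;  // enqueue bytes; may block until acked
        }
    };

    class StreamingPupper {
        StreamSink &sink;
    public:
        explicit StreamingPupper(StreamSink &s) : sink(s) {}
        // The pup framework funnels all raw data through a hook like this:
        void bytes(const void *p, std::size_t n) { sink.write(p, n); }
    };

    // Thread body: pup the object straight into the stream, so no
    // full-size serialized copy is ever materialized on the sender.
    template <class T>
    void pupToStream(T &obj, StreamSink &sink) {
        StreamingPupper p(sink);
        obj.pup(p);  // assumes obj has a pup() method, per Charm++ convention
    }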

#7 Updated by Jim Phillips over 2 years ago

It would be good to have an MPI implementation as well. This should be easy, since MPI has supported sending data in place since version 1.0.
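
For reference, a minimal MPI sketch (standard MPI, run with at least two ranks; not the Charm++ machine-layer code): MPI_Send transmits straight from the caller's buffer, so the interface itself imposes no sender-side copy.

    #include <mpi.h>
    #include <stdlib.h>

    /* MPI sends directly from the caller's buffer: MPI_Send may transmit
     * `data` in place, with no sender-side copy required by the interface. */
    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int N = 1 << 24;  /* a large contiguous buffer of doubles */
        double *data = (double *)malloc((size_t)N * sizeof(double));

        if (rank == 0) {
            MPI_Send(data, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(data, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }

        free(data);
        MPI_Finalize();
        return 0;
    }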

#8 Updated by Phil Miller over 2 years ago

  • Tags set to api lrts machine-layers rdma

API patch merged:
https://charm.cs.illinois.edu/gerrit/1273

As expected, no breakage in autobuild resulted.
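
For readers without Gerrit access, the general shape of a sender-side zero-copy call is a send that takes ownership of the buffer plus a completion callback fired once the network layer is done with it. The sketch below is illustrative only; the names are assumptions, not what the merged patch defines.

    #include <cstddef>
    #include <cstdio>

    // Illustrative only: the shape of a sender-side zero-copy API at the
    // Converse level. All names are hypothetical; see the Gerrit patch
    // above for the actual merged interface.
    typedef void (*ZcDoneFn)(void *context);

    // Contract: the machine layer takes the buffer without copying it; the
    // caller must not reuse `buf` until doneFn(context) fires. Per comment
    // #4, the callback runs on the sending PE, not on the comm thread.
    void zcSendSketch(int destPe, const void *buf, std::size_t len,
                      ZcDoneFn doneFn, void *context) {
        (void)destPe; (void)buf; (void)len;
        // Stub: a real layer would register `buf` for RDMA and return
        // immediately; here we "complete" at once for illustration.
        doneFn(context);
    }

    static void onDone(void *ctx) {
        std::printf("buffer %p is safe to reuse\n", ctx);
    }

    int main() {
        static char big[1 << 20];
        zcSendSketch(/*destPe=*/1, big, sizeof big, onDone, big);
    }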

#9 Updated by Phil Miller over 2 years ago

  • Status changed from In Progress to Merged

API and some machine layer implementations merged. Other machine layers are still pending, but we want to check the box for the beta release.
