Bug #1084: Eliminate extra copies in AMPI reduce/gather(v) receive paths
Use built-in reducer types for basic types/operations in AMPI
There is an extra copy in AMPI reductions on the sender side: we have to copy the contributed data into a CkReductionMsg prefixed with an AmpiOpHeader struct, which contains the MPI_Op user function that performs the reduction. At least for predefined types such as MPI_INT, MPI_LONG, MPI_FLOAT, and MPI_DOUBLE, combined with basic operations such as MPI_SUM, MPI_PROD, MPI_MAX, and MPI_MIN, we can eliminate this copy by using Charm++'s built-in reduction types, which removes the need for an AmpiOpHeader. To do this, we would need to set a flag in the request that signals whether the result message will contain an AmpiOpHeader.
We could also potentially expand the set of built-in reduction types in Charm++ to cover types that MPI supports but Charm++ does not.
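A rough sketch of the type/op lookup described above, for illustration only. Everything here is a stand-in: SketchType and SketchOp replace the real MPI_Datatype and MPI_Op handles, SketchReducer is a local enum whose names are modeled on Charm++'s built-in reducers rather than taken from CkReduction::reducerType, and builtinReducerFor is a hypothetical helper, not existing AMPI code. A false return means the pair has no built-in reducer and the contribution would still go through the AmpiOpHeader path.

```cpp
// Stand-ins for the MPI handles named in this issue; these are NOT the real
// MPI_Datatype / MPI_Op constants.
enum class SketchType { Int, Long, Float, Double, Other };
enum class SketchOp { Sum, Prod, Max, Min, Other };

// Local stand-in for CkReduction::reducerType (assumption: names only mirror
// the style of Charm++'s built-in reducers).
enum SketchReducer {
  sum_int, sum_long, sum_float, sum_double,
  product_int, product_long, product_float, product_double,
  max_int, max_long, max_float, max_double,
  min_int, min_long, min_float, min_double
};

// Hypothetical helper: writes the built-in reducer for (type, op) to *out and
// returns true, or returns false when the pair needs the AmpiOpHeader path.
bool builtinReducerFor(SketchType type, SketchOp op, SketchReducer* out) {
  if (type == SketchType::Other || op == SketchOp::Other) return false;
  // Rows follow SketchOp order, columns follow SketchType order.
  static const SketchReducer table[4][4] = {
    {sum_int, sum_long, sum_float, sum_double},
    {product_int, product_long, product_float, product_double},
    {max_int, max_long, max_float, max_double},
    {min_int, min_long, min_float, min_double}
  };
  *out = table[static_cast<int>(op)][static_cast<int>(type)];
  return true;
}
```

Types or operations outside the table (user-defined MPI_Ops, derived datatypes) fall through to the existing path, so the lookup only ever removes a copy and never changes which reductions are supported.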
#1 Updated by Sam White almost 3 years ago
A couple of notes:
1. CkReductionMsg::buildNew will still copy the data, but if an Rdma contribution is added, then we can avoid the copy with this refactoring.
2. We still need to serialize the contribution data into the CkReductionMsg if it is not contiguous.
3. There is no need to store an extra field in the RednReq, since CkReductionMsgs have a public method getReducer() that we can use to check whether the message carries an AmpiOpHeader.
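A minimal sketch of the check from note 3, under stated assumptions. DemoRednMsg, DemoOpHeader, DemoReducer, and resultPayload are all hypothetical stand-ins declared locally; only the idea of asking the message for its reducer via getReducer() comes from the note. The real CkReductionMsg and AmpiOpHeader layouts are not reproduced here.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in reducer tags: one built-in reducer and one AMPI custom reducer.
enum DemoReducer { demo_builtin_sum_int, demo_ampi_custom };

// Stand-in for AmpiOpHeader (assumption: the real struct has other fields).
struct DemoOpHeader {
  void (*userFn)(void* in, void* inout, int* len);
  int count;
};

// Stand-in for CkReductionMsg exposing only what the sketch needs.
struct DemoRednMsg {
  DemoReducer reducer;
  const char* data;
  std::size_t size;
  DemoReducer getReducer() const { return reducer; }
};

// Returns a pointer to the reduced values and stores their byte count in
// *payloadBytes, skipping the header only when the custom reducer was used.
const char* resultPayload(const DemoRednMsg& msg, std::size_t* payloadBytes) {
  if (msg.getReducer() == demo_ampi_custom) {
    assert(msg.size >= sizeof(DemoOpHeader));
    *payloadBytes = msg.size - sizeof(DemoOpHeader);
    return msg.data + sizeof(DemoOpHeader);
  }
  // Built-in reducer: the message holds the reduced values and nothing else.
  *payloadBytes = msg.size;
  return msg.data;
}
```

Because the reducer tag travels with the message, the receive side can decide per message and the request itself needs no extra state.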
We could add reducers to DDT, so you could ask a ddt type for a CkReduction::reducerType of a certain operation, like ddt->getType(type)->getReducer(CkDDT_MAX_REDUCER) or something...
#3 Updated by Sam White almost 3 years ago
- Status changed from In Progress to Implemented