
Bug #1553

Support for sdag entry method with rdma parameter

Added by Nitin Bhat about 2 years ago. Updated about 2 years ago.

Status: Merged
Priority: High
Assignee: Nitin Bhat
Category: -
Target version:
Start date: 05/10/2017
Due date:
% Done: 100%


Description

I tried adding RDMA support to the receiveGhosts entry method in examples/charm++/load_balancing/stencil3d. charmxi did not generate the closure structure and the unmarshalling code correctly, because it treated the rdma parameter as a regular pointer instead of a CkRdmaWrapper. We overlooked SDAG support and did not test it in the generic-layer patch. This is more of a feature request than a bug.

History

#1 Updated by Phil Miller about 2 years ago

  • Tags changed from #rdma, charmxi to #rdma, charmxi, SDAG

#2 Updated by Nitin Bhat about 2 years ago

  • Description updated
  • Assignee set to Nitin Bhat

#3 Updated by Nitin Bhat about 2 years ago

  • Priority changed from Normal to High
  • Status changed from New to In Progress

I have finished implementing the SDAG code for rdma entry methods. The use case is the stencil3d example, where receiveGhosts is written with ghosts as an rdma parameter. I have written this example, and it works when run without a load balancer (i.e., without migration).

I am currently working on a migration-related bug that I see when I run the stencil3d example with a load balancer: for a migrated chare, the callback is not called in the iteration that follows the load-balancing iteration.

I plan to commit the SDAG support code (charmxi) and the migration example (stencil3d) as two separate commits.

#4 Updated by Phil Miller about 2 years ago

This may be a rather naive question, but why is the receive-side code (as suggested by the mention of closure and unmarshalling code) affected at all by send-side RDMA? Shouldn't the method ultimately receive a CkMarshallMessage* as usual?

#5 Updated by Nitin Bhat about 2 years ago

  • Status changed from In Progress to Implemented
  • % Done changed from 0 to 100

The receive-side code changes because unmarshalling must follow the same order as marshalling: numrdmaops is packed first, followed by the RDMA wrappers, and then all the other parameters. The unmarshalling code therefore has to read them back in exactly that order.
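To illustrate why the receive side has to mirror the send side, here is a minimal, self-contained C++ sketch of an ordered pack/unpack. It is not charmxi's generated code and does not use the real CkRdmaWrapper; `RdmaWrapperSketch`, `packParams`, `unpackParams`, and the `dir`/`width` parameters are hypothetical, and the wrapper is assumed to carry only a source pointer and a byte count.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for CkRdmaWrapper (fields assumed for this sketch).
struct RdmaWrapperSketch {
  const void* srcPtr;  // sender-side buffer address
  std::size_t bytes;   // size of that buffer
};

// Append the raw bytes of a trivially copyable value to the buffer.
template <typename T>
void appendValue(std::vector<char>& buf, const T& v) {
  const char* p = reinterpret_cast<const char*>(&v);
  buf.insert(buf.end(), p, p + sizeof(T));
}

// Read a trivially copyable value at `off`, then advance `off`.
template <typename T>
T readValue(const std::vector<char>& buf, std::size_t& off) {
  T v;
  std::memcpy(&v, buf.data() + off, sizeof(T));
  off += sizeof(T);
  return v;
}

// Marshalling order: RDMA-op count, then each wrapper, then regular params.
std::vector<char> packParams(const std::vector<RdmaWrapperSketch>& wrappers,
                             int dir, int width) {
  std::vector<char> buf;
  appendValue(buf, static_cast<int>(wrappers.size()));
  for (const auto& w : wrappers) appendValue(buf, w);
  appendValue(buf, dir);
  appendValue(buf, width);
  return buf;
}

struct Unpacked {
  std::vector<RdmaWrapperSketch> wrappers;
  int dir;
  int width;
};

// Unmarshalling reads the fields back in exactly the same order.
Unpacked unpackParams(const std::vector<char>& buf) {
  Unpacked out{};
  std::size_t off = 0;
  int numRdmaOps = readValue<int>(buf, off);
  for (int i = 0; i < numRdmaOps; ++i)
    out.wrappers.push_back(readValue<RdmaWrapperSketch>(buf, off));
  out.dir = readValue<int>(buf, off);
  out.width = readValue<int>(buf, off);
  return out;
}
```

If the receive side instead assumed the regular parameters came first, it would interpret the RDMA count as `dir` and every later field would be shifted, which is the kind of mismatch described above.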

Additionally, the closure structure (and therefore its marshalling/unmarshalling code) depends on CMK_ONESIDED_IMPL: with one-sided support it contains num_rdma_fields and CkRdmaWrapper fields, and without it, regular pointers.
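A rough sketch of what such a conditional closure layout could look like follows. It is illustrative only: `ReceiveGhostsClosureSketch` and `RdmaWrapperSketch` are hypothetical names, the parameter list is assumed, and the sketch defaults CMK_ONESIDED_IMPL to 1 only so that it compiles standalone. It does not reproduce the code charmxi actually generates.

```cpp
#include <cstddef>

// Assumption for this standalone sketch: treat one-sided support as present
// unless the build already defines CMK_ONESIDED_IMPL.
#ifndef CMK_ONESIDED_IMPL
#define CMK_ONESIDED_IMPL 1
#endif

// Hypothetical stand-in for CkRdmaWrapper (fields assumed).
struct RdmaWrapperSketch {
  const void* srcPtr;  // sender-side buffer address
  std::size_t bytes;   // size of that buffer
};

// Hypothetical closure for an SDAG entry such as receiveGhosts(dir, w, h, ghosts).
struct ReceiveGhostsClosureSketch {
  int dir;
  int w;
  int h;
#if CMK_ONESIDED_IMPL
  // With one-sided support: a count of RDMA fields plus a wrapper per field.
  int num_rdma_fields;
  RdmaWrapperSketch ghosts;
#else
  // Without it: the rdma parameter is an ordinary pointer into the message.
  double* ghosts;
#endif
};
```

Because the member types differ between the two branches, the pack/unpack code for the closure has to be generated under the same condition.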

Implementation: https://charm.cs.illinois.edu/gerrit/#/c/2572/
I have also added an 'rdma' version of stencil3d that exercises both SDAG and migration; it is included in the patch.

#6 Updated by Phil Miller about 2 years ago

  • Status changed from Implemented to Merged
