Bug #1464

Feature #1393: Redesign of Hybrid API (GPU Manager) to support concurrent kernel execution

CUDA example programs hang when run with 1 PE

Added by Jaemin Choi about 1 year ago. Updated 5 months ago.

Status: In Progress
Category: GPU Support

CUDA example programs (overlapTest, concurrentKernels, callbacks, etc.) hang when they are run with only 1 PE.
For example: ./charmrun +p1 ++local ./overlapTest

After some debugging, the problem appears to involve CmiPushPE(): the CUDACallback function does get invoked,
but the functions that actually invoke the user's callback functions, such as hostToDeviceCallback(), kernelCallback(), and deviceToHostCallback(), are never called.
This is where the hang occurs.
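
To make the expected handoff concrete, here is a toy model in plain C++ (no Charm++ or CUDA involved). It does not reproduce or diagnose the hang; it only sketches the pattern described above, where a foreign thread pushes a message to a PE and the PE's scheduler is supposed to dispatch it to a registered handler. ToyPE, push(), runScheduler(), and runToyPipeline() are hypothetical names invented for this sketch and are not Converse or HAPI functions.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Toy stand-in for one PE: a handler table plus a message queue.
// Hypothetical model only; none of this is Converse or HAPI code.
struct ToyPE {
    std::vector<std::function<void()>> handlers;  // indexed by handler id
    std::deque<int> queue;                        // pending handler ids
    std::mutex m;
    std::condition_variable cv;

    // Register a handler and return its index in the table.
    int registerHandler(std::function<void()> fn) {
        handlers.push_back(std::move(fn));
        return static_cast<int>(handlers.size()) - 1;
    }

    // Called from a foreign thread (the role CUDACallback plays in the report).
    void push(int handlerIdx) {
        {
            std::lock_guard<std::mutex> lock(m);
            queue.push_back(handlerIdx);
        }
        cv.notify_one();  // wake the scheduler if it is waiting for work
    }

    // Scheduler loop: wait for a message, then dispatch it to its handler.
    void runScheduler(int expected) {
        for (int done = 0; done < expected; ++done) {
            int idx;
            {
                std::unique_lock<std::mutex> lock(m);
                cv.wait(lock, [this] { return !queue.empty(); });
                idx = queue.front();
                queue.pop_front();
            }
            assert(idx >= 0 && static_cast<std::size_t>(idx) < handlers.size());
            handlers[idx]();  // in the bug report, this step never happens on 1 PE
        }
    }
};

// Drive the model: a "runtime" thread pushes one message per stage, in order.
std::vector<std::string> runToyPipeline() {
    ToyPE pe;
    std::vector<std::string> log;  // written only by the scheduler thread
    int h2d = pe.registerHandler([&] { log.push_back("hostToDevice"); });
    int kernel = pe.registerHandler([&] { log.push_back("kernel"); });
    int d2h = pe.registerHandler([&] { log.push_back("deviceToHost"); });

    std::thread runtimeThread([&] {
        pe.push(h2d);
        pe.push(kernel);
        pe.push(d2h);
    });
    pe.runScheduler(3);
    runtimeThread.join();
    return log;
}
```

Calling runToyPipeline() from a main() returns the three stage names in the order they were pushed. In this model the dispatch only happens if the scheduler loop actually picks up what push() enqueued, which is the step that seems to go missing in the 1 PE case.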


#1 Updated by Jaemin Choi about 1 year ago

I thought the cause might be the handler function indices, so I wrapped the CmiRegisterHandler() calls in a registerCallbacks() function and moved them out to ck-core/init.C.
This change still works when the number of PEs is greater than 1, and it is an improvement because the handlers are no longer registered at every hapi_enqueue() call, but it did not solve the 1 PE problem.
The programs run fine in SMP mode, but this might be due to the presence of the comm thread (which is still weird).
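
For reference, the difference between registering once at init and registering on every enqueue can be sketched with a toy handler table. This is an illustration under assumptions, not the actual change: the table, toyRegisterHandler(), toyRegisterCallbacks(), and both enqueue functions are hypothetical stand-ins, and the sketch assumes each registration call appends a new entry and returns its index.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a per-PE handler table (not the Converse one).
using Handler = void (*)();
static std::vector<Handler> toyHandlerTable;

// Append a handler and return its index (assumed behavior for this sketch).
int toyRegisterHandler(Handler h) {
    toyHandlerTable.push_back(h);
    return static_cast<int>(toyHandlerTable.size()) - 1;
}

// Placeholder handlers for the three callback stages.
void onHostToDevice() {}
void onKernel() {}
void onDeviceToHost() {}

// Handler indices, filled in once at startup.
struct ToyCallbackIds {
    int h2d = -1;
    int kernel = -1;
    int d2h = -1;
};
static ToyCallbackIds toyIds;

// Register-once variant: meant to be called a single time during init.
void toyRegisterCallbacks() {
    toyIds.h2d = toyRegisterHandler(onHostToDevice);
    toyIds.kernel = toyRegisterHandler(onKernel);
    toyIds.d2h = toyRegisterHandler(onDeviceToHost);
}

// Enqueue that reuses the stored indices; the table does not grow.
int toyEnqueueReusing() {
    assert(toyIds.h2d >= 0);  // toyRegisterCallbacks() must have run first
    return toyIds.h2d;
}

// Enqueue that re-registers on every call; the table grows by three each time
// and the returned index changes from call to call.
int toyEnqueueReRegistering() {
    int h2d = toyRegisterHandler(onHostToDevice);
    toyRegisterHandler(onKernel);
    toyRegisterHandler(onDeviceToHost);
    return h2d;
}

std::size_t toyTableSize() { return toyHandlerTable.size(); }
```

With the register-once variant the indices stay fixed and the table stays at three entries no matter how many enqueues happen, which is the cleanup described above. As noted, that cleanup alone did not fix the 1 PE hang.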

#2 Updated by Eric Bohm 6 months ago

  • Target version changed from 6.8.1 to 6.9.0

#3 Updated by Jaemin Choi 5 months ago

  • Target version deleted (6.9.0)

Low priority, as the default mode of execution is CUDA events and not CUDA callbacks.
Will look into this issue again, though.
