Feature #1393: Redesign of GPUManager to utilize concurrent kernel execution and stream callbacks
CUDA example programs hang when run with 1 PE
CUDA example programs (
callbacks, etc.) hang when they are run with only 1 PE.
./charmrun +p1 ++local ./overlapTest
After some debugging, there seems to be a problem with CmiPushPE(): although the CUDACallback function gets invoked, the functions that actually invoke the user's callback functions, such as deviceToHostCallback(), do not get called. This is where the hang occurs.
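For anyone following along, here is a minimal single-threaded model of the flow described above. It is only a sketch under assumptions: ModelPE, Message, push(), drain() and userCallbackReached() are hypothetical stand-ins, not the real Converse or GPUManager API, and the model does not claim to identify the actual cause. It only shows that if the pushed message is never dispatched, the stream callback still fires while the user's callback is never reached, which matches the symptom.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-ins; none of these are the real Converse or GPUManager API.
struct Message {
  int handlerIdx;  // index of the registered handler that should process this message
};

struct ModelPE {
  std::vector<std::function<void(Message&)>> handlers;  // handler table
  std::deque<Message> queue;                            // per-PE message queue

  // Stand-in for handler registration: returns the new handler's index.
  int registerHandler(std::function<void(Message&)> h) {
    handlers.push_back(std::move(h));
    return static_cast<int>(handlers.size()) - 1;
  }

  // Stand-in for pushing a message to this PE: it only enqueues.
  void push(const Message& m) { queue.push_back(m); }

  // Stand-in for the scheduler loop: dispatch every queued message.
  int drain() {
    int dispatched = 0;
    while (!queue.empty()) {
      Message m = queue.front();
      queue.pop_front();
      assert(m.handlerIdx >= 0 &&
             static_cast<std::size_t>(m.handlerIdx) < handlers.size());
      handlers[m.handlerIdx](m);
      ++dispatched;
    }
    return dispatched;
  }
};

// Runs the flow once and reports whether the user's callback was reached.
// schedulerRuns == false models a PE that never picks up the pushed message.
bool userCallbackReached(bool schedulerRuns) {
  ModelPE pe;
  bool userCallbackRan = false;

  // Handler that would invoke the user's callback (the role played by
  // deviceToHostCallback() in the report).
  int d2hIdx = pe.registerHandler([&](Message&) { userCallbackRan = true; });

  // Step 1: the stream callback fires and only pushes a message to the PE.
  auto streamCallback = [&] { pe.push(Message{d2hIdx}); };
  streamCallback();

  // Step 2: the handler runs only if the scheduler dispatches the message.
  if (schedulerRuns) pe.drain();
  return userCallbackRan;
}
```

In this model, userCallbackReached(true) reaches the user's callback and userCallbackReached(false) does not, even though the stream callback ran in both cases.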
#1 Updated by Jaemin Choi about 2 months ago
I thought the handler function indices might be the cause, so I wrapped the
CmiRegisterHandler() calls and moved them out to
This change still works for more than 1 PE, and it is an improvement because the handler functions are no longer registered at every
hapi_enqueue() call, but it did not solve the 1 PE problem.
It runs fine in SMP mode, but this might be due to the presence of the comm thread (which is still weird).
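To illustrate the registration change described above, here is a rough sketch contrasting per-enqueue registration with one-time registration. Everything in it is hypothetical: HandlerTable stands in for the runtime's handler table, and hostToDeviceHandler, deviceToHostHandler, registerHandlersOnce and the two enqueue functions are illustrative names rather than the actual GPUManager code. The only assumption taken from the real API is that registration returns an integer index.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins, not the real Converse or HAPI API.
using Handler = void (*)(void*);

struct HandlerTable {
  std::vector<Handler> entries;
  // Mimics a registration call that returns the new handler's index.
  int add(Handler h) {
    entries.push_back(h);
    return static_cast<int>(entries.size()) - 1;
  }
};

// Illustrative handlers; the bodies are irrelevant for this sketch.
void hostToDeviceHandler(void*) {}
void deviceToHostHandler(void*) {}

struct HandlerIndices {
  int hostToDevice = -1;
  int deviceToHost = -1;
};

// Before: every enqueue registers the handler again, so the table keeps
// growing and the returned index changes from call to call.
int enqueueRegisteringEachTime(HandlerTable& table) {
  return table.add(deviceToHostHandler);
}

// After: register each handler once during initialization...
HandlerIndices registerHandlersOnce(HandlerTable& table) {
  HandlerIndices idx;
  idx.hostToDevice = table.add(hostToDeviceHandler);
  idx.deviceToHost = table.add(deviceToHostHandler);
  return idx;
}

// ...and let every enqueue reuse the stored index.
int enqueueReusingIndex(const HandlerIndices& idx) {
  assert(idx.deviceToHost >= 0);  // registration must have happened first
  return idx.deviceToHost;
}
```

With per-enqueue registration, three enqueues leave three entries in the table. With one-time registration, the table holds two entries no matter how many enqueues follow, and the index stays stable.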