Feature #1393: Redesign of GPUManager to utilize concurrent kernel execution and stream callbacks
CUDA example programs hang when run with 1 PE
CUDA example programs (callbacks, etc.) hang when they are run with only 1 PE, e.g.:
./charmrun +p1 ++local ./overlapTest
After some debugging, the problem seems to lie with CmiPushPE(): although the CUDACallback function is invoked, the functions that actually invoke the user's callbacks, such as deviceToHostCallback(), are never called. This is where the hang occurs.
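A simplified model of the path described above may help pin down where the work gets lost. It assumes that the CUDA stream callback runs on a thread other than the PE and only hands a message to the PE's queue, and that the user's callback runs only once the PE's scheduler dequeues that message. `PeQueue`, `push_from_foreign_thread()`, `poll_once()` and `run_model()` are hypothetical stand-ins for CmiPushPE() and the Converse scheduler, not Charm++ APIs:

```python
import queue
import threading


# Hypothetical model: PeQueue stands in for a PE's incoming message queue,
# push_from_foreign_thread() for CmiPushPE(), and poll_once() for one
# iteration of the PE's scheduler loop. None of these are Charm++ APIs.
class PeQueue:
    def __init__(self):
        self._q = queue.Queue()  # thread-safe FIFO

    def push_from_foreign_thread(self, handler, *args):
        # Runs on the "CUDA callback" thread: it only enqueues work and
        # never calls the user's callback directly.
        self._q.put((handler, args))

    def poll_once(self, timeout):
        # Runs on the PE: dequeue one message and dispatch its handler.
        # Returns False if nothing arrived in time.
        try:
            handler, args = self._q.get(timeout=timeout)
        except queue.Empty:
            return False
        handler(*args)
        return True


def device_to_host_handler(results, data):
    # Plays the role deviceToHostCallback() plays in the report:
    # the point where the user's callback finally runs.
    results.append(data)


def run_model(pe_polls=True):
    pe = PeQueue()
    results = []
    # Stand-in for the CUDA runtime thread that executes stream callbacks.
    cuda_thread = threading.Thread(
        target=pe.push_from_foreign_thread,
        args=(device_to_host_handler, results, "copy done"),
    )
    cuda_thread.start()
    cuda_thread.join()
    if pe_polls:
        pe.poll_once(timeout=1.0)
    return results
```

In this model `run_model()` returns `["copy done"]`, while `run_model(pe_polls=False)` returns `[]`: the message sits in the queue and the user callback is never reached. From the application's side that looks exactly like the reported hang, which is why the question is whether the single PE ever services what CmiPushPE() delivered.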
#1 Updated by Jaemin Choi 18 days ago
I thought it might be caused by the handler function indices, so I wrapped the CmiRegisterHandler() calls and moved them out to …
This change still works with more than 1 PE, and it is an improvement because the handlers are no longer registered at every hapi_enqueue() call, but it did not solve the 1-PE problem.
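The difference between registering handlers on every enqueue and registering them once can be illustrated with a toy handler table. `HandlerTable`, `PerCallRuntime`, `InitOnceRuntime` and `handler_count_after()` are hypothetical names; `register()` only mimics the fact that CmiRegisterHandler() returns a new index per call. Registering once at startup keeps the table from growing and gives every enqueue the same, stable index:

```python
# Hypothetical model of a Converse-style handler table; register() mimics
# CmiRegisterHandler() returning the index of the newly added handler.
class HandlerTable:
    def __init__(self):
        self.handlers = []

    def register(self, fn):
        self.handlers.append(fn)
        return len(self.handlers) - 1

    def dispatch(self, idx, *args):
        return self.handlers[idx](*args)


def d2h_handler(payload):
    # Placeholder for a device-to-host completion handler.
    return payload


class PerCallRuntime:
    # Original pattern: a new handler is registered on every enqueue.
    def __init__(self, table):
        self.table = table

    def enqueue(self, payload):
        idx = self.table.register(d2h_handler)
        return self.table.dispatch(idx, payload)


class InitOnceRuntime:
    # Revised pattern: the handler is registered once, at startup.
    def __init__(self, table):
        self.table = table
        self.d2h_idx = table.register(d2h_handler)

    def enqueue(self, payload):
        return self.table.dispatch(self.d2h_idx, payload)


def handler_count_after(runtime_cls, n_calls):
    table = HandlerTable()
    rt = runtime_cls(table)
    for i in range(n_calls):
        rt.enqueue(i)
    return len(table.handlers)
```

After three enqueues the per-call runtime has three table entries, while the init-once runtime still has one. As the report notes, this cleanup is worthwhile on its own but does not explain the 1-PE hang.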
It runs fine in SMP mode, but that may only be due to the presence of the comm thread (which is still strange).