Bug #1464

Feature #1393: Redesign of GPUManager to utilize concurrent kernel execution and stream callbacks

CUDA example programs hang when run with 1 PE

Added by Jaemin Choi about 2 months ago. Updated about 2 months ago.

Status: In Progress
Priority: Normal
Assignee:
Category: GPU Support
Target version:
Start date: 03/08/2017
Due date:
% Done: 0%


Description

CUDA example programs (overlapTest, concurrentKernels, callbacks, etc.) hang when they are run with only 1 PE.
For example: ./charmrun +p1 ++local ./overlapTest

After some debugging, the problem appears to be related to CmiPushPE(): the CUDACallback function does get invoked, but the functions that actually invoke the user's callbacks, such as hostToDeviceCallback(), kernelCallback(), and deviceToHostCallback(), are never called.
This is where the hang occurs.
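
To make the symptom above easier to reason about, here is a toy, single-threaded model of the push-then-dispatch path. It is only a sketch: ToyPE, ToyMessage, registerHandler(), push(), and runScheduler() are hypothetical names invented for illustration and are not Converse or GPUManager APIs. The only point it shows is that a pushed message has no effect until a scheduler pass picks it up and calls the handler at the stored index, which matches the observation that CUDACallback runs while the handler functions never do.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Toy stand-ins only; none of these types are part of Converse or Charm++.
struct ToyMessage {
    int handlerIndex;  // index of the registered handler that should run
    int payload;       // arbitrary data carried by the message
};

class ToyPE {
public:
    // Store a handler and return its index in the table.
    int registerHandler(std::function<void(const ToyMessage&)> fn) {
        handlers_.push_back(std::move(fn));
        return static_cast<int>(handlers_.size()) - 1;
    }

    // Queue a message for this PE. Nothing is executed here.
    void push(const ToyMessage& msg) { queue_.push_back(msg); }

    // One scheduler pass: deliver every queued message to its handler.
    // Returns the number of messages delivered.
    std::size_t runScheduler() {
        std::size_t delivered = 0;
        while (!queue_.empty()) {
            ToyMessage msg = queue_.front();
            queue_.pop_front();
            assert(msg.handlerIndex >= 0 &&
                   static_cast<std::size_t>(msg.handlerIndex) < handlers_.size());
            handlers_[msg.handlerIndex](msg);
            ++delivered;
        }
        return delivered;
    }

    // Number of messages pushed but not yet delivered.
    std::size_t pending() const { return queue_.size(); }

private:
    std::vector<std::function<void(const ToyMessage&)>> handlers_;
    std::deque<ToyMessage> queue_;
};
```

In this model, a message that stays in pending() forever corresponds to the hang: the push succeeded, but no scheduler pass ever delivered it. Whether the real 1 PE hang is caused by the message never being queued or never being picked up is not established here.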

History

#1 Updated by Jaemin Choi about 2 months ago

I thought the handler function indices might be the cause, so I wrapped the CmiRegisterHandler() calls in a new registerCallbacks() function and moved them to ck-core/init.C.
This change still works with more than 1 PE, and it is an improvement because the handlers are no longer registered on every hapi_enqueue() call, but it did not solve the 1 PE problem.
The programs run fine in SMP mode, but this might be due to the presence of the comm thread (which is still weird).
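
For reference, the register-once pattern described in this update can be sketched as follows. This is a toy model under stated assumptions, not the actual change: toyRegisterHandler(), toyRegisterCallbacks(), toyKernelHandlerIndex(), and the toy handler table are hypothetical stand-ins, and the model assumes that each registration appends a new entry and returns its index.

```cpp
#include <cassert>
#include <vector>

using ToyHandler = void (*)(void*);

// Toy handler table; a stand-in for the runtime's table, not its real layout.
static std::vector<ToyHandler> toyHandlerTable;

// Assumption: every registration appends an entry and returns its index.
int toyRegisterHandler(ToyHandler h) {
    toyHandlerTable.push_back(h);
    return static_cast<int>(toyHandlerTable.size()) - 1;
}

// Placeholder handlers named after the three callback stages in this issue.
void toyHostToDevice(void*) {}
void toyKernel(void*) {}
void toyDeviceToHost(void*) {}

// Indices saved at startup so later calls can reuse them.
struct ToyCallbackIndices {
    int hostToDevice = -1;
    int kernel = -1;
    int deviceToHost = -1;
};
static ToyCallbackIndices toyIndices;

// Called once during initialization, instead of on every enqueue.
void toyRegisterCallbacks() {
    assert(toyIndices.kernel == -1 && "callbacks must be registered only once");
    toyIndices.hostToDevice = toyRegisterHandler(toyHostToDevice);
    toyIndices.kernel = toyRegisterHandler(toyKernel);
    toyIndices.deviceToHost = toyRegisterHandler(toyDeviceToHost);
}

// The enqueue path only reads the saved index; it never registers again.
int toyKernelHandlerIndex() { return toyIndices.kernel; }
```

With this structure the table size stays fixed no matter how many work requests are enqueued, and every request sees the same handler indices.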
