Bug #1464

Feature #1393: Redesign of Hybrid API (GPU Manager) to support concurrent kernel execution

CUDA example programs hang when run with 1 PE

Added by Jaemin Choi 8 months ago. Updated about 1 month ago.

Status: In Progress
Priority: Normal
Assignee:
Category: GPU Support
Target version:
Start date: 03/08/2017
Due date:
% Done: 0%


Description

CUDA example programs (overlapTest, concurrentKernels, callbacks, etc.) hang when they are run with only 1 PE.
For example: ./charmrun +p1 ++local ./overlapTest

Debugging suggests that the problem involves CmiPushPE(): the CUDACallback function is invoked,
but the functions that actually invoke the user's callbacks, such as hostToDeviceCallback(), kernelCallback(), and deviceToHostCallback(), are never called.
This is where the hang occurs.
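
To make the expected flow concrete, here is a minimal stand-alone model of it. It is not Converse or Hybrid API code: PeModel, Message, and runFlow() are hypothetical names, push() only stands in for CmiPushPE(), and the sketch assumes the callback runs on a thread other than the PE's scheduler. In this model the scheduler does dispatch all three handlers; in the reported bug, the equivalent dispatch step never happens with 1 PE.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a message: it only carries a handler index.
struct Message { int handlerIdx; };

// Hypothetical model of a single PE: a handler table plus a thread-safe queue.
class PeModel {
public:
  // Store a handler and return its index (stands in for CmiRegisterHandler()).
  int registerHandler(std::function<void()> fn) {
    handlers_.push_back(std::move(fn));
    return static_cast<int>(handlers_.size()) - 1;
  }

  // Enqueue a message for this PE (stands in for CmiPushPE()).
  // May be called from a thread other than the scheduler.
  void push(Message m) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      queue_.push(m);
    }
    cv_.notify_one();
  }

  // Scheduler loop: dispatch exactly `expected` messages, then return.
  void runScheduler(int expected) {
    for (int i = 0; i < expected; ++i) {
      std::unique_lock<std::mutex> lock(mu_);
      cv_.wait(lock, [this] { return !queue_.empty(); });
      Message m = queue_.front();
      queue_.pop();
      lock.unlock();
      assert(m.handlerIdx >= 0 &&
             static_cast<std::size_t>(m.handlerIdx) < handlers_.size());
      handlers_[m.handlerIdx]();  // the step that never runs in the bug
    }
  }

private:
  std::vector<std::function<void()>> handlers_;
  std::queue<Message> queue_;
  std::mutex mu_;
  std::condition_variable cv_;
};

// Runs the three-stage flow and returns the order in which handlers ran.
std::vector<int> runFlow() {
  PeModel pe;
  std::vector<int> order;
  int h2d  = pe.registerHandler([&order] { order.push_back(0); });
  int kern = pe.registerHandler([&order] { order.push_back(1); });
  int d2h  = pe.registerHandler([&order] { order.push_back(2); });

  // Assumed callback thread: pushes one message per completed stage.
  std::thread callbackThread([&pe, h2d, kern, d2h] {
    pe.push({h2d});
    pe.push({kern});
    pe.push({d2h});
  });

  pe.runScheduler(3);
  callbackThread.join();
  return order;
}
```

Calling runFlow() from a small driver returns the handler indices in push order, which is the behavior the example programs rely on.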

History

#1 Updated by Jaemin Choi 8 months ago

I suspected the handler function indices, so I wrapped the CmiRegisterHandler() calls in a registerCallbacks() function and moved it to ck-core/init.C.
This change still works with more than 1 PE, and it is an improvement because the handlers are no longer registered on every hapi_enqueue() call, but it did not solve the 1 PE problem.
The programs run fine in SMP mode, but that might be due to the presence of the comm thread (which is still odd).
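
The register-once pattern described above can be sketched as follows. This is a stand-alone illustration, not the actual change in ck-core/init.C: the handler table, registerHandler(), handlerCount(), and enqueueKernel() are hypothetical, the callback bodies are empty placeholders, and registerHandler() only mimics the "store a function, get an index back" behavior of CmiRegisterHandler().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Handler signature used by this sketch (an assumption, not the real one).
using Handler = void (*)(void*);

// Hypothetical handler table standing in for the runtime's own table.
static std::vector<Handler> g_handlers;

// Store a handler and return its index.
int registerHandler(Handler h) {
  g_handlers.push_back(h);
  return static_cast<int>(g_handlers.size()) - 1;
}

std::size_t handlerCount() { return g_handlers.size(); }

// Placeholder bodies; the real functions invoke the user's callbacks.
void hostToDeviceCallback(void*) {}
void kernelCallback(void*) {}
void deviceToHostCallback(void*) {}

// Indices saved at initialization so that enqueue calls can reuse them.
struct CallbackIndices {
  int hostToDevice = -1;
  int kernel = -1;
  int deviceToHost = -1;
};
static CallbackIndices g_idx;

// Called once during initialization, not on every enqueue.
void registerCallbacks() {
  g_idx.hostToDevice = registerHandler(hostToDeviceCallback);
  g_idx.kernel       = registerHandler(kernelCallback);
  g_idx.deviceToHost = registerHandler(deviceToHostCallback);
}

// Hypothetical enqueue path: it looks up the saved index and registers nothing.
int enqueueKernel() {
  assert(g_idx.kernel >= 0 && "registerCallbacks() must run first");
  return g_idx.kernel;
}
```

With this layout the handler table stays at three entries no matter how many times work is enqueued, and every enqueue sees the same indices.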

#2 Updated by Eric Bohm about 1 month ago

  • Target version changed from 6.8.1 to 6.9.0
