Feature #1393

Redesign of Hybrid API (GPU Manager) to support concurrent kernel execution

Added by Jaemin Choi over 2 years ago. Updated 10 months ago.

Status: Merged
Priority: High
Assignee:
Category: GPU Support
Target version:
Start date: 02/28/2017
Due date:
% Done: 83%


Description

The original design of GPUManager used two data transfer streams and one kernel stream per GPUManager instance. This made use of the GPU's memory copy engines but could not take advantage of concurrent kernel execution.
More importantly, it relied on an inefficient polling scheme: the scheduler periodically invoked a function that processed the workRequests sitting in a queue and blocked the CPU until all work for the workRequest at the head of the queue was complete. Handling multiple workRequests from the queue did allow data transfers to overlap with kernel execution, but the overlap was limited by the single kernel stream and by the synchronization that the polling scheme imposed.

The new design uses multiple kernel streams per GPUManager instance to allow concurrent kernel execution (multiple kernels may execute simultaneously as long as no single kernel uses all of the GPU's resources), and uses stream callbacks (supported since CUDA 5.0) to remove the CPU blocking caused by the current mechanism. Nothing changes from the user's point of view: they create workRequests, call enqueue(), and do other useful work on the CPU until the callback function is invoked, just as before.


Subtasks

Feature #1450: Clean up and add CUDA example programs (Merged, Jaemin Choi)

Feature #1451: NVTX integration for profiling (Merged, Jaemin Choi)

Cleanup #1454: GPUManager API change (Merged, Jaemin Choi)

Feature #1456: Add more stream callbacks for use after HToD transfer and kernel execution (Merged, Jaemin Choi)

Bug #1464: CUDA example programs hang when run with 1 PE (Closed, Jaemin Choi)

Documentation #1491: Update GPUManager documentation (Merged, Jaemin Choi)

Support #1761: Update GPU Manager Tracing API (In Progress, Jaemin Choi)

History

#1 Updated by Sam White over 2 years ago

  • Target version set to 6.8.1

#2 Updated by Jaemin Choi over 2 years ago

  • Status changed from In Progress to Implemented

Up for gerrit review.

#3 Updated by Jaemin Choi over 2 years ago

The previous Gerrit commit has been split into multiple smaller ones.
New Gerrit commit: https://charm.cs.illinois.edu/gerrit/#/c/2274/

#4 Updated by Eric Bohm almost 2 years ago

  • Target version changed from 6.8.1 to 6.9.0

#5 Updated by Jaemin Choi almost 2 years ago

  • Subject changed from Redesign of GPUManager to utilize concurrent kernel execution and stream callbacks to Redesign of Hybrid API (GPU Manager) to support concurrent kernel execution

This includes two schemes, one based on CUDA events and one based on CUDA callbacks. The event-based scheme is the default because it performs better.
A new Gerrit patch will be up soon.
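
As a rough illustration of the event-based idea, the sketch below models it on the CPU only. It is not the GPUManager code: FakeEvent, PendingRequest, and EventPoller are hypothetical names, and FakeEvent::query() stands in for a non-blocking cudaEventQuery() on an event recorded with cudaEventRecord() after a request's last operation. The point it tries to show is that each scheduler pass checks every in-flight request without waiting, so a slow request at the head of the queue does not hold back the callbacks of requests that have already finished.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a CUDA event. In a real build this would be a
// cudaEvent_t, and query() would wrap the non-blocking cudaEventQuery().
struct FakeEvent {
  int polls_until_done;  // how many more queries will report "not ready"

  // Never blocks: reports whether the modeled GPU work has finished.
  bool query() {
    if (polls_until_done > 0) {
      --polls_until_done;
      return false;
    }
    return true;
  }
};

// One in-flight workRequest: its completion event plus the user's callback.
struct PendingRequest {
  FakeEvent event;
  std::function<void()> callback;
};

// Hypothetical poller, assumed to be driven by the scheduler loop.
class EventPoller {
 public:
  void enqueue(PendingRequest request) {
    pending_.push_back(std::move(request));
  }

  // Checks every pending request once and invokes the callbacks of those
  // that have completed. Returns the number of callbacks invoked.
  std::size_t progress() {
    std::vector<std::function<void()>> ready;
    for (auto it = pending_.begin(); it != pending_.end();) {
      if (it->event.query()) {
        ready.push_back(std::move(it->callback));
        it = pending_.erase(it);
      } else {
        ++it;
      }
    }
    // Run callbacks after the scan so a callback may safely enqueue new work.
    for (auto& callback : ready) callback();
    return ready.size();
  }

  bool idle() const { return pending_.empty(); }

 private:
  std::vector<PendingRequest> pending_;
};
```

For example, if a request whose event needs two more polls is enqueued ahead of one that is already complete, the first progress() call invokes only the second request's callback, and the first request's callback is invoked on the third call.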

#7 Updated by Jaemin Choi 11 months ago

  • Target version changed from 6.9.0 to 6.9.1

#8 Updated by Jaemin Choi 11 months ago

  • Target version changed from 6.9.1 to 6.9.0

#9 Updated by Jaemin Choi 10 months ago

  • Status changed from Implemented to Merged
