
Feature #1393: Redesign of Hybrid API (GPU Manager) to support concurrent kernel execution » Cleanup #1454

GPUManager API change

Added by Jaemin Choi over 1 year ago. Updated about 2 months ago.

Status: Merged
Priority: Normal
Assignee:
Category: GPU Support
Target version:
Start date: 03/03/2017
Due date:
% Done: 100%


Description

This change reworks the current GPUManager API to provide a more uniform, clearly separated interface (function calls are now prefixed with hapi_) and better usability for the user.
It also eliminates the memory leaks that arose from manually creating a workRequest and the data structures inside it.

[API comparison]
Old

workRequest *matmul = new workRequest;
matmul->dimGrid = dim3(ceil((float)matrixSize / BLOCK_SIZE), ceil((float)matrixSize / BLOCK_SIZE));
matmul->dimBlock = dim3(BLOCK_SIZE, BLOCK_SIZE);
matmul->smemSize = 0;
matmul->nBuffers = 3;
matmul->bufferInfo = new DataInfo[matmul->nBuffers];

DataInfo *AInfo = &(matmul->bufferInfo[0]);
AInfo->transferToDevice = YES;
AInfo->transferFromDevice = NO;
AInfo->freeBuffer = YES;
AInfo->hostBuffer = h_A;
AInfo->size = size;

DataInfo *BInfo = &(matmul->bufferInfo[1]);
BInfo->transferToDevice = YES;
BInfo->transferFromDevice = NO;
BInfo->freeBuffer = YES;
BInfo->hostBuffer = h_B;
BInfo->size = size;

DataInfo *CInfo = &(matmul->bufferInfo[2]);
CInfo->transferToDevice = NO;
CInfo->transferFromDevice = YES;
CInfo->freeBuffer = YES;
CInfo->hostBuffer = h_C;
CInfo->size = size;

matmul->callbackFn = cb;
if (useCublas) {
  matmul->traceName = "blas";
  matmul->runkernel = run_BLAS_KERNEL;
}
else {
  matmul->traceName = "matmul";
  matmul->runkernel = run_MATMUL_KERNEL;
}

matmul->userData = new int(matrixSize);

enqueue(matmul);

New

workRequest *matmul = hapi_createWorkRequest();
dim3 dimGrid(ceil((float)matrixSize / BLOCK_SIZE), ceil((float)matrixSize / BLOCK_SIZE));
dim3 dimBlock(BLOCK_SIZE, BLOCK_SIZE);
matmul->setExecParams(dimGrid, dimBlock);
matmul->addBufferInfo(-1, h_A, size, cudaMemcpyHostToDevice, 1);
matmul->addBufferInfo(-1, h_B, size, cudaMemcpyHostToDevice, 1);
matmul->addBufferInfo(-1, h_C, size, cudaMemcpyDeviceToHost, 1);
matmul->setCallback(cb);
if (useCublas) {
  matmul->setTraceName("blas");
  matmul->setRunKernel(run_BLAS_KERNEL);
}
else {
  matmul->setTraceName("matmul");
  matmul->setRunKernel(run_MATMUL_KERNEL);
}
matmul->setUserData(&matrixSize, sizeof(int));

hapi_enqueue(matmul);


Subtasks

Feature #1456: Add more stream callbacks for use after HToD transfer and kernel execution (Merged, Jaemin Choi)

History

#1 Updated by Jaemin Choi over 1 year ago

  • Status changed from In Progress to Feedback

Change pushed to gerrit for review.
https://charm.cs.illinois.edu/gerrit/#/c/2283/

#2 Updated by Michael Robson over 1 year ago

The buffer ID (-1) should be the last parameter and default to -1.

Also, is there a way to mark a copy in both directions?

The ints should be changed to bools.

#3 Updated by Eric Bohm about 1 year ago

  • Target version changed from 6.8.1 to 6.9.0

#5 Updated by Sam White 6 months ago

  • Status changed from Feedback to Implemented
  • Tracker changed from Support to Cleanup

#6 Updated by Eric Bohm 5 months ago

Could we get a status update for the GPU issues? What is blocking progress on these?

#7 Updated by Jaemin Choi 3 months ago

  • Target version changed from 6.9.0 to 6.9.1

#8 Updated by Jaemin Choi 2 months ago

  • Target version changed from 6.9.1 to 6.9.0

#9 Updated by Jaemin Choi about 2 months ago

  • Status changed from Implemented to Merged
