Feature #1393: Redesign of GPUManager to utilize concurrent kernel execution and stream callbacks
Refactor CUDA example programs to fit new GPUManager design
CUDA example programs under examples/charm++/cuda need to be refactored to fit the new design of GPUManager.
In particular, lazy host memory allocation and deallocation are no longer possible now that polling has been removed,
and those calls should be replaced by either