Enable Charm++ to send RDMA-style messages to and from buffers in GPU memory. Our initial plan is to use the RDMA syntax. The runtime should detect whether the source and/or target buffer resides on a local or a remote GPU and invoke the corresponding GPUDirect call. We will test this approach with benchmarks (e.g., stencil2d) against GPU-aware MPI versions on multi-GPU clusters (e.g., Bridges and Comet) to confirm that performance is comparable.
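
The detection step described above could be sketched roughly as follows. This is only an illustration of the dispatch decision, not Charm++ code: the `Buffer` type, the `choose_transfer_path` function, and the returned path labels are all hypothetical names, and the sketch assumes that "local" means both buffers are on the same node and "remote" means they are on different nodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Buffer:
    """Hypothetical description of a message buffer (not a Charm++ type)."""
    node: int      # id of the node that owns the buffer
    on_gpu: bool   # True if the buffer lives in GPU memory


def choose_transfer_path(src: Buffer, dst: Buffer) -> str:
    """Pick a transfer path from where the two buffers reside.

    The returned strings are illustrative labels only; a real runtime
    would call into the matching transport layer instead.
    """
    same_node = src.node == dst.node

    # Neither buffer is on a GPU: no GPU-specific handling is needed.
    if not (src.on_gpu or dst.on_gpu):
        return "host_rdma"

    if same_node:
        # Both buffers on GPUs of the same node: peer-to-peer copy.
        if src.on_gpu and dst.on_gpu:
            return "gpudirect_p2p"
        # One side is host memory on the same node: plain device/host copy.
        return "device_host_copy"

    # Buffers on different nodes with at least one in GPU memory.
    return "gpudirect_rdma"
```

For example, `choose_transfer_path(Buffer(0, True), Buffer(1, True))` selects the inter-node path, while two GPU buffers on node 0 select the peer-to-peer path. Keeping the decision in one function would let the user-facing RDMA syntax stay unchanged regardless of where the buffers live.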