AMPI/CUDA: Synchronous and asynchronous invocation, some cleanup