- 6.1 Registering / Unregistering Memory for RDMA
- 6.2 RDMA operations (Get / Put)
- 6.3 Completion of RDMA operation
This chapter describes the one-sided communication support in Converse. A one-sided communication interface is needed to take advantage of the hardware RDMA facilities that many NICs provide. The drivers for this hardware either already expose this feature or promise to do so soon.
Converse wraps the functionality provided by the different kinds of hardware and presents it to the programmer as a uniform interface. On machines that have no one-sided hardware at their disposal, these operations are emulated through Converse messages.
Converse provides the following types of operations to support one-sided communication.
int CmiRegisterMemory(void *addr, unsigned int size);
This function takes an already allocated memory region that starts at address addr and has length size, and registers it with the NIC, which makes the region DMAable. This is also called pinning memory on the NIC hardware, and it is what makes remote DMA operations on this memory possible. The function calls the hardware driver's registration routine directly and is usually expensive, so it should be used sparingly.
int CmiUnRegisterMemory(void *addr, unsigned int size);
This function unregisters the memory region that starts at address addr and has length size, so that it is no longer DMAable. This corresponds to unpinning the memory from the NIC hardware. It is also an expensive operation and should be used sparingly.
For certain machine layers that support DMA, Converse also provides the function
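Because both calls are expensive, one way to follow the advice above is to register a buffer once, reuse it for many transfers, and unregister it once at the end. The sketch below illustrates that pattern. It does not link against Converse: `MockRegisterMemory` and `MockUnRegisterMemory` are hypothetical stand-ins with the same argument lists as the functions above, and their 0-on-success return value is an assumption, since the return convention is not stated here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for CmiRegisterMemory / CmiUnRegisterMemory so the
 * sketch compiles without Converse. They only count calls. The 0-on-success
 * return convention is an ASSUMPTION, not taken from the text above. */
static int mock_register_calls = 0;
static int mock_pinned_regions = 0;

static int MockRegisterMemory(void *addr, unsigned int size) {
    if (addr == NULL || size == 0) return -1;
    mock_register_calls++;
    mock_pinned_regions++;
    return 0;
}

static int MockUnRegisterMemory(void *addr, unsigned int size) {
    if (addr == NULL || size == 0) return -1;
    mock_pinned_regions--;
    return 0;
}

/* Pin one buffer, reuse it for `iterations` transfers, then unpin it.
 * Returns how many registrations were needed, or -1 on failure. */
int run_with_pinned_buffer(unsigned int size, int iterations) {
    assert(iterations >= 0);
    int calls_before = mock_register_calls;

    char *buf = malloc(size);
    if (buf == NULL) return -1;

    if (MockRegisterMemory(buf, size) != 0) {
        free(buf);
        return -1;
    }

    for (int i = 0; i < iterations; i++) {
        buf[0] = (char)i;  /* placeholder for an RDMA transfer that uses buf */
    }

    /* Unpin before freeing so the NIC never refers to released memory. */
    if (MockUnRegisterMemory(buf, size) != 0) {
        free(buf);
        return -1;
    }
    free(buf);
    return mock_register_calls - calls_before;
}
```

Calling `run_with_pinned_buffer(4096, 100)` performs a single registration for all 100 placeholder transfers. Registering inside the loop instead would pay the driver cost on every iteration.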
void *CmiDMAAlloc(int size);
This operation allocates a memory region of length size from the DMAable region on the NIC hardware. The returned region is already pinned to the NIC. This is an alternative to CmiRegisterMemory and is implemented only for hardware that supports it.
- Hardware support for both Get and Put operations.
- Hardware support for only one of the two operations, usually Put. In this case the other RDMA operation is emulated using the operation that is implemented in hardware plus extra messages.
- No hardware support for any RDMA operation. For these, both the RDMA operations are emulated through messages.
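The second case above can be pictured with a small single-process simulation. This is only a sketch of one plausible emulation scheme, not the code any Converse machine layer actually uses: the requesting node sends a request message to the node that owns the data, and that node answers with a hardware Put in the reverse direction. All names here (`hw_put`, `GetRequest`, `emulated_get`) are hypothetical, and message delivery is modelled as a direct function call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NODES 2
#define MEM_BYTES 64

/* Each simulated node owns a small block of "pinned" memory. */
static unsigned char node_mem[NODES][MEM_BYTES];
static int messages_sent = 0;

/* The only primitive the simulated hardware offers: copy bytes from the
 * calling node's memory into another node's memory. */
static void hw_put(int from, size_t from_off, int to, size_t to_off, size_t size) {
    assert(from != to);
    assert(from_off + size <= MEM_BYTES && to_off + size <= MEM_BYTES);
    memcpy(&node_mem[to][to_off], &node_mem[from][from_off], size);
}

/* Request message sent from the reader to the node that owns the data. */
typedef struct {
    int requester;
    size_t requester_off;
    size_t owner_off;
    size_t size;
} GetRequest;

/* Handler that runs on the owning node: answer the request with a Put. */
static void handle_get_request(int owner, const GetRequest *req) {
    hw_put(owner, req->owner_off, req->requester, req->requester_off, req->size);
}

/* Emulated Get: one extra message plus one hardware Put going the other way. */
void emulated_get(int requester, size_t requester_off,
                  int owner, size_t owner_off, size_t size) {
    GetRequest req = { requester, requester_off, owner_off, size };
    messages_sent++;                  /* stands in for sending a message */
    handle_get_request(owner, &req);  /* delivery is immediate in this model */
}

/* Node 0 reads 7 bytes from node 1. Returns 1 if the bytes arrived and
 * exactly one request message was needed. */
int demo_emulated_get(void) {
    memset(node_mem, 0, sizeof node_mem);
    messages_sent = 0;
    memcpy(node_mem[1], "remote", 7);
    emulated_get(0, 8, 1, 0, 7);
    return memcmp(&node_mem[0][8], "remote", 7) == 0 && messages_sent == 1;
}
```

The cost of the emulation is visible in the sketch: the data still moves with a single Put, but the reader pays for an additional message and for the owner's handler to run.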
There are two different sets of RDMA operations:
- The first set of RDMA operations returns an opaque handle to the programmer, which can only be used to check whether the operation is complete. This suits AMPI better and closely follows the idea of separating communication from synchronization, so the user program has to keep track of synchronization itself.
- The second set of RDMA operations does not return anything. Instead, a callback is invoked when the operation completes. This fits well with the Charm++ model of sending asynchronous messages: the handler (callback) is invoked automatically when the operation completes.
For machine layer developers: Internally, every machine layer is free to create a suitable data structure to represent the handle. This is why the handle is kept opaque to the programmer.
void *CmiPut(unsigned int sourceId, unsigned int targetId, void *Saddr, void *Taddr, unsigned int size);
This function copies the memory at Saddr on the machine specified by sourceId to Taddr on the machine specified by targetId. The memory region transferred is size bytes long. The return value is the opaque handle for the operation.
void *CmiGet(unsigned int sourceId, unsigned int targetId, void *Saddr, void *Taddr, unsigned int size);
Similar to CmiPut, except that the data is transferred in the opposite direction, from the target to the source.
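To make the two directions concrete, the sketch below mirrors the argument order of CmiPut and CmiGet with local stand-ins. It is an illustration under stated assumptions rather than Converse code: `MockPut` and `MockGet` are hypothetical, both "machines" live in one process, the copy completes immediately, and the returned pointer merely plays the role of the opaque handle.

```c
#include <assert.h>
#include <string.h>

/* Placeholder object whose address serves as the "opaque handle". */
static int mock_handle_storage = 0;

/* Hypothetical stand-in for CmiPut: bytes flow from Saddr to Taddr. */
static void *MockPut(unsigned int sourceId, unsigned int targetId,
                     void *Saddr, void *Taddr, unsigned int size) {
    assert(sourceId != targetId);
    memcpy(Taddr, Saddr, size);   /* source -> target */
    return &mock_handle_storage;
}

/* Hypothetical stand-in for CmiGet: same arguments, opposite direction. */
static void *MockGet(unsigned int sourceId, unsigned int targetId,
                     void *Saddr, void *Taddr, unsigned int size) {
    assert(sourceId != targetId);
    memcpy(Saddr, Taddr, size);   /* target -> source */
    return &mock_handle_storage;
}

/* Put a string from "PE 0" to "PE 1", then Get it back into a second
 * buffer on "PE 0". Returns 1 if both transfers moved the bytes. */
int demo_put_then_get(void) {
    char on_pe0[8] = "ping";
    char on_pe1[8] = "";
    char back_on_pe0[8] = "";

    void *h1 = MockPut(0, 1, on_pe0, on_pe1, sizeof on_pe0);
    void *h2 = MockGet(0, 1, back_on_pe0, on_pe1, sizeof back_on_pe0);

    return h1 != NULL && h2 != NULL
        && strcmp(on_pe1, "ping") == 0
        && strcmp(back_on_pe0, "ping") == 0;
}
```

Note that in both calls Saddr belongs to the source machine and Taddr to the target machine; only the direction of the copy changes.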
void CmiPutCb(unsigned int sourceId, unsigned int targetId, void *Saddr, void *Taddr, unsigned int size, CmiRdmaCallbackFn fn, void *param);
Similar to CmiPut except a callback is called when the operation completes.
void CmiGetCb(unsigned int sourceId, unsigned int targetId, void *Saddr, void *Taddr, unsigned int size, CmiRdmaCallbackFn fn, void *param);
Similar to CmiGet except a callback is called when the operation completes.
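A callback-style transfer might be used roughly as follows. The sketch is not linked against Converse: `MockPutCb` is a hypothetical stand-in for CmiPutCb, and the callback type is assumed to be a function that receives the user-supplied param, which is not spelled out in the text above. The mock also invokes the callback immediately, whereas a real machine layer would invoke it later, once the transfer has finished.

```c
#include <assert.h>
#include <string.h>

/* ASSUMED shape of CmiRdmaCallbackFn: a function taking the user's param. */
typedef void (*MockRdmaCallbackFn)(void *param);

/* Hypothetical stand-in for CmiPutCb: copy the bytes, then run the callback. */
static void MockPutCb(unsigned int sourceId, unsigned int targetId,
                      void *Saddr, void *Taddr, unsigned int size,
                      MockRdmaCallbackFn fn, void *param) {
    assert(fn != NULL);
    (void)sourceId;
    (void)targetId;
    memcpy(Taddr, Saddr, size);
    fn(param);
}

/* State the application wants to update when the transfer is done. */
typedef struct {
    int completed;
} TransferState;

/* Completion callback: records that one transfer has finished. */
static void on_put_done(void *param) {
    TransferState *state = param;
    state->completed++;
}

/* Starts one Put and relies on the callback to record its completion.
 * Returns 1 if the data arrived and the callback ran exactly once. */
int demo_put_with_callback(void) {
    char src[16] = "halo row";
    char dst[16] = "";
    TransferState state = { 0 };

    MockPutCb(0, 1, src, dst, sizeof src, on_put_done, &state);

    return state.completed == 1 && strcmp(dst, "halo row") == 0;
}
```

No handle is kept and nothing is polled: the param pointer is the only link between the call site and the completion logic, so it must stay valid until the callback has run.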
A typical use of CmiWaitTest would be in AMPI when there is a call to AMPIWait. The implementation should call CmiWaitTest for all pending RDMA operations in that window.
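The polling described above could look like the sketch below. It rests on an assumption that is not stated in this text: that CmiWaitTest takes the opaque handle returned by CmiPut or CmiGet and returns nonzero once that operation is complete. `MockWaitTest` and `MockHandle` are hypothetical stand-ins so the example compiles without Converse.

```c
#include <assert.h>
#include <stddef.h>

/* Mock handle: number of polls that still report "not complete". */
typedef struct {
    int polls_left;
} MockHandle;

/* ASSUMPTION: the real CmiWaitTest takes the opaque handle and returns
 * nonzero when the operation is done. This mock follows that convention. */
static int MockWaitTest(void *handle) {
    MockHandle *h = handle;
    if (h->polls_left > 0) {
        h->polls_left--;
        return 0;
    }
    return 1;
}

/* Polls every pending handle of a window until all have completed.
 * Completed entries are set to NULL. Returns the number of polls made. */
int wait_all(void **pending, size_t count) {
    assert(pending != NULL || count == 0);
    int polls = 0;
    size_t remaining = 0;

    for (size_t i = 0; i < count; i++) {
        if (pending[i] != NULL) remaining++;
    }
    while (remaining > 0) {
        for (size_t i = 0; i < count; i++) {
            if (pending[i] == NULL) continue;   /* already completed */
            polls++;
            if (MockWaitTest(pending[i])) {
                pending[i] = NULL;
                remaining--;
            }
        }
    }
    return polls;
}

/* Two pending operations: one already done, one needing two more polls. */
int demo_wait_all(void) {
    MockHandle a = { 0 };
    MockHandle b = { 2 };
    void *window[2] = { &a, &b };
    return wait_all(window, 2);
}
```

A real implementation would likely need to let the communication layer make progress between polls instead of spinning; that detail depends on the machine layer and is left out here.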