Feature #1497: Shared memory method to pass data between processes that share the same node
Enable use of shm transport for regular messages in LRTS
Experimenting with different models has shown that CMA (Cross Memory Attach) is a good candidate for exploiting shm for within-host communication. An shm transport over CMA has already been implemented for the Nocopy Direct API. An LRTS-based implementation can greatly improve intra-host inter-process performance for large messages (regular, parameter-marshalled) across all LRTS-based layers.
- Subject changed from Enable use of pxshm on mpi and verbs builds to Enable use of pxshm/xpmem on mpi and verbs builds
Also, './build charm++ gni-crayxe xpmem' fails to build because it tries to build both pxshm and xpmem. The issue is that gni-crayx* builds enable pxshm by default, and we don't disable it when xpmem is requested explicitly. From what I've seen, xpmem offers performance nearly on par with a user-space memcpy for Cray MPI, so it could potentially replace pxshm as the default on gni builds if implemented correctly. The key is to call xpmem_make() on the entire virtual address space during startup, avoiding the high cost of memory registration/deregistration at runtime.
#5 Updated by Nitin Bhat about 1 month ago
- Subject changed from Enable use of pxshm/xpmem on mpi, ofi, and verbs builds to Enable use of shm transport for regular messages in LRTS
- Tags set to #lrts
Using CMA, we don't need a layer-dependent shm implementation and can instead have a single generic implementation in the LRTS layer.
#8 Updated by Nitin Bhat about 1 month ago
There are three LRTS based use cases for shm (using CMA) to be used for intra-host communication:
1. Large messages using the Nocopy Direct API: https://charm.cs.illinois.edu/redmine/issues/1667 (already implemented)
2. Large messages using the Nocopy Entry Method API: https://charm.cs.illinois.edu/redmine/issues/1657
3. Large messages using the regular API (this feature)