nocopy accelerated section multicast
It should be possible to reduce the number of copies required to implement a section multicast, especially when the entry method receives const (i.e., readonly) data.
Ideally there should be only one copy per address space, with reference counting to handle cleanup once delivery has completed at all leaves within that address space.
The current implementation creates one copy per PE, even for [readonly], and does not use RDMA to minimize pack/unpack cost.
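A minimal sketch of the "one copy per address space with refcounting" idea, in plain C++ rather than against the runtime's actual multicast code. The names SharedPayload, view, delivered and deliver_to_all are hypothetical and exist only for illustration; the sketch assumes the number of local leaves is known when the payload arrives, and it does not model RDMA or pack/unpack at all.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper, not part of any existing runtime API: one read-only
// payload shared by every leaf (section member) in an address space.
class SharedPayload {
public:
    SharedPayload(std::string bytes, std::size_t leaves)
        : bytes_(std::move(bytes)), remaining_(leaves) {
        assert(leaves > 0);
    }

    // Leaves only ever get a const view, so no per-PE copy is needed.
    const std::string& view() const { return bytes_; }

    // Call once per leaf after its entry method returns.
    // Returns true only for the last leaf, which must then free the buffer.
    bool delivered() {
        return remaining_.fetch_sub(1, std::memory_order_acq_rel) == 1;
    }

private:
    const std::string bytes_;
    std::atomic<std::size_t> remaining_;
};

// Simulates delivery to `leaves` local section members and returns how many
// times the payload was freed.
std::size_t deliver_to_all(std::size_t leaves) {
    auto* payload = new SharedPayload("section multicast payload", leaves);
    std::size_t frees = 0;
    for (std::size_t i = 0; i < leaves; ++i) {
        const std::string& data = payload->view();  // shared, not copied
        (void)data;  // a const entry method would read the data here
        if (payload->delivered()) {
            delete payload;  // last leaf in this address space cleans up
            ++frees;
        }
    }
    return frees;
}
```

Calling deliver_to_all(4) allocates the payload once and frees it once, regardless of the leaf count. The atomic counter is there because leaves on different PEs in the same address space could finish concurrently; a real implementation would also need to decide who owns the buffer when a leaf migrates or the entry method is not const.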