Feature #1860

Support HostBuffer shared memory allocation of one buffer per physical host at same address on all hosts

Added by Eric Bohm 3 months ago. Updated 3 months ago.




Some applications would like to run multiple processes per host (usually for NUMA and comm thread reasons) but still have the memory footprint control of a buffer that is allocated only once per physical address space, yet is accessible from all objects. This is akin to a readonly, or a member of a nodegroup, but per host.

Let's call this HostBuffer (name subject to change). The runtime would provide the following semantics:
  1. A user-defined buffer is allocated during startup.
  2. Allocation is done during charm launch, at an address that we can guarantee to be the same on every host.
  3. The buffer is made visible to the user application before the construction of ReadOnly variables (and other charm objects).
  4. The buffer is assumed to be unregistered.
    1. Registration could be provided as an option. I expect there are use cases for it (e.g., access to a table of readonly data from a GPU).

Implementation would require shared memory support, but would not require, or support, mutexes or other memory consistency semantics.
The expectation is that the user application would initialize the buffer once and read from it as necessary. Users are free to do whatever read-modify-write usage they want, but we provide no consistency guarantees. The user is responsible for consistency and is free to use whatever means they like. We only guarantee that the buffer is allocated at the same address on every host and that the address is visible on all PEs.
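
As an illustration of "the user is responsible for consistency", here is a minimal sketch of one way an application could publish an initialize-once table, using C11 atomics with release/acquire ordering. Everything in it is hypothetical (`hb_header`, `hb_publish`, `hb_try_read`, the 4-entry table); none of it is a proposed runtime API. It assumes `atomic_int` is lock-free on the target platform and that the buffer starts out zero-filled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define HB_TABLE_LEN 4

/* Hypothetical layout placed at the start of the shared buffer. */
typedef struct {
    atomic_int ready;            /* 0 until the initializer has finished */
    int table[HB_TABLE_LEN];     /* read-mostly application data */
} hb_header;

/* Called by the single process that initializes the buffer. */
void hb_publish(hb_header *h, const int *src)
{
    assert(h != NULL && src != NULL);
    for (size_t i = 0; i < HB_TABLE_LEN; i++)
        h->table[i] = src[i];
    /* Release store: the table writes above become visible to any
     * reader that observes ready == 1 with an acquire load. */
    atomic_store_explicit(&h->ready, 1, memory_order_release);
}

/* Called by readers. Returns 1 and copies the table if it has been
 * published, or 0 if the initializer has not finished yet. */
int hb_try_read(hb_header *h, int *dst)
{
    assert(h != NULL && dst != NULL);
    if (atomic_load_explicit(&h->ready, memory_order_acquire) == 0)
        return 0;
    for (size_t i = 0; i < HB_TABLE_LEN; i++)
        dst[i] = h->table[i];
    return 1;
}
```

Readers that get 0 back can retry or fall back to some other synchronization the application already has (for example, a barrier after startup).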

This should be easy to implement on Linux using mmap and some launch-time trickery to make sure we provide the same base address. Lightweight kernels may require a different approach if they do not support mmap.
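
A minimal sketch of the mmap part, under stated assumptions: `hostbuffer_map` is a hypothetical helper (not an existing Charm++ call), the backing object is an ordinary file that the caller would place on a memory-backed filesystem such as /dev/shm, and the agreed-upon base address is passed in as `hint`. How the launcher picks an address that is free in every process on every host is the "launch-time trickery" and is not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map `size` bytes of the file at `path` as shared
 * memory, asking the kernel to place the mapping at `hint`.
 * Returns the mapped address, or NULL if the mapping failed or did not
 * land at the requested (non-NULL) hint address. */
void *hostbuffer_map(const char *path, void *hint, size_t size)
{
    assert(path != NULL && size > 0);

    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return NULL;
    /* Every process sets the same size; new bytes are zero-filled. */
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return NULL;
    }

    /* MAP_FIXED is deliberately not used: the address is only a hint,
     * so an existing mapping in this process is never overwritten. */
    void *p = mmap(hint, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;
    if (hint != NULL && p != hint) {
        munmap(p, size);            /* wrong address: treat as failure */
        return NULL;
    }
    return p;
}
```

Each process on a host would call this with the same path, size, and hint. A NULL return for a non-NULL hint means that address was not available in that process, so the launcher would need to negotiate a different one.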

Note: the guarantee of having a unified global base address may be overkill. The motivation for it is that it would allow the application to reference pointer structures within that space across hosts without having to rederive the offset from one host to the next. So pointers into that region could be sent without pupping. The simple use case of "shared buffer per host" does not require that.
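
For comparison, here is a sketch of what an application would have to do without the unified base address: store links as byte offsets from the region base and rederive real pointers on each host. The names (`hb_node`, `hb_to_offset`, `hb_from_offset`, `hb_sum`) are hypothetical and only illustrate the technique. Offset 0 is reserved to mean "no next node".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node stored inside the shared region. The link is
 * a byte offset from the region base, so it stays valid no matter
 * where the region is mapped. */
typedef struct {
    int value;
    size_t next_off;   /* 0 means end of list */
} hb_node;

/* Convert a pointer inside the region to an offset from its base. */
size_t hb_to_offset(const void *base, const void *p)
{
    assert((const char *)p >= (const char *)base);
    return (size_t)((const char *)p - (const char *)base);
}

/* Convert an offset back to a pointer, using the local base address. */
void *hb_from_offset(void *base, size_t off)
{
    return (char *)base + off;
}

/* Walk the list starting at first_off and sum the values. */
int hb_sum(void *base, size_t first_off)
{
    int sum = 0;
    for (size_t off = first_off; off != 0; ) {
        hb_node *n = hb_from_offset(base, off);
        sum += n->value;
        off = n->next_off;
    }
    return sum;
}
```

With the same base address guaranteed everywhere, `next_off` could instead be a plain `hb_node *`, and a pointer into the region could be sent in a message and dereferenced directly on the receiving host.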

We could also support pass-through of various flags to mmap, though the more of that we do, the less clear it is that this should exist as a charm feature when the user could just call mmap themselves.


#1 Updated by Michael Robson 3 months ago

Just a stray idea from group meeting discussion, but we should keep in mind the idea of backing the HostBuffer with various types of memory (pinned host memory for the GPU, or actual GPU device memory) and the ability to split it into sections (in case of NUMA, separate GPUs, etc.).
