Feature #1468

Enable pre-pinning memory for RDMA message sends

Added by Sam White about 1 month ago. Updated about 22 hours ago.

Status: In Progress
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 03/10/2017
Due date:
% Done: 0%
Tags:

Description

The cost of memory pinning on Verbs and GNI is high, so we'd like to enable users to pre-pin memory for use in later RDMA message sends. This is a step toward a persistent messaging API in that users can pin once and send multiple times.

We'd like to let users allocate memory however they like and then, underneath, avoid re-pinning memory that is already pinned. Unfortunately, Verbs does not seem to re-pin already-pinned memory any faster than fresh memory, and it doesn't seem to provide a way to query whether a given range is pinned, so we'll have to track that ourselves. We can provide a CkAlloc routine that wraps something like infi_CmiAlloc(), or one that accepts a parameter saying whether the memory should be pinned.
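
A minimal sketch of the bookkeeping described above, written in Python for brevity: the runtime remembers which address ranges it has already registered and calls the pinning routine only for memory that no existing region covers. The class name and the `register` callback (a stand-in for a call such as ibv_reg_mr()) are hypothetical, and the sketch assumes registered regions never overlap.

```python
import bisect


class PinnedRegionTracker:
    """Remembers which address ranges are already registered (pinned).

    Hypothetical sketch: `register(addr, length)` stands in for the real
    pinning call and returns a handle; regions are assumed not to overlap.
    """

    def __init__(self, register):
        self._register = register  # callback that actually pins memory
        self._starts = []          # sorted start addresses of pinned regions
        self._regions = {}         # start -> (end, handle)

    def lookup(self, addr, length):
        """Return the handle of a region covering [addr, addr + length), or None."""
        i = bisect.bisect_right(self._starts, addr) - 1
        if i < 0:
            return None
        start = self._starts[i]
        end, handle = self._regions[start]
        return handle if addr + length <= end else None

    def pin(self, addr, length):
        """Pin [addr, addr + length) unless an existing region already covers it."""
        handle = self.lookup(addr, length)
        if handle is None:
            handle = self._register(addr, length)
            bisect.insort(self._starts, addr)
            self._regions[addr] = (addr + length, handle)
        return handle


# Usage with a fake register call that just counts invocations.
calls = []

def fake_register(addr, length):
    calls.append((addr, length))
    return len(calls)  # pretend memory-region handle

tracker = PinnedRegionTracker(fake_register)
h1 = tracker.pin(0x1000, 4096)
h2 = tracker.pin(0x1800, 256)  # inside the first region: no new registration
```

A real runtime would also have to drop entries when the memory is freed or deregistered; the sketch only shows the lookup that avoids redundant pinning.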

History

#1 Updated by Jaemin Choi 10 days ago

  • Status changed from New to In Progress

CmiAlloc() calls infi_CmiAlloc() underneath, which in turn calls getInfiCmiChunk() where the memory is actually pinned through ibv_reg_mr().
So this is not a memory pool per se, since pinning occurs at memory allocation time and not at program initialization time.
If this is fine, we could provide a CkAlloc() (which doesn't seem to exist currently) for the user to use when pinned memory is needed.

#2 Updated by Sam White 10 days ago

Yes, I think a CkAlloc is what we want. We can eventually provide a pre-pinned memory pool behind it, but at first, just pinning inside the call to CkAlloc is okay.
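
To illustrate the distinction between pinning at allocation time and pinning at initialization time, here is a hypothetical sketch of the eventual pre-pinned pool: fixed-size chunks are registered once when the pool is created, and the allocator falls back to pinning a new chunk on demand (the behavior of the simple CkAlloc) only when the pool is empty. `PrePinnedPool`, `chunk_size`, and the `register` callback (standing in for something like ibv_reg_mr()) are illustrative names, not existing Charm++ APIs; buffers are modeled as bytearrays.

```python
class PrePinnedPool:
    """Hypothetical pool of fixed-size chunks that are pinned once, up front."""

    def __init__(self, chunk_size, num_chunks, register):
        self.chunk_size = chunk_size
        self._register = register  # stand-in for the expensive pinning call
        # Pay the pinning cost at initialization time for every chunk.
        self._free = [self._new_chunk() for _ in range(num_chunks)]

    def _new_chunk(self):
        buf = bytearray(self.chunk_size)
        return buf, self._register(buf)

    def alloc(self, size):
        """Return a (buffer, handle) pair whose buffer holds at least `size` bytes."""
        if size > self.chunk_size:
            raise ValueError("request larger than the pool's chunk size")
        if self._free:
            return self._free.pop()  # already pinned: no registration cost
        return self._new_chunk()     # pool exhausted: pin on demand

    def free(self, chunk):
        """Return a chunk to the pool; it stays pinned for later reuse."""
        self._free.append(chunk)


# Usage with a fake register call that counts how often pinning happens.
calls = []

def fake_register(buf):
    calls.append(len(buf))
    return len(calls)  # pretend memory-region handle

pool = PrePinnedPool(4096, 2, fake_register)  # pins 2 chunks up front
a = pool.alloc(100)
b = pool.alloc(200)
c = pool.alloc(300)  # pool empty: pinned on demand
pool.free(a)
d = pool.alloc(50)   # reuses the returned chunk without re-pinning
```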

#3 Updated by Sam White about 22 hours ago

Phil suggested a complementary optimization on the GNI RDMA patch (https://charm.cs.illinois.edu/gerrit/#/c/1908/) where the runtime would lazily deregister memory. That is, register memory that is not already registered, keep a cache of pre-registered memory (with some configurable limit on the number of buffers and the total size of those buffers), and only de-register memory when that cache is full.
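
The lazy-deregistration idea can be sketched as a least-recently-used registration cache bounded by both a buffer count and a total byte size; memory is deregistered only when admitting a new registration would exceed either limit. This is an illustrative sketch, not the GNI patch's implementation: `RegistrationCache`, `max_buffers`, `max_bytes`, and the `register`/`deregister` callbacks are hypothetical, and buffers are matched by exact (address, length) rather than by range containment. A real implementation would also need to avoid evicting a registration that an in-flight RDMA transfer is still using.

```python
from collections import OrderedDict


class RegistrationCache:
    """Hypothetical LRU cache of memory registrations with lazy deregistration."""

    def __init__(self, register, deregister, max_buffers, max_bytes):
        self._register = register      # stand-in for the network layer's pin call
        self._deregister = deregister  # stand-in for the matching unpin call
        self.max_buffers = max_buffers
        self.max_bytes = max_bytes
        self._entries = OrderedDict()  # (addr, length) -> handle, oldest first
        self._bytes = 0

    def acquire(self, addr, length):
        """Return a registration handle for the buffer, registering only on a miss."""
        key = (addr, length)
        if key in self._entries:
            self._entries.move_to_end(key)  # cache hit: mark as most recently used
            return self._entries[key]
        # Make room first, so the new registration is never the one evicted.
        # A single buffer larger than max_bytes is still admitted on its own.
        while self._entries and (len(self._entries) >= self.max_buffers
                                 or self._bytes + length > self.max_bytes):
            (_, old_len), old_handle = self._entries.popitem(last=False)
            self._bytes -= old_len
            self._deregister(old_handle)
        handle = self._register(addr, length)
        self._entries[key] = handle
        self._bytes += length
        return handle


# Usage with fake callbacks that log what happens.
log = []

def fake_register(addr, length):
    log.append(("reg", addr))
    return addr  # use the address as the pretend handle

def fake_deregister(handle):
    log.append(("dereg", handle))

cache = RegistrationCache(fake_register, fake_deregister,
                          max_buffers=2, max_bytes=1 << 20)
cache.acquire(0x1000, 4096)
cache.acquire(0x2000, 4096)
cache.acquire(0x1000, 4096)  # hit: no registration, 0x1000 becomes most recent
cache.acquire(0x3000, 4096)  # cache full: evicts the least recently used, 0x2000
```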
