Feature #1459

Feature #1234: Avoid sender-side copy for large contiguous messages. API for charm and converse layers

Zero-copy send support for the netlrts machine layer

Added by Phil Miller about 1 year ago. Updated 6 months ago.

Status: In Progress
Category: Machine Layers



In the netlrts machine layer, it's pretty easy to stream data from an arbitrary address to a remote recipient on request. We can use this to implement zero-copy sends, which would reduce the memory footprint and move the copying time overhead off the worker threads and onto the comm thread.


#1 Updated by Phil Miller about 1 year ago

  • Target version set to 6.8.0

#2 Updated by Phil Miller about 1 year ago

  • Tags changed from lrts netlrts rdma to lrts machine-layers/netlrts rdma

#3 Updated by Phil Miller about 1 year ago

  • Tags changed from lrts machine-layers/netlrts rdma to lrts machine-layers netlrts rdma

#4 Updated by Eric Bohm about 1 year ago

  • Assignee set to Vipul Harsh

#5 Updated by Vipul Harsh about 1 year ago

  • Assignee changed from Vipul Harsh to Nitin Bhat

#6 Updated by Nitin Bhat about 1 year ago

  • Tags set to #rdma

#7 Updated by Nitin Bhat 12 months ago

  • Status changed from New to In Progress

The current netlrts layer (UDP) in machine-eth.c sends a datagram header with every packet.
For every packet, it creates the header at (char *ptr - DGRAM_HEADER_SIZE), i.e. in the bytes immediately before the packet data, and sends it over. The receiver uses the header information to reassemble the packets.
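To make the negative-offset scheme concrete, here is a minimal sketch of writing a header into the bytes just before a payload pointer. The names demo_header, DEMO_HEADER_SIZE and demo_frame_in_place are hypothetical and the two-field header layout is an assumption for illustration only; it is not the actual datagram header used in machine-eth.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header layout, for illustration only. */
typedef struct {
    uint32_t seqno;   /* packet sequence number */
    uint32_t length;  /* payload bytes in this packet */
} demo_header;

#define DEMO_HEADER_SIZE (sizeof(demo_header))

/* Write a header into the DEMO_HEADER_SIZE bytes immediately before
 * `payload` and return the start of the contiguous header+payload frame.
 * This is only safe when the caller owns those preceding bytes, e.g. a
 * runtime-owned buffer allocated with headroom in front of the payload. */
static char *demo_frame_in_place(char *payload, uint32_t seqno, uint32_t length)
{
    demo_header h;
    char *frame;

    assert(payload != NULL);
    h.seqno = seqno;
    h.length = length;
    frame = payload - DEMO_HEADER_SIZE;   /* negative offset from the data */
    memcpy(frame, &h, sizeof h);          /* overwrites whatever was there */
    return frame;
}
```

The memcpy overwrites whatever sits before the payload, which is why this only works on memory the runtime owns.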

I am guessing that we’re able to use the DGRAM_HEADER_SIZE bytes before the packet ptr for two reasons:
1. We know that the previous packet has already been delivered (using acks)?
2. It is a Charm++-owned copy of the buffer, free from user intervention, and it will be freed after the message send.

But if we’re to send a user buffer using this scheme, how do we send a header with every packet? We can’t touch the user's memory to write the header info at a negative offset.

#8 Updated by Phil Miller 12 months ago

  • Tags changed from #rdma to #rdma, #netlrts, #lrts, #machine-layers

We could do the packetization in a set-aside buffer that we copy the user's data through as we send it. The key is to tightly constrain the size of that buffer, making it just large enough to get full network bandwidth. If the packet injection is synchronous, then that buffer only needs to hold 1 packet.
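A rough sketch of that idea, assuming synchronous injection so that a single packet-sized staging buffer can be reused for every packet. All names here (demo_send_staged, demo_inject_fn, DEMO_MAX_PAYLOAD and so on) are hypothetical, as are the 8-byte header and the 1024-byte payload limit; none of them come from the netlrts code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_HEADER_SIZE 8u      /* assumed: 4-byte seqno + 4-byte length */
#define DEMO_MAX_PAYLOAD 1024u   /* assumed per-packet payload limit */

/* Hypothetical injection callback; it must be done with `frame` when it
 * returns (synchronous injection), since the buffer is reused afterwards. */
typedef void (*demo_inject_fn)(const char *frame, size_t len, void *ctx);

/* Stream `total` bytes from a user buffer without modifying it: each chunk
 * is copied into one small staging buffer behind a freshly written header.
 * Returns the number of packets injected. */
static size_t demo_send_staged(const char *user, size_t total,
                               demo_inject_fn inject, void *ctx)
{
    char staging[DEMO_HEADER_SIZE + DEMO_MAX_PAYLOAD]; /* holds 1 packet */
    size_t sent = 0, packets = 0;
    uint32_t seqno = 0;

    assert(inject != NULL && (user != NULL || total == 0));
    while (sent < total) {
        size_t chunk = total - sent;
        uint32_t len32;
        if (chunk > DEMO_MAX_PAYLOAD) chunk = DEMO_MAX_PAYLOAD;
        len32 = (uint32_t)chunk;
        /* Header fields in host byte order, for simplicity. */
        memcpy(staging, &seqno, sizeof seqno);
        memcpy(staging + sizeof seqno, &len32, sizeof len32);
        memcpy(staging + DEMO_HEADER_SIZE, user + sent, chunk);
        inject(staging, DEMO_HEADER_SIZE + chunk, ctx);
        sent += chunk;
        seqno++;
        packets++;
    }
    return packets;
}

/* Example callback that only tallies what it is given. */
typedef struct { size_t packets; size_t payload_bytes; } demo_stats;

static void demo_count_inject(const char *frame, size_t len, void *ctx)
{
    demo_stats *st = ctx;
    (void)frame;
    st->packets++;
    st->payload_bytes += len - DEMO_HEADER_SIZE;
}
```

The user's buffer is only ever read, and the extra memory is bounded by one packet regardless of message size. With asynchronous injection, the staging area would instead need to hold as many packets as can be in flight at once.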

#9 Updated by Phil Miller 12 months ago

  • Target version changed from 6.8.0 to 6.8.1

#10 Updated by Sam White 8 months ago

  • Category set to Machine Layers
  • Target version changed from 6.8.1 to 6.9.0

#11 Updated by Sam White 6 months ago

  • Target version deleted (6.9.0)
