Feature #1378

64-bit Charm message sizes

Added by Sam White over 2 years ago. Updated over 1 year ago.

Status: In Progress
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 01/24/2017
Due date:
% Done: 0%


Description

Currently, PUP supports 64-bit sizes, chare sizes are 64-bit, and the GNI/Isomalloc mempool uses 64-bit sizes, but Charm message envelopes store sizes as ints. Sending any message whose size does not fit in an int therefore fails. Message sizes will need to be fixed throughout the runtime, in all machine layers and for all messaging types.

I think we should first add checks in the message constructors so that trying to allocate anything larger than INT_MAX bytes fails with a clear error.
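To make the proposed check concrete, here is a minimal sketch of a guarded size conversion. The helper name `checkedEnvelopeSize` and its parameters are hypothetical and are not part of the Charm++ API. It throws a C++ exception only to stay self-contained; the runtime itself would more likely abort (for example via `CmiAbort`).

```cpp
#include <climits>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper (not a Charm++ function): converts a 64-bit payload
// size plus header size into the int that the envelope stores, and refuses
// any total that does not fit in an int.
inline int checkedEnvelopeSize(std::size_t payloadBytes, std::size_t headerBytes) {
    const std::size_t limit = static_cast<std::size_t>(INT_MAX);
    // Compare against (limit - headerBytes) so the check itself cannot overflow.
    if (headerBytes > limit || payloadBytes > limit - headerBytes) {
        throw std::length_error("message size exceeds INT_MAX");
    }
    return static_cast<int>(payloadBytes + headerBytes);
}
```

A message constructor could call a helper like this before allocating, so an oversized request fails immediately instead of silently truncating the size.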

History

#1 Updated by Sam White over 2 years ago

Also, note that AMPI has a patch in gerrit for MPI-3 large count support, which should work once this Charm message size issue is fixed: https://charm.cs.illinois.edu/redmine/issues/1105

#2 Updated by Sam White over 2 years ago

Previously merged, related patches:

Use size_t instead of int for all PUP interfaces: https://charm.cs.illinois.edu/gerrit/#/c/1873/

Make sizes of chares and readonlies size_t instead of int: https://charm.cs.illinois.edu/gerrit/#/c/1903/

Convert thread data sizes to size_t instead of int: https://charm.cs.illinois.edu/gerrit/#/c/1913/

Convert GNI/Isomalloc mempool to use size_t instead of int for sizes: https://charm.cs.illinois.edu/gerrit/#/c/1912/

Convert CharmDebug to use 64-bit sizes for chares and readonlies: https://charm.cs.illinois.edu/gerrit/#/c/2356/

#3 Updated by Sam White over 2 years ago

  • Assignee set to Sam White

I have a half-baked implementation for multicore builds. Some of the machine layers already support 64-bit message sizes (PAMI, maybe others), and some will require small changes. The main changes are in the Charm, Converse, and LRTS interfaces.

Note that this will require changes to CharmDebug (and possibly many other things).

#4 Updated by Sam White over 2 years ago

Here's a first pass at the netlrts/multicore/smp build. This builds and passes the tests/examples, but no attempt has been made for CharmDebug or any non-standard build options: https://charm.cs.illinois.edu/gerrit/#/c/2395/

We can probably make this more backwards compatible, but I didn't attempt that here yet.

#5 Updated by Sam White about 2 years ago

Abort before trying to migrate a chare with size greater than a message can hold: https://charm.cs.illinois.edu/gerrit/#/c/2425/

A next step would be to pipeline chares larger than INT_MAX bytes as multiple chunks of at most INT_MAX bytes each.
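As an illustration of the chunking arithmetic only, the sketch below splits a 64-bit byte count into pieces whose lengths each fit in an int. `Chunk` and `planChunks` are hypothetical names. Nothing here sends messages or touches the migration code; it just computes the offsets and lengths.

```cpp
#include <algorithm>
#include <climits>
#include <cstddef>
#include <vector>

// Hypothetical description of one piece of a large buffer.
struct Chunk {
    std::size_t offset;  // byte offset into the full buffer
    int length;          // chunk length, always <= maxChunk
};

// Splits totalBytes into consecutive chunks of at most maxChunk bytes.
// Returns an empty plan for an empty buffer or a non-positive maxChunk.
inline std::vector<Chunk> planChunks(std::size_t totalBytes, int maxChunk = INT_MAX) {
    std::vector<Chunk> chunks;
    if (maxChunk <= 0) return chunks;
    std::size_t offset = 0;
    while (offset < totalBytes) {
        const std::size_t remaining = totalBytes - offset;
        const int length = static_cast<int>(
            std::min<std::size_t>(remaining, static_cast<std::size_t>(maxChunk)));
        chunks.push_back({offset, length});
        offset += static_cast<std::size_t>(length);
    }
    return chunks;
}
```

For example, `planChunks(10, 4)` yields three chunks with offsets 0, 4, and 8 and lengths 4, 4, and 2.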

#6 Updated by Sam White over 1 year ago

  • Target version deleted (6.9.0)

#7 Updated by Sam White over 1 year ago

  • Assignee deleted (Sam White)
  • Priority changed from Normal to Low

#8 Updated by Eric Bohm over 1 year ago

  • Assignee set to Evan Ramos

I disagree with lowering the priority on this. The next round of machines is going to be fewer nodes with much larger memory footprint. Therefore, we are more likely to see very large messages in a variety of contexts.

#9 Updated by Sam White over 1 year ago

  • Priority changed from Low to Normal

I have implemented the basic support for this in netlrts and multicore builds here: https://charm.cs.illinois.edu/gerrit/#/c/2395/

That patch needs to be rebased and needs a test for large messages, and the other network layers still need work. I'm not sure if all network layers even support 64-bit messages. The MPI standard added some support for it to point-to-point messages in MPI-3.0. But perhaps we shouldn't even expose this to the network layers. Instead we can just pipeline messages with size > INT_MAX.

#10 Updated by Evan Ramos over 1 year ago

  • Status changed from New to In Progress

#11 Updated by Eric Bohm over 1 year ago

I don't think we want to expand to support 64-bit sizes for all messages.

That would pointlessly blow up the size field for the 99% of messages that fit under uint32.

But we do want to support sending data that is larger than uint32.

That could be handled by pipelining. Or by having a different message header and different handlers for very large data. Or both.
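One way to picture the "different message header" option is a size-based choice between two header layouts. This is only a sketch under assumptions: `HeaderKind`, `CompactHeader`, `LargeHeader`, `chooseHeader`, and `headerBytes` are all hypothetical names, and real Charm++ envelopes carry many more fields than a size.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical tag saying which header layout a message uses.
enum class HeaderKind : std::uint8_t { Compact, Large };

// Hypothetical layouts: the common case keeps a 32-bit size field,
// and only very large data pays for a 64-bit one.
struct CompactHeader { std::uint32_t size; };
struct LargeHeader   { std::uint64_t size; };

// Picks the compact header whenever the payload size fits in a uint32.
inline HeaderKind chooseHeader(std::uint64_t payloadBytes) {
    return payloadBytes <= std::numeric_limits<std::uint32_t>::max()
               ? HeaderKind::Compact
               : HeaderKind::Large;
}

// Size in bytes of the chosen header layout.
inline std::size_t headerBytes(HeaderKind kind) {
    return kind == HeaderKind::Compact ? sizeof(CompactHeader) : sizeof(LargeHeader);
}
```

With a split like this, messages that fit under uint32 keep the smaller size field, and a separate handler could be registered for the `Large` case.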
