29. Malleability: Shrink/Expand Number of Processors

This feature enables a Charm++ application to dynamically shrink and expand the number of processors that it is running on during execution. It internally uses three other features of Charm++: the CCS (Converse Client Server) interface, load balancing, and checkpoint/restart. An example program with a CCS client that sends shrink/expand commands can be found in examples/charm++/shrink_expand in the Charm++ distribution. To enable shrink/expand, Charm++ needs to be built with the --enable-shrinkexpand option:

  ./build charm++ netlrts-linux-x86_64 --enable-shrinkexpand

An application launch command needs to include a load balancer, a nodelist file that contains all of the nodes that are going to be used, and a port number on which to listen for shrink/expand commands. For example:

  ./charmrun +p4 ./jacobi2d 200 20 +balancer GreedyLB ++nodelist ./mynodelistfile ++server ++server-port 1234

The CCS client that sends shrink/expand commands needs to specify the hostname, the port number, the old (current) number of processors, and the new (future) number of processors:

  ./client <hostname> <port> <oldprocs> <newprocs>

For example, the following command increases the number of processors from 4 to 8:

  ./client valor 1234 4 8
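As a rough illustration of the four client arguments, the following C++ sketch formats the client invocation after a few sanity checks. The function buildClientCommand and its nodelistEntries parameter are hypothetical and not part of Charm++. The check that the new processor count may not exceed the number of nodelist entries is an assumption based on the requirement that the nodelist contain all nodes that are going to be used.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper, not part of Charm++: builds the client command line
// "./client <hostname> <port> <oldprocs> <newprocs>" or returns an empty
// string if the request looks invalid.
// nodelistEntries is the number of "host" lines in the nodelist file
// (assumption: an expand cannot go beyond the entries listed there).
std::string buildClientCommand(const std::string& hostname, int port,
                               int oldProcs, int newProcs,
                               int nodelistEntries) {
    // Reject an empty hostname or a port outside the valid TCP range.
    if (hostname.empty() || port <= 0 || port > 65535) return "";
    // Processor counts must be positive, and the request must change something.
    if (oldProcs <= 0 || newProcs <= 0 || oldProcs == newProcs) return "";
    // Assumed limit: no more processors than nodelist entries.
    if (newProcs > nodelistEntries) return "";

    std::ostringstream cmd;
    cmd << "./client " << hostname << ' ' << port << ' '
        << oldProcs << ' ' << newProcs;
    return cmd.str();
}
```

With a 16-entry nodelist, buildClientCommand("valor", 1234, 4, 8, 16) yields the command used in the example above, while a request for 8 processors against a 4-entry nodelist is rejected.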

To make a Charm++ application malleable, first, pup routines for all of the constructs in the application need to be written. This includes writing a pup routine for the mainchare and marking it migratable:


  mainchare [migratable] Main { ... }

Second, the AtSync() and ResumeFromSync() functions need to be implemented in the usual way for load balancing (see Section 7.2 for more information on load balancing). The shrink or expand takes place at the next load balancing step after the shrink/expand command is received.
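The timing described here can be pictured with a small toy model. This is not Charm++ code and does not use the Charm++ API; the class ResizeModel and its member names are hypothetical. It only shows that a received command is recorded and has no effect until the next load balancing step. The behavior when several commands arrive before a step (here, the last one wins) is an assumption of the sketch.

```cpp
// Toy model (not Charm++ code): a resize request is recorded when the
// command arrives and only takes effect at the next load balancing step.
class ResizeModel {
public:
    explicit ResizeModel(int procs) : procs_(procs) {}

    // Called when a shrink/expand command is received.
    // Assumption: a later command overwrites an earlier pending one.
    void receiveCommand(int newProcs) { pending_ = newProcs; }

    // Called at each load balancing step; applies a pending request, if any.
    void loadBalanceStep() {
        if (pending_ > 0) {
            procs_ = pending_;
            pending_ = 0;
        }
    }

    // Number of processors currently in use.
    int procs() const { return procs_; }

private:
    int procs_;
    int pending_ = 0;  // 0 means no request is pending
};
```

In this model, an application that calls AtSync() rarely will also react to shrink/expand commands rarely, since the processor count changes only inside loadBalanceStep().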

NOTE: If you want to shrink your application, for example, from two physical nodes to one node where each node has eight cores, then you should have eight entries in the nodelist file for each node, one per processor. Otherwise, the application will shrink in a way that will use four cores from each node, whereas what you likely want is to use eight cores on only one of the physical nodes after shrinking. For example, instead of having a nodelist like this:


  host a
  host b

the nodelist should be like this:


  host a
  host a
  host a
  host a
  host a
  host a
  host a
  host a
  host b
  host b
  host b
  host b
  host b
  host b
  host b
  host b
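Writing one entry per core by hand is tedious for larger machines. The following C++ sketch generates such a list; makeNodelist is a hypothetical helper, not part of Charm++, and it emits only the "host" lines shown above, assuming every node has the same number of cores.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of Charm++: returns nodelist text with one
// "host <name>" line per core, with all entries for a node kept together.
// Assumption: every node has the same number of cores.
std::string makeNodelist(const std::vector<std::string>& hosts,
                         int coresPerNode) {
    std::string out;
    for (const auto& host : hosts) {
        // Repeat the host once per core so that each processor has its
        // own nodelist entry.
        for (int core = 0; core < coresPerNode; ++core) {
            out += "host " + host + "\n";
        }
    }
    return out;
}
```

Calling makeNodelist({"a", "b"}, 8) produces the sixteen lines listed above, which can then be written to the file passed to ++nodelist.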

Warning: this is an experimental feature and is not supported in all Charm++ builds and applications. Currently, it has been tested on netlrts-{linux/darwin}-x86_64 builds. Support for other Charm++ builds and for AMPI applications is under development. It has only been tested with the RefineLB and GreedyLB load balancing strategies; use other strategies with caution.