Bug #647

Make MeshStreamer classes [migratable] to support checkpoint/restart

Added by Phil Miller over 4 years ago. Updated about 2 years ago.

Status: Merged
Priority: High
Category: Fault Tolerance
Target version:
Start date: 01/20/2015
Due date:
% Done: 50%
Tags:

Description

Right now, the various chare classes defined in NDMeshStreamer.h lack the [migratable] attribute, so programs that use TRAM fail to checkpoint. The attribute should be added to these classes, along with matching pup methods.
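A pup method routes every member that must survive a restart through one routine, which the runtime calls in sizing, packing, and unpacking modes. The sketch below mimics that pattern outside Charm++ so it can be compiled on its own. MiniPuper is a hypothetical stand-in for PUP::er, and ExampleStreamer and its members are illustrative only, not the actual MeshStreamer classes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical, minimal stand-in for Charm++'s PUP::er (not the real class).
// One pup routine serves all three modes, as in Charm++.
struct MiniPuper {
  enum Mode { Sizing, Packing, Unpacking } mode;
  std::vector<char> *buf;
  std::size_t pos = 0;  // bytes counted (sizing) or consumed so far

  MiniPuper(Mode m, std::vector<char> *b = nullptr) : mode(m), buf(b) {}
  bool isUnpacking() const { return mode == Unpacking; }

  // Copy a trivially copyable value into or out of the byte buffer.
  template <typename T>
  MiniPuper &operator|(T &v) {
    static_assert(std::is_trivially_copyable<T>::value, "raw bytes only");
    if (mode != Sizing) assert(pos + sizeof(T) <= buf->size());
    if (mode == Packing) std::memcpy(buf->data() + pos, &v, sizeof(T));
    if (mode == Unpacking) std::memcpy(&v, buf->data() + pos, sizeof(T));
    pos += sizeof(T);
    return *this;
  }
};

// Illustrative class; the member names are hypothetical.
class ExampleStreamer {
public:
  int numDimensions_ = 0;
  int bufferSize_ = 0;
  std::vector<int> dimensionSizes_;

  // Every member that must survive restart goes through pup().
  void pup(MiniPuper &p) {
    p | numDimensions_;
    p | bufferSize_;
    std::size_t n = dimensionSizes_.size();
    p | n;  // length first, so unpacking knows how much to allocate
    if (p.isUnpacking()) dimensionSizes_.resize(n);
    for (std::size_t i = 0; i < n; ++i) p | dimensionSizes_[i];
  }
};

// Checkpoint: size the object, then pack it into a byte image.
std::vector<char> checkpoint(ExampleStreamer &s) {
  MiniPuper sizer(MiniPuper::Sizing);
  s.pup(sizer);
  std::vector<char> image(sizer.pos);
  MiniPuper packer(MiniPuper::Packing, &image);
  s.pup(packer);
  return image;
}

// Restart: unpack the image into a freshly constructed object.
ExampleStreamer restart(std::vector<char> &image) {
  ExampleStreamer s;
  MiniPuper unpacker(MiniPuper::Unpacking, &image);
  s.pup(unpacker);
  return s;
}
```

In Charm++ the runtime constructs the puper and drives these passes itself during checkpoint/restart; the checkpoint and restart helpers here exist only to make the round trip visible.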

History

#1 Updated by Phil Miller about 4 years ago

  • Assignee changed from Lukasz Wesolowski to PPL

#2 Updated by Eric Bohm almost 4 years ago

  • Assignee changed from PPL to Vipul Harsh

#3 Updated by Harshitha Menon almost 4 years ago

There are a couple of examples of using TRAM in examples/charm++/TRAM. They do not exercise checkpoint/restart, though.

#4 Updated by Vipul Harsh over 3 years ago

  • Status changed from New to In Progress

#5 Updated by Sam White over 2 years ago

  • Category set to Fault Tolerance
  • Status changed from In Progress to New
  • Assignee changed from Vipul Harsh to Karthik Senthil
  • Target version set to 6.8.0

#6 Updated by Phil Miller about 2 years ago

  • Priority changed from Normal to High

#7 Updated by Michael Robson about 2 years ago

  • Tags set to changa

#8 Updated by Karthik Senthil about 2 years ago

I have finished adding PUP functions for all the relevant classes in VirtualRouter.h, and for the MeshStreamer base class and the GroupMeshStreamer class in NDMeshStreamer.h.

I'm a bit unsure how to handle PUP for the data member std::vector<std::vector<MeshStreamerMessage<dtype> * > > dataBuffers_. Here MeshStreamerMessage is a Charm++ message object templated on a datatype (dtype).

I've implemented the unpacking operation for this as follows:

if (p.isUnpacking()) {
    // Recreate empty (NULL) buffer slots, mirroring MeshStreamer::ctorHelper.
    dataBuffers_.resize(numDimensions_);
    for (int i = 0; i < numDimensions_; i++) {
        dataBuffers_[i].assign(myRouter_.numBuffersPerDimension(i),
                               (MeshStreamerMessage<dtype> *) NULL);
    }
}

This mirrors the initialization done in the constructor (MeshStreamer::ctorHelper). Is this correct? I don't get any compiler or runtime errors when running the available TRAM examples and tests on my machine. Also, what should the packing operation for this data member be?
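One possible answer to the packing question, sketched under stated assumptions: if only the unpacking branch exists, nothing about dataBuffers_ is written during packing, so any items still sitting in partially filled buffers at checkpoint time are not saved. Either all buffers must be flushed (every slot NULL) before checkpointing, in which case rebuilding NULL slots on unpack is enough, or the pup routine must record the buffer shape plus, for each slot, a flag saying whether a message is present, followed by that message's contents. The standalone sketch below shows the second option. MiniPuper is a hypothetical stand-in for PUP::er, and FakeStreamerMessage is a hypothetical stand-in for MeshStreamerMessage<dtype>, reduced to a vector of items.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical, minimal stand-in for Charm++'s PUP::er (not the real class).
struct MiniPuper {
  enum Mode { Sizing, Packing, Unpacking } mode;
  std::vector<char> *buf;
  std::size_t pos = 0;  // bytes counted (sizing) or consumed so far

  MiniPuper(Mode m, std::vector<char> *b = nullptr) : mode(m), buf(b) {}
  bool isUnpacking() const { return mode == Unpacking; }

  template <typename T>
  MiniPuper &operator|(T &v) {
    static_assert(std::is_trivially_copyable<T>::value, "raw bytes only");
    if (mode != Sizing) assert(pos + sizeof(T) <= buf->size());
    if (mode == Packing) std::memcpy(buf->data() + pos, &v, sizeof(T));
    if (mode == Unpacking) std::memcpy(&v, buf->data() + pos, sizeof(T));
    pos += sizeof(T);
    return *this;
  }
};

// Hypothetical stand-in for MeshStreamerMessage<dtype>: just the buffered items.
template <typename dtype>
struct FakeStreamerMessage {
  std::vector<dtype> items;
};

template <typename dtype>
using Buffers = std::vector<std::vector<FakeStreamerMessage<dtype> *>>;

// Pups the shape of the buffers and the contents of every non-NULL slot.
template <typename dtype>
void pupDataBuffers(MiniPuper &p, Buffers<dtype> &bufs) {
  std::size_t numDims = bufs.size();
  p | numDims;
  if (p.isUnpacking()) bufs.resize(numDims);
  for (std::size_t d = 0; d < numDims; ++d) {
    std::size_t numSlots = bufs[d].size();
    p | numSlots;
    if (p.isUnpacking()) bufs[d].assign(numSlots, nullptr);
    for (std::size_t s = 0; s < numSlots; ++s) {
      // Flag: does this slot hold a partially filled message?
      bool present = bufs[d][s] != nullptr;
      p | present;
      if (!present) continue;
      if (p.isUnpacking()) bufs[d][s] = new FakeStreamerMessage<dtype>();
      std::vector<dtype> &items = bufs[d][s]->items;
      std::size_t n = items.size();
      p | n;
      if (p.isUnpacking()) items.resize(n);
      for (std::size_t i = 0; i < n; ++i) p | items[i];
    }
  }
}

// Sizing pass followed by a packing pass into a byte image.
template <typename dtype>
std::vector<char> packDataBuffers(Buffers<dtype> &bufs) {
  MiniPuper sizer(MiniPuper::Sizing);
  pupDataBuffers(sizer, bufs);
  std::vector<char> image(sizer.pos);
  MiniPuper packer(MiniPuper::Packing, &image);
  pupDataBuffers(packer, bufs);
  return image;
}

// Deletes every buffered message and resets the slots to NULL.
template <typename dtype>
void freeDataBuffers(Buffers<dtype> &bufs) {
  for (auto &dim : bufs)
    for (auto *&msg : dim) { delete msg; msg = nullptr; }
}
```

After unpacking, the restored slots own freshly allocated messages, so the streamer's flush or destructor path must free them just as it would runtime-created ones. In real Charm++ code, messages must be allocated through the message allocation machinery rather than plain new, so a buffered message pointer would more naturally be pupped with the runtime's CkPupMessage helper than field by field; its exact signature should be checked against the Charm++ version in use.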

#9 Updated by Sam White about 2 years ago

  • Status changed from New to In Progress

#10 Updated by Karthik Senthil about 2 years ago

  • % Done changed from 0 to 50

The MeshStreamer base class and GroupMeshStreamer are done. I'm currently working on the ArrayMeshStreamer and GroupChunkMeshStreamer classes.

#11 Updated by Sam White about 2 years ago

  • Status changed from In Progress to Implemented

#12 Updated by Phil Miller about 2 years ago

  • Status changed from Implemented to Merged
