Some CkCallback types are not valid across checkpoint/restart
Per #158, many types of CkCallback contain (possibly transitively, through other structures) raw pointers to objects in the system, such as chares (via CkChareID) and functions. These callbacks cannot survive recovery from the kind of application-level checkpoints that Charm++ performs, because the addresses of their targets may change from one execution to the next. In the chare case, we can potentially use a less transient identifier like chareIdx, if that's stable and usable across restart. If chares get folded into the fixed-size global object ID work (#108), then that will apply to callbacks as well, and the chare case will be fixed.
I'm less sure how to handle functions. It might be possible to have them registered explicitly and referenced by some ID instead of by pointer, but I'm uncertain whether that would actually work in the restart case either, unless the registration were in some very low-level code run at every process launch. If initnode calls happen even during restart, then that may suffice, but whoever works on this would have to check this pretty carefully.
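To make the register-by-ID idea concrete, here is a minimal sketch in plain C++. Everything in it (fnTable, registerCallbackFn, CheckpointableCallback, and the two example targets) is a hypothetical name for illustration, not existing Charm++ API. It assumes registration runs in the same order on every process launch, including restarts, which is exactly the property that would need checking for initnode.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical names throughout; none of this is existing Charm++ API.
using CallbackFn = void (*)(void* msg);
using FnId = std::uint32_t;

// Per-process table mapping stable IDs to this run's function addresses.
inline std::vector<CallbackFn>& fnTable() {
    static std::vector<CallbackFn> table;
    return table;
}

// Assumption: called in the same order on every launch, including restarts,
// so that a given function always receives the same ID.
inline FnId registerCallbackFn(CallbackFn fn) {
    fnTable().push_back(fn);
    return static_cast<FnId>(fnTable().size() - 1);
}

// A callback that stores only the ID, never the raw function pointer,
// so its contents stay meaningful after a checkpoint is restored.
struct CheckpointableCallback {
    FnId id;

    void send(void* msg) const {
        if (id >= fnTable().size()) {
            throw std::out_of_range("unregistered callback function ID");
        }
        fnTable()[id](msg);  // resolve the ID to this run's address
    }
};

// Example targets used only for illustration.
inline void addOne(void* msg) { *static_cast<int*>(msg) += 1; }
inline void timesTwo(void* msg) { *static_cast<int*>(msg) *= 2; }
```

The callback carries only an integer, so the checkpoint never records an address. The cost is that an ID restored from a checkpoint is only valid if the restarted process has rebuilt the table identically before any callback is invoked.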
#9 Updated by Eric Mikida over 1 year ago
I did some exploration to get this integrated. Fully updating singleton chare IDs to 64-bit IDs would take a lot of work, because of the number of different chare IDs already in use in various places and the fact that they aren't always used as pure IDs. A quick and dirty fix to give every singleton chare a 64-bit ID is more doable, but I'm not sure how worthwhile it would be. It may necessitate multiple API changes: first to add another ID to a chare as a temporary fix, and then later to remove the obsolete IDs.
If this particular bug is critical, the plain IDs could be added quickly to (maybe) address it.
Ronak may have more input?