Bug #159

Some CkCallback types are not valid across checkpoint/restart

Added by Phil Miller about 6 years ago. Updated about 1 month ago.

Status: New
Priority: Low
Assignee:
Category: -
Target version:
Start date: 04/02/2013
Due date:
% Done: 0%


Description

Per #158, many types of CkCallback contain (possibly transitively, through other structures) raw pointers to objects in the system, like chares (via CkChareID) and functions. These callbacks cannot survive recovery from the kind of application-level checkpoints that Charm++ performs, because their targets may have changed in address from one execution to the next. In the chare case, we can potentially use a less transient identifier like chareIdx if that's stable and usable across restart. If chares get folded into the fixed-size global object ID work (#108), then that will apply to callbacks as well, and this will be fixed.

I'm less sure how to handle functions. It might be possible to have them registered explicitly and referenced by some ID instead of by pointer, but I'm uncertain whether that would actually work in the restart case either, unless the registration were in some very low-level code run at every process launch. If initnode calls happen even during restart, then that may suffice, but whoever works on this would have to check this pretty carefully.
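The register-by-ID idea above can be sketched in plain C++. This is only an illustration under stated assumptions, not Charm++ code: `registerFn`, `lookupFn`, `registerAll`, and `SavedCallback` are hypothetical names, and `registerAll` stands in for whatever low-level startup hook (possibly an initnode call) would run at every process launch, restart included. The scheme only works if registration happens in the same order on every launch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; none of this is Charm++ API.
using CallbackFn = int (*)(int);

// Table of registered functions. The index into this table is the stable ID.
static std::vector<CallbackFn>& fnTable() {
    static std::vector<CallbackFn> table;
    return table;
}

// Assumption: called in the same order on every launch, including restarts,
// so that a given function always receives the same ID.
std::size_t registerFn(CallbackFn fn) {
    assert(fn != nullptr);
    fnTable().push_back(fn);
    return fnTable().size() - 1;
}

// Translate a stable ID back into this process's function address.
// Throws std::out_of_range if the ID was never registered.
CallbackFn lookupFn(std::size_t id) {
    return fnTable().at(id);
}

int addOne(int x) { return x + 1; }
int timesTwo(int x) { return x * 2; }

// Stand-in for low-level startup code that runs at every process launch.
void registerAll() {
    fnTable().clear();
    registerFn(addOne);    // ID 0
    registerFn(timesTwo);  // ID 1
}

// What a checkpoint would store: an ID rather than a raw address.
struct SavedCallback {
    std::size_t fnId;
};
```

On restart, the new process would call `registerAll()` before unpacking any checkpointed state, and then resolve each `SavedCallback` with `lookupFn(saved.fnId)`. The stored ID stays meaningful even if the function's address differs between runs.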


Related issues

Related to Charm++ - Bug #2018: Use of function pointers causes CkCallback errors in some ASLR environments New 10/26/2018
Related to Charm++ - Cleanup #1422: Cleanup dangling issues from 64bit merge In Progress 02/15/2017

History

#1 Updated by Eric Bohm over 5 years ago

  • Assignee set to Jonathan Lifflander

Parcel out sub components of this task as needed.

#2 Updated by Phil Miller over 5 years ago

  • Target version changed from 6.6.0 to 6.7.0

#3 Updated by Eric Bohm over 4 years ago

  • Priority changed from Normal to Low

#4 Updated by Phil Miller over 3 years ago

  • Target version changed from 6.7.0 to 6.8.0

#5 Updated by Phil Miller over 2 years ago

  • Assignee changed from Jonathan Lifflander to Phil Miller

#6 Updated by Phil Miller about 2 years ago

  • Target version changed from 6.8.0 to 6.8.1

This does not seem to affect any current applications, so deferring.

#7 Updated by Eric Bohm over 1 year ago

  • Target version changed from 6.8.1 to 6.9.0

#8 Updated by Phil Miller over 1 year ago

Eric, Ronak: what's the status of using 64-bit IDs to name plain chares?

#9 Updated by Eric Mikida over 1 year ago

I did some exploration to get this integrated. Fully updating singleton chare IDs to 64-bit IDs would take a lot of work, because of the number of different chare IDs already used in various places and the fact that they aren't always used as pure IDs. A quick and dirty fix to get 64-bit IDs for every singleton chare is more doable, but I'm not sure how worthwhile it would be. It may necessitate multiple API changes: first adding another ID to a chare as a temporary fix, then later updating the API again to remove the other, obsolete IDs.

If this particular bug is critical, the plain IDs could be added quickly to (maybe) address it.

Ronak may have more input?

#10 Updated by Phil Miller over 1 year ago

  • Assignee deleted (Phil Miller)

#11 Updated by Eric Bohm over 1 year ago

  • Target version changed from 6.9.0 to 6.9.1

#12 Updated by Eric Bohm about 1 year ago

  • Assignee set to Juan Galvez

#13 Updated by Sam White 4 months ago

  • Target version changed from 6.9.1 to 6.10.0

#14 Updated by Evan Ramos about 2 months ago

  • Related to Bug #2018: Use of function pointers causes CkCallback errors in some ASLR environments added

#15 Updated by Juan Galvez about 2 months ago

Regarding callbacks that use function pointers, we concluded in the core meeting that we will disallow this use of callbacks with checkpoint/restart. Instead, users should use entry method callbacks.

#16 Updated by Eric Mikida about 2 months ago

  • Related to Cleanup #1422: Cleanup dangling issues from 64bit merge added

#17 Updated by Eric Bohm about 1 month ago

  • Target version changed from 6.10.0 to 6.11
