Bug #543

charmrun under causalft should respect ++local

Added by Phil Miller almost 5 years ago. Updated about 1 year ago.

Status: Closed
Priority: Low
Assignee: -
Category: -
Target version: -
Start date: 08/06/2014
Due date:
% Done: 0%


Description

When running a message-logging (and possibly also a plain online checkpoint/restart) FT net build of Charm++, charmrun tries to launch replacement processes to take the place of failed ones. In testing environments, it's convenient to run with ++local so that there's no need to SSH to remote nodes. When running the causalfttest targets, I found that charmrun tries to use ssh to launch the replacement, in spite of ++local.

Obviously, fixing this is not really helpful to production-like uses of the online FT builds, since losing a single process on a node is not really the kind of failure we expect. The testing use case is valuable, though.
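
For illustration only, the behavior requested above can be sketched as a single launch decision. This is a minimal Python sketch, not charmrun's actual code: the function name `replacement_launch_argv` and its parameters `program_argv`, `host`, and `local_mode` are hypothetical, and it assumes the ++local setting is still known at the point where the replacement process is started.

```python
import shlex


def replacement_launch_argv(program_argv, host, local_mode):
    """Build the argv used to start a replacement process.

    Hypothetical helper for illustration; not part of charmrun.
    local_mode stands in for "charmrun was started with ++local".
    """
    if local_mode:
        # Local mode: run the program directly on this machine, no ssh.
        return list(program_argv)
    # Remote mode: wrap the quoted command line in an ssh invocation.
    return ["ssh", host, shlex.join(program_argv)]
```

The point of the sketch is that the local/remote choice made at initial startup is consulted again when a replacement is launched, rather than always falling back to ssh.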

History

#1 Updated by Eric Bohm over 4 years ago

  • Assignee set to Xiang Ni

#2 Updated by Phil Miller over 4 years ago

  • Assignee changed from Xiang Ni to Eric Mikida

#3 Updated by Eric Mikida over 3 years ago

  • Status changed from New to In Progress

This might involve some refactoring of charmrun. By the point where we restart a node, much of the state used in local startup has already been lost. Charmrun could probably use a bit of refactoring anyway.

#4 Updated by Sam White about 1 year ago

  • Assignee deleted (Eric Mikida)
  • Status changed from In Progress to Closed

causalft and mlogft are not production features, and they have been broken since at least the merge of 64-bit IDs for 6.8.0.
