Bug #903

ckexit with interop hangs sometimes

Added by Nikhil Jain over 3 years ago. Updated 3 months ago.

Status: Merged
Priority: Normal
Assignee:
Category: MPI Interoperability
Target version:
Start date: 12/01/2015
Due date:
% Done: 0%


Description

In some cases, calling ckexit hangs in both non-smp and smp modes.

History

#1 Updated by Phil Miller over 3 years ago

  • Status changed from New to In Progress

Proposed patch https://charm.cs.illinois.edu/gerrit/913 with some planned revisions for thread safety. That work should happen Monday 12/7.

#2 Updated by Nikhil Jain over 3 years ago

  • Target version changed from 6.7.0 to 6.7.1

#3 Updated by Phil Miller over 3 years ago

Discussion w/ Nikhil:

Generate local immediate messages to run on the comm thread, taking the exitCount variable check off the comm thread's normal execution path entirely.
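
A minimal sketch of the idea above, not the Charm++ implementation: each worker posts an "exit" task to the comm thread's queue instead of the comm loop polling a shared counter. All names here (CommThread, postImmediate, requestExit, simulateExit) are hypothetical, and a std::function queue stands in for immediate messages. It assumes at least one worker, since the loop blocks until a task arrives.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a comm thread with an immediate-message queue.
class CommThread {
public:
  // Enqueue a task to run on the comm thread (models a local immediate message).
  void postImmediate(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push_back(std::move(task));
    }
    ready_.notify_one();
  }

  // Comm loop: it only drains tasks and never polls the exit counter itself.
  void run() {
    while (!done_) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !queue_.empty(); });
        task = std::move(queue_.front());
        queue_.pop_front();
      }
      task();  // executes on the comm thread
    }
  }

  // Called by a worker that has reached exit; the count is updated
  // inside the task, so only the comm thread ever touches it.
  void requestExit(int numWorkers) {
    postImmediate([this, numWorkers] {
      if (++exitCount_ == numWorkers) done_ = true;
    });
  }

  int exitCount() const { return exitCount_; }

private:
  std::mutex mutex_;
  std::condition_variable ready_;
  std::deque<std::function<void()>> queue_;
  int exitCount_ = 0;  // written only by the comm thread
  bool done_ = false;  // written only by the comm thread
};

// Starts one comm thread and numWorkers (>= 1) workers that all request
// exit, then returns the count seen once the comm thread has stopped.
int simulateExit(int numWorkers) {
  CommThread comm;
  std::thread commThread([&comm] { comm.run(); });
  std::vector<std::thread> workers;
  for (int i = 0; i < numWorkers; ++i)
    workers.emplace_back([&comm, numWorkers] { comm.requestExit(numWorkers); });
  for (auto& w : workers) w.join();
  commThread.join();
  return comm.exitCount();
}
```

Because the counter is only modified inside tasks that run on the comm thread, it needs no atomics or extra locking in this sketch; whether the real exit path can be structured the same way is an open question here.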

#4 Updated by Phil Miller over 3 years ago

The bug description mentions non-smp builds too. How does that come up?

#5 Updated by Sam White over 3 years ago

  • Category set to MPI Interoperability
  • Target version changed from 6.7.1 to 6.8.0

#6 Updated by Phil Miller over 2 years ago

  • Target version changed from 6.8.0 to 6.8.1

#7 Updated by Sam White almost 2 years ago

  • Assignee changed from Nikhil Jain to Eric Mikida

#8 Updated by Eric Bohm almost 2 years ago

  • Target version changed from 6.8.1 to 6.9.0

#9 Updated by Nikhil Jain almost 2 years ago

This issue was reproducible using the multirun_time code in examples/charm++/mpi-coexist.

#10 Updated by Eric Mikida over 1 year ago

This issue, or a related one, is now coming up in Charades as well. I still need to explore more, but for me it's a hang much earlier on, and it also hangs on exit for simple programs. For example, the following program (with a trivial main chare) hangs on netlrts-linux-x86_64 in SMP mode:

int main(int argc, char** argv) {
  CharmInit(argc, argv);

  CharmLibExit();
  return 0;
}

#11 Updated by Eric Mikida over 1 year ago

The example from the interop documentation (examples/charm++/user-driven-interop) also hangs in smp mode.

#12 Updated by Eric Mikida over 1 year ago

  • Target version changed from 6.9.0 to 6.9.1

#13 Updated by Sam White 6 months ago

  • Target version changed from 6.9.1 to 6.10.0

#14 Updated by Eric Mikida 4 months ago

user-driven-interop does not reproduce this bug; the hang was actually just an error in the test that prevents it from running in SMP mode at all. The small example posted above also no longer reproduces the issue. The other interop examples in examples are currently broken in general, and crash for other reasons.

#15 Updated by Eric Mikida 3 months ago

After fixing up the MPI interop examples in mpi-coexist (https://charm.cs.illinois.edu/gerrit/c/charm/+/5051), this bug is reproducible again, at least on mpi-darwin-x86_64 smp builds.

#16 Updated by Eric Mikida 3 months ago

  • Status changed from In Progress to Implemented

I've cleaned up Nikhil's original patch (https://charm.cs.illinois.edu/gerrit/913) to fix this issue. It does not include the immediate messaging improvement that he and Phil alluded to above, but as that was intended to be an improvement to the bug fix, and not the bug fix itself, I think patch 913 should still be merged, rather than leaving interop exit broken.

#17 Updated by Eric Mikida 3 months ago

  • Status changed from Implemented to Merged
