Bug #1940

Bug #259: Bugs exposed by use of randomized Q

Singleton chare and nodegroup creation hangs with randomized queues in SMP mode

Added by Sam White 11 months ago. Updated about 2 months ago.

Status: In Progress
Priority: Normal
Category: -
Target version:
Start date: 07/05/2018
Due date:
% Done: 10%
Spent time:

Description

examples/charm++/fib hangs in SMP mode when using randomized queues.

This issue and the nodegroup one can also be reproduced here: https://github.com/yuchenp/smp-rq-problem

History

#1 Updated by Eric Bohm 8 months ago

  • Assignee set to Michael Robson

#2 Updated by Michael Robson 7 months ago

  • Target version set to 6.9.1

#3 Updated by Sam White 4 months ago

  • Target version changed from 6.9.1 to 6.10.0

#4 Updated by Laxmikant "Sanjay" Kale 2 months ago

Michael, make at least some update (test with a simple program on a couple of machines) by next week.

#5 Updated by Michael Robson about 2 months ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 10

Tried to replicate using fib on various platforms and machines:

  • netlrts-darwin on local machine with ++local - works (does not hang)
  • netlrts-linux on courage - failed to build due to arbitrary priority (bitvec)
  • netlrts-linux on courage with prio=int - replicates (i.e. hangs)
  • verbs on comet - works (does not hang) with ++local, mpi-exec, and standalone
  • mpi on comet and courage - failed to build
  • mpi on courage with prio=int - replicates (i.e. hangs)
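
Since "replicates" here means the run never finishes, one way to make the per-platform check repeatable is to wrap each run in a timeout. The following is only a rough sketch of that idea, not something used in this ticket: the helper name classify_run and the default timeout are hypothetical, and the command to run is whatever launch line applies on a given machine.

```python
import subprocess


def classify_run(cmd, timeout_s=60.0):
    """Run cmd and report 'works', 'hangs', or 'failed'.

    Hypothetical helper: a run that exceeds timeout_s is treated as a hang,
    which is a heuristic (a slow but correct run would be misreported).
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,  # discard output so pipes cannot fill up
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child before raising; processes
        # spawned by a launcher may still need to be cleaned up by hand.
        return "hangs"
    except OSError:
        # The executable is missing or could not be started.
        return "failed"
    return "works" if result.returncode == 0 else "failed"
```

For example, classify_run(["./charmrun", "+p4", "./fib", "20", "++local"], timeout_s=120) would return "hangs" if fib has not exited after two minutes. The processor count and fib argument in that call are illustrative guesses, not values taken from the runs above.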

#6 Updated by Sam White about 2 months ago

What do you mean by it wouldn't build for netlrts and mpi? Charm didn't build, or the example didn't build? Can you post the output?

#7 Updated by Michael Robson about 2 months ago

In both cases, charm failed to build. In the first (netlrts) case, switching to a fixed-width priority type (e.g. int) enabled charm to build. I don't have the output, but I can recreate it and post it here.

Here's the build line from comet:

./build charm++ mpi-linux-x86_64 smp --enable-randomized-msgq -j8

And the build error:

checking "whether C++ compiler supports C++11 with '-h std=c++11'"... "no" 
Charm++ requires C++11 support, but doesn't know the flag to enable it

For Intel's compiler please see
https://charm.cs.illinois.edu/redmine/issues/1560
about making a suitable version of gcc/g++/libstdc++ available

For Blue Gene/Q please use the Clang compiler
*** Please find detailed output in tmp/charmconfig.out ***
gmake[1]: Leaving directory `/home/mprobson/charm/mpi-linux-x86_64-smp/tmp'
gmake: *** [headers] Error 2
-------------------------------------------------
Charm++ NOT BUILT. Either cd into mpi-linux-x86_64-smp/tmp and try
to resolve the problems yourself, visit
http://charm.cs.illinois.edu/
for more information. Otherwise, email the developers at charm@cs.illinois.edu

Turns out that on courage it was also dying on the priotype incompatibility. Changing the priority type from the default of bitvec to int fixes the build and replicates the hang.

#8 Updated by Michael Robson about 2 months ago

With some further testing, this actually appears to be an error caused by the combination of SMP mode and non-bitvec (fixed-length) priorities, which we are forced to use with randomized queues.
