Bug #259: Bugs exposed by use of randomized Q
Singleton chare and nodegroup creation hangs with randomized queues in SMP mode
examples/charm++/fib hangs in SMP mode when using randomized queues.
This issue and the nodegroup one can also be reproduced here: https://github.com/yuchenp/smp-rq-problem
#5 Updated by Michael Robson about 2 months ago
- Status changed from New to In Progress
- % Done changed from 0 to 10
- netlrts-darwin on local machine with ++local - works (i.e. the hang does not replicate)
- netlrts-linux on courage - failed to build due to arbitrary priority (bitvec)
- netlrts-linux on courage with prio=int - replicates (i.e. hangs)
- verbs on comet - works (doesn't replicate) with ++local, mpi-exec, and standalone
- mpi on comet and courage - failed to build
- mpi on courage - replicates
#7 Updated by Michael Robson about 2 months ago
In both cases, Charm++ failed to build. In the first (netlrts) case, switching to a fixed-width priority type (e.g. int) enabled Charm++ to build.
I don't have the output but I can recreate it and post it here.
Here's the build line from comet:
./build charm++ mpi-linux-x86_64 smp --enable-randomized-msgq -j8
And the build error:
checking "whether C++ compiler supports C++11 with '-h std=c++11'"... "no"
Charm++ requires C++11 support, but doesn't know the flag to enable it
For Intel's compiler please see https://charm.cs.illinois.edu/redmine/issues/1560
about making a suitable version of gcc/g++/libstdc++ available
For Blue Gene/Q please use the Clang compiler
*** Please find detailed output in tmp/charmconfig.out ***
gmake: Leaving directory `/home/mprobson/charm/mpi-linux-x86_64-smp/tmp'
gmake: *** [headers] Error 2
-------------------------------------------------
Charm++ NOT BUILT. Either cd into mpi-linux-x86_64-smp/tmp and try
to resolve the problems yourself, visit
http://charm.cs.illinois.edu/
for more information. Otherwise, email the developers at email@example.com
Turns out for courage it was also dying on the priotype incompatibility. Changing it from the default of bitvec to int fixes the build, and the resulting binary replicates the hang.
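For anyone trying to reproduce this, here is a rough sketch of what the build line with an int priority type might look like. It only assembles and prints the command. The `--with-prio-type=int` option is my assumption about how the priority type was switched from bitvec, the `netlrts-linux-x86_64` target is assumed from the "netlrts-linux" entry above, and `smp` and `-j8` are carried over from the comet build line.

```shell
#!/bin/sh
# Sketch only: assembles a candidate build line and prints it; nothing is built.

# Assumed full target name for the "netlrts-linux" builds mentioned above.
TARGET="netlrts-linux-x86_64"

# Assumption: --with-prio-type=int is the option used to replace the
# default bitvec priority type with a fixed-width int.
PRIO_OPT="--with-prio-type=int"

# Same shape as the comet build line, with the priority option added.
BUILD_CMD="./build charm++ $TARGET smp --enable-randomized-msgq $PRIO_OPT -j8"

echo "$BUILD_CMD"
```

If the printed line looks right for the machine, it can be run from the top of the charm source tree.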