Bug #1905

pami* autobuilds failing since C -> C++ conversion

Added by Sam White 11 days ago. Updated 5 days ago.

Status: In Progress
Priority: High
Assignee:
Category: -
Target version:
Start date: 05/11/2018
Due date:
% Done: 0%

Description

All pami{lrts}-bluegeneq-* autobuilds have been failing since the C -> C++ conversion was done last week.
The final test (bigsim) appears to run to completion, but the test run still exits with a failure:

make[3]: Entering directory `/gpfs/vesta-fs0/projects/CharmRTS/nbhat/autobuild/pamilrts-clang-nosmp/charm/pamilrts-bluegeneq/examples/bigsim/emulator'
make -C littleMD
make[4]: Entering directory `/gpfs/vesta-fs0/projects/CharmRTS/nbhat/autobuild/pamilrts-clang-nosmp/charm/pamilrts-bluegeneq/examples/bigsim/emulator/littleMD'
make[4]: Nothing to be done for `all'.
make[4]: Leaving directory `/gpfs/vesta-fs0/projects/CharmRTS/nbhat/autobuild/pamilrts-clang-nosmp/charm/pamilrts-bluegeneq/examples/bigsim/emulator/littleMD'
../../../bin/testrun  +p4 ./maxReduce +cth3 +wth10 +bgstacksize 102400 --pernode 8

Running on 4 processors:  ./maxReduce +cth3 +wth10 +bgstacksize 102400
runjob -p 8 -n 4 --block VST-02460-13571-32 --envs BG_SHAREDMEMSIZE=32MB  :  ./maxReduce +cth3 +wth10 +bgstacksize 102400
Choosing optimized barrier algorithm name I0:MultiSync2Device:SHMEM:GI
Charm++> Running in non-SMP mode: numPes 4
Converse/Charm++ Commit ID: e264b88
Charm++> Disabling isomalloc because isomalloc disabled by conv-mach.
BG info> Simulating 3x3x3 nodes with 3 comm + 10 work threads each.
BG info> Network type: bluegene.
alpha: 1.000000e-07    packetsize: 1024    CYCLE_TIME_FACTOR:1.000000e-03.
CYCLES_PER_HOP: 5    CYCLES_PER_CORNER: 75.
BG info> cpufactor is 1.000000.
BG info> floating point factor is 0.000000.
BG info> BG stack size: 102400 bytes. 
BG info> Using WallTimer for timing method. 
Initializing node 0, 1, 0
Initializing node 1, 2, 0
Initializing node 2, 0, 0
Initializing node 1, 0, 0
Initializing node 0, 2, 0
Initializing node 2, 1, 0
Initializing node 1, 0, 1
Initializing node 2, 0, 1
Initializing node 0, 0, 1
Initializing node 2, 1, 1
Initializing node 1, 1, 1
Initializing node 0, 0, 2
Initializing node 0, 0, 0
Initializing node 0, 2, 1
Initializing node 1, 1, 2
Initializing node 2, 2, 1
Initializing node 2, 2, 2
Initializing node 0, 1, 2
Initializing node 1, 1, 0
Initializing node 1, 2, 2
Finished Initializing 2 0 0!
Finished Initializing 1 0 0!
Finished Finding Max
Initializing node 1, 0, 2
Initializing node 2, 2, 0
Sent reduce message to myself with max value 9
Finished Finding Max
Initializing node 2, 1, 2
Finished Initializing 0 2 0!
Sent reduce message to myself with max value 9
Finished Finding Max
Initializing node 0, 1, 1
Finished Initializing 2 1 0!
Finished Initializing 0 1 0!
Finished Finding Max
Initializing node 1, 2, 1
Sent reduce message to myself with max value 9
Finished Finding Max
Finished Initializing 0 0 1!
Sent reduce message to myself with max value 9
Finished Finding Max
Finished Initializing 1 2 0!
Sent reduce message to myself with max value 9
Sent reduce message to myself with max value 9
Initializing node 2, 0, 2
Finished Initializing 1 1 1!
Initializing node 0, 2, 2
Finished Finding Max
Finished Initializing 1 0 1!
Finished Initializing 0 0 0!
Finished Finding Max
Sent reduce message to myself with max value 9
Finished Finding Max
Finished Finding Max
Sent reduce message to myself with max value 9
Finished Initializing 2 2 1!
Sent reduce message to myself with max value 9
Sent reduce message to myself with max value 9
Finished Initializing 2 1 1!
Finished Finding Max
Finished Initializing 2 0 1!
Sent reduce message to myself with max value 9
Finished Finding Max
Finished Finding Max
Finished Initializing 1 1 0!
Sent reduce message to myself with max value 9
Finished Initializing 0 1 2!
Finished Initializing 0 2 1!
Finished Finding Max
Sent reduce message to myself with max value 9
Finished Finding Max
Sent reduce message to myself with max value 9
Finished Finding Max
Sent reduce message to myself with max value 9
Finished Initializing 0 0 2!
Sent reduce message to myself with max value 9
Finished Finding Max
Finished Initializing 2 2 0!
Finished Initializing 1 2 2!
Finished Initializing 1 0 2!
Sent reduce message to myself with max value 9
Finished Finding Max
Finished Initializing 1 1 2!
Sent reduce message to myself with max value 9
Finished Finding Max
Finished Finding Max
Sent reduce message to myself with max value 9
Sent reduce message to myself with max value 9
Finished Initializing 2 2 2!
Finished Finding Max
Finished Initializing 0 1 1!
Finished Finding Max
Finished Finding Max
Sent reduce message to myself with max value 9
Sent reduce message to myself with max value 9
Finished Initializing 2 1 2!
Sent reduce message to myself with max value 9
Finished Finding Max
Finished Initializing 1 2 1!
Sent reduce message to myself with max value 9
Finished Finding Max
Sent reduce message to myself with max value 9
Finished Initializing 2 0 2!
Finished Finding Max
Sent reduce message to myself with max value 9
Finished Initializing 0 2 2!
Finished Finding Max
Sent reduce message to myself with max value 9
The maximal value is 9 

BG> BigSim emulator shutdown gracefully!
BG> Emulation took 0.044622 seconds!
[Partition 0][Node 0] End of program
fatal> error code 1 during remote> ./instead_test.sh charm/pamilrts-bluegeneq/tmp make  test --pernode 8
Returned from executing scripts/pamilrts-bluegeneq/test on remote host
fatal> Test on remote host failed with fatal error (0)
Bad: Test on remote host failed with fatal error (0)

History

#1 Updated by Evan Ramos 10 days ago

If you set a breakpoint on exit(), does it get hit, and if so what is the backtrace?
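For reference, this can be done with gdb roughly as follows (a generic sketch using the maxReduce binary and arguments from the log above; on BG/Q the job would actually need to be launched under the debugger via the runjob tooling, which is omitted here):

```
# Hypothetical gdb session; binary and flags taken from the log above.
gdb ./maxReduce
(gdb) break exit                              # stop whenever exit() is called
(gdb) run +cth3 +wth10 +bgstacksize 102400
(gdb) backtrace                               # if the breakpoint hits, show the call chain
```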

#2 Updated by Sam White 10 days ago

It looks like verbs-linux-x86_64-smp failed in a bigsim test last night: http://charm.cs.illinois.edu/autobuild/cur/verbs-linux-x86_64-smp.txt

#3 Updated by Nitin Bhat 5 days ago

  • Assignee set to Nitin Bhat

I found that autobuild began failing because newly added tests pushed the total execution time past 2:00:00 hours, which is the maximum execution time for the queue. When I run the tests separately, the bigsim tests pass on pami{lrts}-bluegeneq.

So the real fix for this issue is to make the pami autobuilds run reliably even when the total test execution time exceeds 2 hours.
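One possible direction, sketched below with entirely made-up test names and timings (nothing here is measured on Vesta): partition the test list greedily into batches whose per-batch totals each fit under the queue's 2-hour wall-clock limit, and submit each batch as its own job.

```python
# Hypothetical sketch: greedily pack per-test runtimes (seconds) into
# batches that each stay under the queue's 2-hour wall-clock limit.
QUEUE_LIMIT_S = 2 * 60 * 60  # 7200 s, the stated queue maximum

def split_into_batches(test_times, limit=QUEUE_LIMIT_S):
    """Return lists of (name, seconds) whose per-batch totals fit the limit."""
    batches, current, current_total = [], [], 0
    for name, seconds in test_times:
        if current and current_total + seconds > limit:
            batches.append(current)
            current, current_total = [], 0
        current.append((name, seconds))
        current_total += seconds
    if current:
        batches.append(current)
    return batches

# Made-up timings for illustration only:
times = [("bigsim", 2400), ("ampi-pingpong", 5400), ("megatest", 3000)]
for i, batch in enumerate(split_into_batches(times)):
    total = sum(s for _, s in batch)
    print(f"job {i}: {[n for n, _ in batch]} ({total} s)")
```

With these example timings, no two tests fit together under the limit, so each lands in its own job; real timings would pack more densely.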

#4 Updated by Evan Ramos 5 days ago

2 hours seems like a long time for the tests to run. Maybe #1872 describes the problem?

#5 Updated by Sam White 5 days ago

This may be due to examples/ampi/pingpong recently being added to 'make test' and the fact that AMPI pingpong is very slow on BGQ. I'll submit a patch to reduce the number of iterations that it runs.
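For context on why cutting iterations is safe here: a ping-pong microbenchmark reports a per-message latency averaged over many iterations, so reducing the iteration count trades some statistical smoothing for a proportional drop in wall-clock time. A toy stand-in (not the actual examples/ampi/pingpong code, and with no real message passing):

```python
import time

def pingpong_latency(iters, payload=1024):
    """Toy stand-in for an MPI ping-pong: average per-iteration cost.
    The 'message' here is just a local byte-buffer copy, for illustration."""
    buf = bytes(payload)
    start = time.perf_counter()
    for _ in range(iters):
        echo = bytes(buf)  # stand-in for a send plus its matching receive
    elapsed = time.perf_counter() - start
    return elapsed / iters  # per-iteration latency estimate

# Fewer iterations -> proportionally shorter run, similar estimate.
for iters in (100000, 1000):
    print(f"{iters:>6} iters: {pingpong_latency(iters):.2e} s/iter")
```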

#6 Updated by Sam White 5 days ago

  • Status changed from New to In Progress

#7 Updated by Sam White 5 days ago

Make examples/ampi/pingpong/ run faster, especially on BGQ: https://charm.cs.illinois.edu/gerrit/#/c/charm/+/4195/
