Bug #1934

Converse commbench crashes for mpi-linux-x86_64-smp build during nightly build tests

Added by Nitin Bhat 9 days ago. Updated 2 days ago.

Status: Merged
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 06/12/2018
Due date:
% Done: 0%


Description

Nightly build: http://ppl-jenkins:8080/job/Nightly-Build/label=xenial,platform=mpi-linux-x86_64-smp/1591/console

Running on 2 processors:  ./pgm +ppn 2 +pemap 0-5 
charmrun>  /usr/bin/setarch x86_64 -R  mpirun -np 2  ./pgm +ppn 2 +pemap 0-5 
Charm++> Running on MPI version: 3.0
Charm++> level of thread support used: MPI_THREAD_FUNNELED (desired: MPI_THREAD_FUNNELED)
Charm++> Running in SMP mode: 2 processes, 2 worker threads (PEs) + 1 comm threads per process, 0 PEs total
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: 42773fd
proc module init
proc module init
proc module init
proc module init
proc module init
proc module init
Charm++> cpu affinity enabled. 
Charm++> cpuaffinity PE-core map : 0-5
Charm++> Running on 1 hosts (1 sockets x 4 cores x 2 PUs = 8-way SMP)
[esteem:26119] *** Process received signal ***
[esteem:26119] Signal: Segmentation fault (11)
[esteem:26119] Signal code: Address not mapped (1)
[esteem:26119] Failing at address: (nil)
[esteem:26120] *** Process received signal ***
[esteem:26120] Signal: Segmentation fault (11)
[esteem:26120] Signal code: Address not mapped (1)
[esteem:26120] Failing at address: (nil)
[esteem:26119] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7ffff7bcb390]
[esteem:26119] [ 1] ./pgm(commbench_init+0x274)[0x424da4]
[esteem:26119] [ 2] ./pgm[0x4328be]
[esteem:26119] [ 3] ./pgm[0x432a3c]
[esteem:26119] [ 4] /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7ffff7bc16ba]
[esteem:26119] [ 5] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7ffff6b6541d]
[esteem:26119] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 26120 on node esteem exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
Makefile:13: recipe for target 'test' failed
make[3]: Leaving directory '/scratch/jenkins/builds/Nightly-Build/label=xenial,platform=mpi-linux-x86_64-smp@1591/charm/mpi-linux-x86_64-smp/tests/converse/commbench'
make[3]: *** [test] Error 139
Makefile:9: recipe for target 'test' failed
make[2]: *** [test] Error 1
make[2]: Leaving directory '/scratch/jenkins/builds/Nightly-Build/label=xenial,platform=mpi-linux-x86_64-smp@1591/charm/mpi-linux-x86_64-smp/tests/converse'
Makefile:13: recipe for target 'test' failed
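
For readers of the log above: the "Error 139" reported by make matches the segmentation fault, since by the usual shell convention an exit status above 128 means the process was terminated by signal (status - 128), and 139 - 128 = 11 (SIGSEGV). A minimal sketch of that arithmetic; the function name decode_status is hypothetical and not part of Charm++ or the build scripts:

```shell
# Hypothetical helper: interpret a shell-style exit status.
# Statuses above 128 conventionally encode "killed by signal (status - 128)".
decode_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        echo "terminated by signal $((status - 128))"
    else
        echo "exited with status $status"
    fi
}

decode_status 139   # the value make reported for the commbench test
```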

History

#2 Updated by Evan Ramos 8 days ago

  • Assignee set to Eric Mikida

#3 Updated by Evan Ramos 6 days ago

I don't think ddd93ef3a2c7e2f78a8fe9b2648e9edaf034e872 is the cause of the crash in commbench, because until recently the nightly build failure was due to the two-hour time limit, before tests/converse/commbench was even reached. Build #1581 (Jun 2, 2018) is the first one to crash after 9 minutes.

#4 Updated by Evan Ramos 6 days ago

I ran git bisect on the issue and found that it began with c99900c04d249d18c3dbc6c8747dd490c40f23ee / https://charm.cs.illinois.edu/gerrit/3478

#5 Updated by Evan Ramos 6 days ago

My git bisect script was:

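#!/bin/bash
# For use with git bisect run: exit 0 = good, 1 = bad, 125 = skip this commit.

# Start from a pristine tree; skip the commit if cleanup fails.
git clean -xdf
if [[ $? -ne 0 ]]; then exit 125; fi

# Skip commits that fail to build.
./build AMPI mpi-linux-x86_64-smp --with-production --enable-error-checking -j4 -O0 -g3
if [[ $? -ne 0 ]]; then exit 125; fi

pushd mpi-linux-x86_64-smp/tests/converse/commbench
if [[ $? -ne 0 ]]; then exit 125; fi

# Build the benchmark; a compile failure is also a skip.
make -j4 OPTS="-O0 -g3"
if [[ $? -ne 0 ]]; then
    popd
    exit 125
fi

# Run the test; a failure here marks the commit bad.
make test OPTS="-O0 -g3" TESTOPTS="++ppn 2 +pemap 0-5"
if [[ $? -ne 0 ]]; then
    popd
    exit 1
fi

popd
exit 0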
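
The exit codes in the script above follow the convention that git bisect run documents: 0 marks the commit good, 125 asks git to skip it, any other code from 1 to 127 marks it bad, and anything outside that range aborts the bisection. A minimal sketch of that mapping, for illustration only; the function name bisect_verdict is hypothetical and not part of git:

```shell
# Hypothetical helper: report how `git bisect run` treats a script's exit code.
bisect_verdict() {
    code=$1
    if [ "$code" -eq 0 ]; then
        echo good
    elif [ "$code" -eq 125 ]; then
        echo skip
    elif [ "$code" -ge 1 ] && [ "$code" -le 127 ]; then
        echo bad
    else
        echo abort      # codes of 128 and above stop the bisection
    fi
}

bisect_verdict 0
bisect_verdict 125
bisect_verdict 1
```

Mapping every test failure to exit 1 keeps the verdict unambiguous: a script that exited with a status of 128 or above, such as the 139 a shell reports for a segfaulting process, would abort the bisection instead of marking the commit bad. Assuming the script is saved under a name such as bisect.sh (a hypothetical filename), it would be passed to git as `git bisect run ./bisect.sh` after the good and bad endpoints have been marked.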

#6 Updated by Evan Ramos 6 days ago

  • Assignee changed from Eric Mikida to Nitin Bhat

#7 Updated by Nitin Bhat 2 days ago

  • Status changed from New to Implemented

#8 Updated by Nitin Bhat 2 days ago

  • Status changed from Implemented to Merged
