Bug #1858

tests/ampi/privatization fails on mpi-linux-x86_64-smp autobuild

Added by Sam White 12 days ago. Updated 4 days ago.

Status: In Progress
Priority: Normal
Assignee:
Category: AMPI
Target version:
Start date: 04/11/2018
Due date:
% Done: 0%


Description

Autobuild runs this test on the 'courtesy' lab machine:

$ ./build LIBS mpi-linux-x86_64 smp -j16 --with-production --enable-error-checking -g

$ cd mpi-linux-x86_64-smp/tests/ampi/privatization

$ make test OPTS=-g TESTOPTS=+isomalloc_sync

../../../bin/testrun  ./tlsglobals +p1 +vp2  +isomalloc_sync

Running on 1 processors:  ./tlsglobals +vp2 +isomalloc_sync
charmrun>  /usr/bin/setarch x86_64 -R  mpirun -np 1  ./tlsglobals +vp2 +isomalloc_sync
Charm++> Running on MPI version: 2.1
Charm++> level of thread support used: MPI_THREAD_FUNNELED (desired: MPI_THREAD_FUNNELED)
Charm++> Running in SMP mode: numNodes 1,  1 worker threads per process
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: 3c99e0d
Charm++> Synchronizing isomalloc memory region...
Charm++> Consolidated Isomalloc memory region: 0x2ab700000000 - 0x6aa900000000 (67051520 MB).
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (8-way SMP).
Charm++> cpu topology info is gathered in 0.000 seconds.
CharmLB> RotateLB created.
------------- Processor 1 Exiting: Called CmiAbort ------------
Reason: TCHARM_Get_num_chunks should only be called on PE 0 during setup!
[1] Stack Traceback:
  [1:0] CmiAbortHelper+0x5a  [0x61c20a]
  [1:1]   [0x61c28d]
  [1:2] TCHARM_Get_num_chunks+0x1fe  [0x4cd44e]
  [1:3] _Z14ampiCreateMainPFviPPcEPKci+0x16  [0x4e1bb6]
  [1:4] _ZN18CkIndex_TCharmMain25_call_TCharmMain_CkArgMsgEPvS0_+0x34  [0x543734]
  [1:5] _Z10_initCharmiPPc+0x2869  [0x54b7a9]
  [1:6]   [0x61f236]
  [1:7]   [0x61f3c8]
  [1:8] +0x8184  [0x2aaaaaeda184]
  [1:9] clone+0x6d  [0x2aaaabfa903d]

History

#1 Updated by Sam White 11 days ago

Also hanging on multicore-linux-x86_64

#2 Updated by Evan Ramos 10 days ago

Can also happen with netlrts-linux-x86_64-smp:

$ ./tlsglobals +vp2 +p2
Charm++: standalone mode (not using charmrun)
Charm++> Running in SMP mode: numNodes 1,  2 worker threads per process
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: v6.8.2-545-gaa6ff5636
Charm++> scheduler running in netpoll mode.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (8-way SMP).
Charm++> cpu topology info is gathered in 0.000 seconds.
CharmLB> RotateLB created.
------------- Processor 1 Exiting: Called CmiAbort ------------
Reason: TCHARM_Get_num_chunks should only be called on PE 0 during setup!
[1] Stack Traceback:
  [1:0] CmiAbortHelper+0xbe  [0x7822f1]
  [1:1] CmiGetNonLocal+0  [0x78232a]
  [1:2] TCHARM_Get_num_chunks+0x3c  [0x5efb0b]
  [1:3] _Z14ampiCreateMainPFviPPcEPKci+0x19  [0x603cfc]
  [1:4] AMPI_Setup_Switch+0x37  [0x6035b9]
  [1:5] TCHARM_Call_fallback_setup+0x19  [0x5efac1]
  [1:6] TCHARM_User_setup+0x9  [0x8120fd]
  [1:7] tcharm_user_setup_+0x9  [0x8120ea]
  [1:8] _ZN10TCharmMainC2EP8CkArgMsg+0x3f  [0x670731]
  [1:9] _ZN18CkIndex_TCharmMain25_call_TCharmMain_CkArgMsgEPvS0_+0x40  [0x67045e]
  [1:10] _Z10_initCharmiPPc+0x1f59  [0x6797c0]
  [1:11]   [0x78205a]
  [1:12] ConverseInit+0x73e  [0x781d18]
  [1:13] main+0x3f  [0x670a6c]
  [1:14] __libc_start_main+0xf0  [0x7ffff6d6d830]
  [1:15] _start+0x29  [0x5ed0a9]
Charm++ fatal error:
TCHARM_Get_num_chunks should only be called on PE 0 during setup!
[1] Stack Traceback:
  [1:0]   [0x7830e5]
  [1:1] LrtsAbort+0x6f  [0x782a8a]
  [1:2] CmiAbort+0  [0x7822fd]
  [1:3] CmiGetNonLocal+0  [0x78232a]
  [1:4] TCHARM_Get_num_chunks+0x3c  [0x5efb0b]
  [1:5] _Z14ampiCreateMainPFviPPcEPKci+0x19  [0x603cfc]
  [1:6] AMPI_Setup_Switch+0x37  [0x6035b9]
  [1:7] TCHARM_Call_fallback_setup+0x19  [0x5efac1]
  [1:8] TCHARM_User_setup+0x9  [0x8120fd]
  [1:9] tcharm_user_setup_+0x9  [0x8120ea]
  [1:10] _ZN10TCharmMainC2EP8CkArgMsg+0x3f  [0x670731]
  [1:11] _ZN18CkIndex_TCharmMain25_call_TCharmMain_CkArgMsgEPvS0_+0x40  [0x67045e]
  [1:12] _Z10_initCharmiPPc+0x1f59  [0x6797c0]
  [1:13]   [0x78205a]
  [1:14] ConverseInit+0x73e  [0x781d18]
  [1:15] main+0x3f  [0x670a6c]
  [1:16] __libc_start_main+0xf0  [0x7ffff6d6d830]
  [1:17] _start+0x29  [0x5ed0a9]
Aborted (core dumped)

#3 Updated by Sam White 9 days ago

Notes:

The weird part is that printing the PE number inside ampiCreateMain shows the call is made from PE 0, but inside TCHARM_Get_num_chunks() it thinks it is on PE 1...

By removing the TCHARMAPI call from TCHARM_Get_num_chunks(), we get further into the startup process but then fail in TCHARM_Create_data(). Doing the same there, we get further still, but eventually fail again...

TCHARMAPI is defined at the bottom of src/libs/ck-libs/tcharm/tcharm_impl.h and is responsible for switching the TLS pointer when context switching between user-level threads. I think we may need to handle the case where TLS isn't initialized yet (it is only initialized after the first couple of calls into TCHARM_* routines). I'm not sure, but something is wrong with TCHARMAPI...

Note that all AMPI routines call the AMPI_API() macro, which does the same thing as TCHARMAPI, so the problem could be in AMPI too.

This is the furthest I was able to get:

Thread 3 "tlsglobals" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffebfff700 (LWP 7589)]
0x00000000006027fa in CthBaseResume (t=0x7fffe4159fb0) at libthreads-default-tls.c:761
761      CpvAccess(_numSwitches)++;
(gdb) bt
#0  0x00000000006027fa in CthBaseResume (t=0x7fffe4159fb0) at libthreads-default-tls.c:761
#1  0x000000000060339c in CthResume (t=0x7fffe4159fb0) at libthreads-default-tls.c:1782
#2  0x0000000000609e58 in CsdScheduleForever () at convcore.c:1901
#3  0x000000000060a0ed in CsdScheduler (maxmsgs=maxmsgs@entry=-1) at convcore.c:1837
#4  0x00000000006080aa in ConverseRunPE (everReturn=everReturn@entry=0) at machine-common-core.c:1531
#5  0x0000000000608115 in call_startfn (vindex=0x0) at machine-smp.c:414
#6  0x00007ffff79bf6ba in start_thread (arg=0x7fffebfff700) at pthread_create.c:333
#7  0x00007ffff696341d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

#4 Updated by Evan Ramos 6 days ago

  • Status changed from New to In Progress

#5 Updated by Sam White 5 days ago

That fixes the failures on multicore-linux-x86_64. I believe the mpi-linux-x86_64-smp failures were the same.

#6 Updated by Sam White 5 days ago

mpi-linux-x86_64-smp is still failing with the patch for './tlsglobals +vp2':

Charm++> -tlsglobals enabled for privatization of thread-local variables.

Thread 3 "tlsglobals" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffebfff700 (LWP 15259)]
0x000000000060c05a in CthBaseResume (t=0x7fffe4159840) at libthreads-default-tls.c:762
762      CpvAccess(_numSwitches)++;
(gdb) bt
#0  0x000000000060c05a in CthBaseResume (t=0x7fffe4159840) at libthreads-default-tls.c:762
#1  0x000000000060ccbc in CthResume (t=0x7fffe4159840) at libthreads-default-tls.c:1720
#2  0x0000000000613b38 in CsdScheduleForever () at convcore.c:1902
#3  0x0000000000613dcd in CsdScheduler (maxmsgs=maxmsgs@entry=-1) at convcore.c:1838
#4  0x0000000000611cea in ConverseRunPE (everReturn=everReturn@entry=0) at machine-common-core.c:1535
#5  0x0000000000611d55 in call_startfn (vindex=0x0) at machine-smp.c:414
#6  0x00007ffff79bf6ba in start_thread (arg=0x7fffebfff700) at pthread_create.c:333
#7  0x00007ffff696341d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

It fails the same way even with +vp1.

#7 Updated by Evan Ramos 5 days ago

Removing the unused _numSwitches variable still results in a crash:

Starting program: /home/evan/charm/mpi-linux-x86_64-smp/tests/ampi/privatization/tlsglobals 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff41b0700 (LWP 18712)]
[New Thread 0x7ffff39af700 (LWP 18713)]
Charm++> Running on MPI version: 3.1
Charm++> level of thread support used: MPI_THREAD_FUNNELED (desired: MPI_THREAD_FUNNELED)
Charm++> Running in SMP mode: numNodes 1,  1 worker threads per process
Charm++> The comm. thread both sends and receives messages
[New Thread 0x7fffe9098700 (LWP 18714)]
Converse/Charm++ Commit ID: v6.8.2-592-g45f0fd545
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (12-way SMP).
Charm++> cpu topology info is gathered in 0.000 seconds.
CharmLB> RotateLB created.
Charm++> -tlsglobals enabled for privatization of thread-local variables.

Thread 4 "tlsglobals" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe9098700 (LWP 18714)]
0x00005555558f729a in CthFixData (t=0x7fffdc166c70) at libthreads-default-tls.c:350
350      size_t newsize = CthCpvAccess(CthDatasize);
(gdb) p CthCpvAccess(CthDatasize)
$1 = 56

I compiled with -g3 so that macro information would be included with debug symbols.

#8 Updated by Sam White 5 days ago

Yeah, removing the code I pointed out on the patch doesn't fix this, but we could still remove it separately.

#9 Updated by Evan Ramos 5 days ago

My patch fixes the TCharm crash on netlrts-linux-x86_64-smp in addition to multicore, but that build also exhibits a new problem:

$ ./charmrun ++local +n2 ++ppn 1 ./tlsglobals +vp2
Charmrun> scalable start enabled. 
Charmrun> started all node programs in 0.033 seconds.
Charm++> Running in SMP mode: numNodes 2,  1 worker threads per process
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: v6.8.2-592-g45f0fd545
Warning> Randomization of virtual memory (ASLR) is turned on in the kernel, thread migration may not work! Run 'echo 0 > /proc/sys/kernel/randomize_va_space' as root to disable it, or try running with '+isomalloc_sync'.
Charm++> scheduler running in netpoll mode.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (12-way SMP).
Charm++> cpu topology info is gathered in 0.001 seconds.
CharmLB> RotateLB created.
Charm++> -tlsglobals enabled for privatization of thread-local variables.
------------- Processor 0 Exiting: Called CmiAbort ------------
Reason: AMPI> The application provided a custom AMPI_Setup() method, but AMPI is built with shared library support. This is an unsupported configuration. Please recompile charm++/AMPI without `-build-shared` or remove the AMPI_Setup() function from your application.

[0] Stack Traceback:
  [0:0] CmiAbortHelper+0xc4  [0x55be65eb19a7]
  [0:1] CmiGetNonLocal+0  [0x55be65eb19e2]
  [0:2] _ZN17MPI_threadstart_t5startEv+0x75  [0x55be65d73de3]
  [0:3] AMPI_threadstart+0x37  [0x55be65d31c04]
  [0:4] +0x1f7c41  [0x55be65d1bc41]
  [0:5] CthStartThread+0x58  [0x55be65ead943]
  [0:6] make_fcontext+0x2f  [0x55be65eadddf]
Fatal error on PE 0> AMPI> The application provided a custom AMPI_Setup() method, but AMPI is built with shared library support. This is an unsupported configuration. Please recompile charm++/AMPI without `-build-shared` or remove the AMPI_Setup() function from your application.

If +n is set to 1, ++ppn is increased above 1, or +vp is set to 1, this problem does not occur.

#10 Updated by Evan Ramos 5 days ago

The netlrts issue seems to be that AMPI_Setup_Switch is executed on a process other than the one containing PE 0.

EDIT: Upon closer inspection, more than one process thinks it hosts (or is) PE 0.

#11 Updated by Evan Ramos 5 days ago

_numSwitches removal: https://charm.cs.illinois.edu/gerrit/4025
netlrts-linux-x86_64-smp workaround: https://charm.cs.illinois.edu/gerrit/4026

#12 Updated by Evan Ramos 5 days ago

$ valgrind -- ./tlsglobals
==1917== Memcheck, a memory error detector
==1917== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1917== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==1917== Command: ./tlsglobals
==1917== 
==1917== Conditional jump or move depends on uninitialised value(s)
==1917==    at 0x66F2C36: opal_value_unload (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.20.10.1)
==1917==    by 0x580F3FA: ompi_proc_complete_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.20.10.1)
==1917==    by 0x5813598: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.20.10.1)
==1917==    by 0x5835B44: PMPI_Init_thread (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.20.10.1)
==1917==    by 0x4B35AC: LrtsInit (machine.c:1440)
==1917==    by 0x4B0FB9: ConverseInit (machine-common-core.c:1286)
==1917==    by 0x392A5C: main (main.C:9)
==1917== 
Charm++> Running on MPI version: 3.1
Charm++> level of thread support used: MPI_THREAD_FUNNELED (desired: MPI_THREAD_FUNNELED)
Charm++> Running in SMP mode: numNodes 1,  1 worker threads per process
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: v6.8.2-593-geeb7b8fca
Warning> Randomization of virtual memory (ASLR) is turned on in the kernel, thread migration may not work! Run 'echo 0 > /proc/sys/kernel/randomize_va_space' as root to disable it, or try running with '+isomalloc_sync'.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (12-way SMP).
Charm++> cpu topology info is gathered in 0.056 seconds.
CharmLB> RotateLB created.
Charm++> -tlsglobals enabled for privatization of thread-local variables.
==1917== Thread 4:
==1917== Invalid read of size 8
==1917==    at 0x4AB2A8: CthFixData (libthreads-default-tls.c:350)
==1917==    by 0x4AC398: CthBaseResume (libthreads-default-tls.c:758)
==1917==    by 0x4ACABC: CthResume (libthreads-default-tls.c:1715)
==1917==    by 0x4B6FC6: CthResumeNormalThread (convcore.c:2110)
==1917==    by 0x4B672D: CmiHandleMessage (convcore.c:1654)
==1917==    by 0x4B6B29: CsdScheduleForever (convcore.c:1902)
==1917==    by 0x4B6A44: CsdScheduler (convcore.c:1838)
==1917==    by 0x4B1812: ConverseRunPE (machine-common-core.c:1535)
==1917==    by 0x4ADDFD: call_startfn (machine-smp.c:414)
==1917==    by 0x50457FB: start_thread (pthread_create.c:465)
==1917==    by 0x616FB5E: clone (clone.S:95)
==1917==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==1917== 
==1917== Invalid read of size 4
==1917==    at 0x4C2B5C: CpdAborting (debug-conv.c:368)
==1917==    by 0x4B1A40: CmiAbortHelper (machine-common-core.c:1670)
==1917==    by 0x4B3458: KillOnAllSigs (machine.c:1360)
==1917==    by 0x505114F: ??? (in /lib/x86_64-linux-gnu/libpthread-2.26.so)
==1917==    by 0x4AB2A7: CthFixData (libthreads-default-tls.c:350)
==1917==    by 0x4AC398: CthBaseResume (libthreads-default-tls.c:758)
==1917==    by 0x4ACABC: CthResume (libthreads-default-tls.c:1715)
==1917==    by 0x4B6FC6: CthResumeNormalThread (convcore.c:2110)
==1917==    by 0x4B672D: CmiHandleMessage (convcore.c:1654)
==1917==    by 0x4B6B29: CsdScheduleForever (convcore.c:1902)
==1917==    by 0x4B6A44: CsdScheduler (convcore.c:1838)
==1917==    by 0x4B1812: ConverseRunPE (machine-common-core.c:1535)
==1917==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==1917== 
==1917== 
==1917== Process terminating with default action of signal 11 (SIGSEGV)
==1917==  Access not within mapped region at address 0x0
==1917==    at 0x4C2B5C: CpdAborting (debug-conv.c:368)
==1917==    by 0x4B1A40: CmiAbortHelper (machine-common-core.c:1670)
==1917==    by 0x4B3458: KillOnAllSigs (machine.c:1360)
==1917==    by 0x505114F: ??? (in /lib/x86_64-linux-gnu/libpthread-2.26.so)
==1917==    by 0x4AB2A7: CthFixData (libthreads-default-tls.c:350)
==1917==    by 0x4AC398: CthBaseResume (libthreads-default-tls.c:758)
==1917==    by 0x4ACABC: CthResume (libthreads-default-tls.c:1715)
==1917==    by 0x4B6FC6: CthResumeNormalThread (convcore.c:2110)
==1917==    by 0x4B672D: CmiHandleMessage (convcore.c:1654)
==1917==    by 0x4B6B29: CsdScheduleForever (convcore.c:1902)
==1917==    by 0x4B6A44: CsdScheduler (convcore.c:1838)
==1917==    by 0x4B1812: ConverseRunPE (machine-common-core.c:1535)
==1917==  If you believe this happened as a result of a stack
==1917==  overflow in your program's main thread (unlikely but
==1917==  possible), you can try to increase the size of the
==1917==  main thread stack using the --main-stacksize= flag.
==1917==  The main thread stack size used in this run was 8388608.
==1917== 
==1917== HEAP SUMMARY:
==1917==     in use at exit: 4,705,476 bytes in 14,366 blocks
==1917==   total heap usage: 27,194 allocs, 12,828 frees, 12,906,096 bytes allocated
==1917== 
==1917== LEAK SUMMARY:
==1917==    definitely lost: 8,585 bytes in 71 blocks
==1917==    indirectly lost: 391 bytes in 13 blocks
==1917==      possibly lost: 6,088 bytes in 98 blocks
==1917==    still reachable: 4,690,412 bytes in 14,184 blocks
==1917==                       of which reachable via heuristic:
==1917==                         newarray           : 880,040 bytes in 2 blocks
==1917==         suppressed: 0 bytes in 0 blocks
==1917== Rerun with --leak-check=full to see details of leaked memory
==1917== 
==1917== For counts of detected and suppressed errors, rerun with: -v
==1917== Use --track-origins=yes to see where uninitialised values come from
==1917== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)

#13 Updated by Sam White 5 days ago

Does running with "+isomalloc_sync" help, and/or with valgrind and "--leak-check=full --track-origins=yes"?

#14 Updated by Evan Ramos 5 days ago

Does running with "+isomalloc_sync" help

No difference

and/or with valgrind and "--leak-check=full --track-origins=yes"?

Nothing relevant to the issue

#15 Updated by Sam White 4 days ago

We should check whether -tlsglobals ever worked on mpi-linux-x86_64-smp at all.

#16 Updated by Sam White 4 days ago

If the MPI library itself uses TLS (some MPI libraries use threads internally), will our '-tlsglobals' runtime switching of the TLS pointer interfere with it? If so, though, I'm not sure why the behavior would differ from non-SMP.
