Bug #1858

tests/ampi/privatization fails on mpi-linux-x86_64-smp autobuild

Added by Sam White 6 months ago. Updated 6 months ago.

Status: Merged
Priority: Normal
Assignee:
Category: AMPI
Target version:
Start date: 04/11/2018
Due date:
% Done: 0%

Description

Autobuild runs this test on the 'courtesy' lab machine:

$ ./build LIBS mpi-linux-x86_64 smp -j16 --with-production --enable-error-checking -g

$ cd mpi-linux-x86_64-smp/tests/ampi/privatization

$ make test OPTS=-g TESTOPTS=+isomalloc_sync

../../../bin/testrun  ./tlsglobals +p1 +vp2  +isomalloc_sync

Running on 1 processors:  ./tlsglobals +vp2 +isomalloc_sync
charmrun>  /usr/bin/setarch x86_64 -R  mpirun -np 1  ./tlsglobals +vp2 +isomalloc_sync
Charm++> Running on MPI version: 2.1
Charm++> level of thread support used: MPI_THREAD_FUNNELED (desired: MPI_THREAD_FUNNELED)
Charm++> Running in SMP mode: numNodes 1,  1 worker threads per process
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: 3c99e0d
Charm++> Synchronizing isomalloc memory region...
Charm++> Consolidated Isomalloc memory region: 0x2ab700000000 - 0x6aa900000000 (67051520 MB).
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (8-way SMP).
Charm++> cpu topology info is gathered in 0.000 seconds.
CharmLB> RotateLB created.
------------- Processor 1 Exiting: Called CmiAbort ------------
Reason: TCHARM_Get_num_chunks should only be called on PE 0 during setup!
[1] Stack Traceback:
  [1:0] CmiAbortHelper+0x5a  [0x61c20a]
  [1:1]   [0x61c28d]
  [1:2] TCHARM_Get_num_chunks+0x1fe  [0x4cd44e]
  [1:3] _Z14ampiCreateMainPFviPPcEPKci+0x16  [0x4e1bb6]
  [1:4] _ZN18CkIndex_TCharmMain25_call_TCharmMain_CkArgMsgEPvS0_+0x34  [0x543734]
  [1:5] _Z10_initCharmiPPc+0x2869  [0x54b7a9]
  [1:6]   [0x61f236]
  [1:7]   [0x61f3c8]
  [1:8] +0x8184  [0x2aaaaaeda184]
  [1:9] clone+0x6d  [0x2aaaabfa903d]

mpi-linux-x86_64-smp-tsan-tlsglobals.log (60.5 KB) Evan Ramos, 04/24/2018 05:42 PM

mpitsan.c (181 Bytes) Evan Ramos, 04/24/2018 06:04 PM

mpitsan.log (4.99 KB) Evan Ramos, 04/24/2018 06:04 PM

History

#1 Updated by Sam White 6 months ago

Also hanging on multicore-linux-x86_64

#2 Updated by Evan Ramos 6 months ago

Can also happen with netlrts-linux-x86_64-smp:

$ ./tlsglobals +vp2 +p2
Charm++: standalone mode (not using charmrun)
Charm++> Running in SMP mode: numNodes 1,  2 worker threads per process
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: v6.8.2-545-gaa6ff5636
Charm++> scheduler running in netpoll mode.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (8-way SMP).
Charm++> cpu topology info is gathered in 0.000 seconds.
CharmLB> RotateLB created.
------------- Processor 1 Exiting: Called CmiAbort ------------
Reason: TCHARM_Get_num_chunks should only be called on PE 0 during setup!
[1] Stack Traceback:
  [1:0] CmiAbortHelper+0xbe  [0x7822f1]
  [1:1] CmiGetNonLocal+0  [0x78232a]
  [1:2] TCHARM_Get_num_chunks+0x3c  [0x5efb0b]
  [1:3] _Z14ampiCreateMainPFviPPcEPKci+0x19  [0x603cfc]
  [1:4] AMPI_Setup_Switch+0x37  [0x6035b9]
  [1:5] TCHARM_Call_fallback_setup+0x19  [0x5efac1]
  [1:6] TCHARM_User_setup+0x9  [0x8120fd]
  [1:7] tcharm_user_setup_+0x9  [0x8120ea]
  [1:8] _ZN10TCharmMainC2EP8CkArgMsg+0x3f  [0x670731]
  [1:9] _ZN18CkIndex_TCharmMain25_call_TCharmMain_CkArgMsgEPvS0_+0x40  [0x67045e]
  [1:10] _Z10_initCharmiPPc+0x1f59  [0x6797c0]
  [1:11]   [0x78205a]
  [1:12] ConverseInit+0x73e  [0x781d18]
  [1:13] main+0x3f  [0x670a6c]
  [1:14] __libc_start_main+0xf0  [0x7ffff6d6d830]
  [1:15] _start+0x29  [0x5ed0a9]
Charm++ fatal error:
TCHARM_Get_num_chunks should only be called on PE 0 during setup!
[1] Stack Traceback:
  [1:0]   [0x7830e5]
  [1:1] LrtsAbort+0x6f  [0x782a8a]
  [1:2] CmiAbort+0  [0x7822fd]
  [1:3] CmiGetNonLocal+0  [0x78232a]
  [1:4] TCHARM_Get_num_chunks+0x3c  [0x5efb0b]
  [1:5] _Z14ampiCreateMainPFviPPcEPKci+0x19  [0x603cfc]
  [1:6] AMPI_Setup_Switch+0x37  [0x6035b9]
  [1:7] TCHARM_Call_fallback_setup+0x19  [0x5efac1]
  [1:8] TCHARM_User_setup+0x9  [0x8120fd]
  [1:9] tcharm_user_setup_+0x9  [0x8120ea]
  [1:10] _ZN10TCharmMainC2EP8CkArgMsg+0x3f  [0x670731]
  [1:11] _ZN18CkIndex_TCharmMain25_call_TCharmMain_CkArgMsgEPvS0_+0x40  [0x67045e]
  [1:12] _Z10_initCharmiPPc+0x1f59  [0x6797c0]
  [1:13]   [0x78205a]
  [1:14] ConverseInit+0x73e  [0x781d18]
  [1:15] main+0x3f  [0x670a6c]
  [1:16] __libc_start_main+0xf0  [0x7ffff6d6d830]
  [1:17] _start+0x29  [0x5ed0a9]
Aborted (core dumped)

#3 Updated by Sam White 6 months ago

Notes:

The weird part is that printing the PE # inside ampiCreateMain shows the call is from PE 0, but inside TCHARM_Get_num_chunks() it thinks it's on PE 1...

By removing the TCHARMAPI call from TCHARM_Get_num_chunks(), we get further into the startup process but then fail in TCHARM_Create_data(). Doing the same there, we get further again but still fail later...

TCHARMAPI is defined at the bottom of src/libs/ck-libs/tcharm/tcharm_impl.h and is responsible for switching the TLS pointer when context switching between user-level threads. I think we may need to handle the case where TLS isn't initialized yet (it is only initialized after the first couple of calls into TCHARM_* routines). I'm not sure, but something is wrong with TCHARMAPI...

Note that all AMPI routines also call the AMPI_API() macro, which does the same thing as TCHARMAPI, so the problem could be in AMPI too.
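
(For illustration, a minimal self-contained sketch of the RAII pattern TCHARMAPI expands to, with the kind of "skip until TLS is initialized" guard suggested above. All names here are hypothetical stand-ins; the real sentry is TCharmAPIRoutine in tcharm_impl.h.)

#include <cstdio>

static bool tlsInitialized = false;  // stands in for the Cth/Ctg init state
static int *currentTLS = nullptr;    // stands in for the installed TLS segment
static int mainThreadTLS = 0;        // stands in for the main thread's segment

struct TCharmAPIRoutineSketch {      // what the TCHARMAPI macro would declare
  int *saved = nullptr;
  TCharmAPIRoutineSketch() {
    if (!tlsInitialized) return;     // the proposed guard: no-op before init
    saved = currentTLS;
    currentTLS = &mainThreadTLS;     // "install main-thread TLS"
  }
  ~TCharmAPIRoutineSketch() {
    if (!tlsInitialized) return;
    currentTLS = saved;              // "restore the user-level thread's TLS"
  }
};

static int TCHARM_Get_num_chunks_sketch(void) {
  TCharmAPIRoutineSketch sentry;     // ctor/dtor bracket the library call
  return 1;
}

int main(void) { std::printf("%d\n", TCHARM_Get_num_chunks_sketch()); }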

This is the furthest I was able to get:

Thread 3 "tlsglobals" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffebfff700 (LWP 7589)]
0x00000000006027fa in CthBaseResume (t=0x7fffe4159fb0) at libthreads-default-tls.c:761
761      CpvAccess(_numSwitches)++;
(gdb) bt
#0  0x00000000006027fa in CthBaseResume (t=0x7fffe4159fb0) at libthreads-default-tls.c:761
#1  0x000000000060339c in CthResume (t=0x7fffe4159fb0) at libthreads-default-tls.c:1782
#2  0x0000000000609e58 in CsdScheduleForever () at convcore.c:1901
#3  0x000000000060a0ed in CsdScheduler (maxmsgs=maxmsgs@entry=-1) at convcore.c:1837
#4  0x00000000006080aa in ConverseRunPE (everReturn=everReturn@entry=0) at machine-common-core.c:1531
#5  0x0000000000608115 in call_startfn (vindex=0x0) at machine-smp.c:414
#6  0x00007ffff79bf6ba in start_thread (arg=0x7fffebfff700) at pthread_create.c:333
#7  0x00007ffff696341d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

#4 Updated by Evan Ramos 6 months ago

  • Status changed from New to In Progress

#5 Updated by Sam White 6 months ago

That fixes the failures on multicore-linux-x86_64. I believe the mpi-linux-x86_64-smp failures were the same.

#6 Updated by Sam White 6 months ago

With the patch, mpi-linux-x86_64-smp is still failing on './tlsglobals +vp2':

Charm++> -tlsglobals enabled for privatization of thread-local variables.

Thread 3 "tlsglobals" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffebfff700 (LWP 15259)]
0x000000000060c05a in CthBaseResume (t=0x7fffe4159840) at libthreads-default-tls.c:762
762      CpvAccess(_numSwitches)++;
(gdb) bt
#0  0x000000000060c05a in CthBaseResume (t=0x7fffe4159840) at libthreads-default-tls.c:762
#1  0x000000000060ccbc in CthResume (t=0x7fffe4159840) at libthreads-default-tls.c:1720
#2  0x0000000000613b38 in CsdScheduleForever () at convcore.c:1902
#3  0x0000000000613dcd in CsdScheduler (maxmsgs=maxmsgs@entry=-1) at convcore.c:1838
#4  0x0000000000611cea in ConverseRunPE (everReturn=everReturn@entry=0) at machine-common-core.c:1535
#5  0x0000000000611d55 in call_startfn (vindex=0x0) at machine-smp.c:414
#6  0x00007ffff79bf6ba in start_thread (arg=0x7fffebfff700) at pthread_create.c:333
#7  0x00007ffff696341d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

It fails the same way even with +vp1.

#7 Updated by Evan Ramos 6 months ago

Removing the unused _numSwitches variable still results in a crash:

Starting program: /home/evan/charm/mpi-linux-x86_64-smp/tests/ampi/privatization/tlsglobals 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff41b0700 (LWP 18712)]
[New Thread 0x7ffff39af700 (LWP 18713)]
Charm++> Running on MPI version: 3.1
Charm++> level of thread support used: MPI_THREAD_FUNNELED (desired: MPI_THREAD_FUNNELED)
Charm++> Running in SMP mode: numNodes 1,  1 worker threads per process
Charm++> The comm. thread both sends and receives messages
[New Thread 0x7fffe9098700 (LWP 18714)]
Converse/Charm++ Commit ID: v6.8.2-592-g45f0fd545
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (12-way SMP).
Charm++> cpu topology info is gathered in 0.000 seconds.
CharmLB> RotateLB created.
Charm++> -tlsglobals enabled for privatization of thread-local variables.

Thread 4 "tlsglobals" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe9098700 (LWP 18714)]
0x00005555558f729a in CthFixData (t=0x7fffdc166c70) at libthreads-default-tls.c:350
350      size_t newsize = CthCpvAccess(CthDatasize);
(gdb) p CthCpvAccess(CthDatasize)
$1 = 56

I compiled with -g3 so that macro information would be included in the debug symbols.

#8 Updated by Sam White 6 months ago

Yeah, removing the stuff I pointed out on the patch doesn't fix this, but we could still remove it separately.

#9 Updated by Evan Ramos 6 months ago

My patch fixes the TCharm crash on netlrts-linux-x86_64-smp in addition to multicore, but that build also exhibits a new problem:

$ ./charmrun ++local +n2 ++ppn 1 ./tlsglobals +vp2
Charmrun> scalable start enabled. 
Charmrun> started all node programs in 0.033 seconds.
Charm++> Running in SMP mode: numNodes 2,  1 worker threads per process
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: v6.8.2-592-g45f0fd545
Warning> Randomization of virtual memory (ASLR) is turned on in the kernel, thread migration may not work! Run 'echo 0 > /proc/sys/kernel/randomize_va_space' as root to disable it, or try running with '+isomalloc_sync'.
Charm++> scheduler running in netpoll mode.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (12-way SMP).
Charm++> cpu topology info is gathered in 0.001 seconds.
CharmLB> RotateLB created.
Charm++> -tlsglobals enabled for privatization of thread-local variables.
------------- Processor 0 Exiting: Called CmiAbort ------------
Reason: AMPI> The application provided a custom AMPI_Setup() method, but AMPI is built with shared library support. This is an unsupported configuration. Please recompile charm++/AMPI without `-build-shared` or remove the AMPI_Setup() function from your application.

[0] Stack Traceback:
  [0:0] CmiAbortHelper+0xc4  [0x55be65eb19a7]
  [0:1] CmiGetNonLocal+0  [0x55be65eb19e2]
  [0:2] _ZN17MPI_threadstart_t5startEv+0x75  [0x55be65d73de3]
  [0:3] AMPI_threadstart+0x37  [0x55be65d31c04]
  [0:4] +0x1f7c41  [0x55be65d1bc41]
  [0:5] CthStartThread+0x58  [0x55be65ead943]
  [0:6] make_fcontext+0x2f  [0x55be65eadddf]
Fatal error on PE 0> AMPI> The application provided a custom AMPI_Setup() method, but AMPI is built with shared library support. This is an unsupported configuration. Please recompile charm++/AMPI without `-build-shared` or remove the AMPI_Setup() function from your application.

If +n is set to 1, ++ppn is increased above 1, or +vp is set to 1, this problem does not occur.

#10 Updated by Evan Ramos 6 months ago

The netlrts issue seems to be that AMPI_Setup_Switch is executed on a process other than the one containing PE 0.

EDIT: Upon closer inspection, more than one process thinks it has/is PE 0.

#11 Updated by Evan Ramos 6 months ago

_numSwitches removal: https://charm.cs.illinois.edu/gerrit/4025
netlrts-linux-x86_64-smp workaround: https://charm.cs.illinois.edu/gerrit/4026

#12 Updated by Evan Ramos 6 months ago

$ valgrind -- ./tlsglobals
==1917== Memcheck, a memory error detector
==1917== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1917== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==1917== Command: ./tlsglobals
==1917== 
==1917== Conditional jump or move depends on uninitialised value(s)
==1917==    at 0x66F2C36: opal_value_unload (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.20.10.1)
==1917==    by 0x580F3FA: ompi_proc_complete_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.20.10.1)
==1917==    by 0x5813598: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.20.10.1)
==1917==    by 0x5835B44: PMPI_Init_thread (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.20.10.1)
==1917==    by 0x4B35AC: LrtsInit (machine.c:1440)
==1917==    by 0x4B0FB9: ConverseInit (machine-common-core.c:1286)
==1917==    by 0x392A5C: main (main.C:9)
==1917== 
Charm++> Running on MPI version: 3.1
Charm++> level of thread support used: MPI_THREAD_FUNNELED (desired: MPI_THREAD_FUNNELED)
Charm++> Running in SMP mode: numNodes 1,  1 worker threads per process
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: v6.8.2-593-geeb7b8fca
Warning> Randomization of virtual memory (ASLR) is turned on in the kernel, thread migration may not work! Run 'echo 0 > /proc/sys/kernel/randomize_va_space' as root to disable it, or try running with '+isomalloc_sync'.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (12-way SMP).
Charm++> cpu topology info is gathered in 0.056 seconds.
CharmLB> RotateLB created.
Charm++> -tlsglobals enabled for privatization of thread-local variables.
==1917== Thread 4:
==1917== Invalid read of size 8
==1917==    at 0x4AB2A8: CthFixData (libthreads-default-tls.c:350)
==1917==    by 0x4AC398: CthBaseResume (libthreads-default-tls.c:758)
==1917==    by 0x4ACABC: CthResume (libthreads-default-tls.c:1715)
==1917==    by 0x4B6FC6: CthResumeNormalThread (convcore.c:2110)
==1917==    by 0x4B672D: CmiHandleMessage (convcore.c:1654)
==1917==    by 0x4B6B29: CsdScheduleForever (convcore.c:1902)
==1917==    by 0x4B6A44: CsdScheduler (convcore.c:1838)
==1917==    by 0x4B1812: ConverseRunPE (machine-common-core.c:1535)
==1917==    by 0x4ADDFD: call_startfn (machine-smp.c:414)
==1917==    by 0x50457FB: start_thread (pthread_create.c:465)
==1917==    by 0x616FB5E: clone (clone.S:95)
==1917==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==1917== 
==1917== Invalid read of size 4
==1917==    at 0x4C2B5C: CpdAborting (debug-conv.c:368)
==1917==    by 0x4B1A40: CmiAbortHelper (machine-common-core.c:1670)
==1917==    by 0x4B3458: KillOnAllSigs (machine.c:1360)
==1917==    by 0x505114F: ??? (in /lib/x86_64-linux-gnu/libpthread-2.26.so)
==1917==    by 0x4AB2A7: CthFixData (libthreads-default-tls.c:350)
==1917==    by 0x4AC398: CthBaseResume (libthreads-default-tls.c:758)
==1917==    by 0x4ACABC: CthResume (libthreads-default-tls.c:1715)
==1917==    by 0x4B6FC6: CthResumeNormalThread (convcore.c:2110)
==1917==    by 0x4B672D: CmiHandleMessage (convcore.c:1654)
==1917==    by 0x4B6B29: CsdScheduleForever (convcore.c:1902)
==1917==    by 0x4B6A44: CsdScheduler (convcore.c:1838)
==1917==    by 0x4B1812: ConverseRunPE (machine-common-core.c:1535)
==1917==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==1917== 
==1917== 
==1917== Process terminating with default action of signal 11 (SIGSEGV)
==1917==  Access not within mapped region at address 0x0
==1917==    at 0x4C2B5C: CpdAborting (debug-conv.c:368)
==1917==    by 0x4B1A40: CmiAbortHelper (machine-common-core.c:1670)
==1917==    by 0x4B3458: KillOnAllSigs (machine.c:1360)
==1917==    by 0x505114F: ??? (in /lib/x86_64-linux-gnu/libpthread-2.26.so)
==1917==    by 0x4AB2A7: CthFixData (libthreads-default-tls.c:350)
==1917==    by 0x4AC398: CthBaseResume (libthreads-default-tls.c:758)
==1917==    by 0x4ACABC: CthResume (libthreads-default-tls.c:1715)
==1917==    by 0x4B6FC6: CthResumeNormalThread (convcore.c:2110)
==1917==    by 0x4B672D: CmiHandleMessage (convcore.c:1654)
==1917==    by 0x4B6B29: CsdScheduleForever (convcore.c:1902)
==1917==    by 0x4B6A44: CsdScheduler (convcore.c:1838)
==1917==    by 0x4B1812: ConverseRunPE (machine-common-core.c:1535)
==1917==  If you believe this happened as a result of a stack
==1917==  overflow in your program's main thread (unlikely but
==1917==  possible), you can try to increase the size of the
==1917==  main thread stack using the --main-stacksize= flag.
==1917==  The main thread stack size used in this run was 8388608.
==1917== 
==1917== HEAP SUMMARY:
==1917==     in use at exit: 4,705,476 bytes in 14,366 blocks
==1917==   total heap usage: 27,194 allocs, 12,828 frees, 12,906,096 bytes allocated
==1917== 
==1917== LEAK SUMMARY:
==1917==    definitely lost: 8,585 bytes in 71 blocks
==1917==    indirectly lost: 391 bytes in 13 blocks
==1917==      possibly lost: 6,088 bytes in 98 blocks
==1917==    still reachable: 4,690,412 bytes in 14,184 blocks
==1917==                       of which reachable via heuristic:
==1917==                         newarray           : 880,040 bytes in 2 blocks
==1917==         suppressed: 0 bytes in 0 blocks
==1917== Rerun with --leak-check=full to see details of leaked memory
==1917== 
==1917== For counts of detected and suppressed errors, rerun with: -v
==1917== Use --track-origins=yes to see where uninitialised values come from
==1917== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)

#13 Updated by Sam White 6 months ago

Does running with "+isomalloc_sync" help, and/or with valgrind and "--leak-check=full --track-origins=yes"?

#14 Updated by Evan Ramos 6 months ago

Does running with "+isomalloc_sync" help

No difference

and/or with valgrind and "--leak-check=full --track-origins=yes"?

Nothing relevant to the issue

#15 Updated by Sam White 6 months ago

We should check whether -tlsglobals ever worked on mpi-linux-x86_64-smp at all.

#16 Updated by Sam White 6 months ago

If the MPI library itself uses TLS (some MPI libraries use threads internally), will our '-tlsglobals' runtime switching of the TLS pointer interfere with it? If so, though, I'm not sure why it would be any different for non-SMP.

#17 Updated by Sam White 6 months ago

Evan cleaned up some of the valgrind output here: https://charm.cs.illinois.edu/gerrit/#/c/charm/+/4046/

#18 Updated by Evan Ramos 6 months ago

I ran the tlsglobals test with TSan. There is a lot of output that seems to have the same few root causes, but I'm not sure where to go with it.

#19 Updated by Sam White 6 months ago

Hmm, yeah, that's pretty opaque. Most of it seems to be from inside "mca_", which is a low-level component of OpenMPI. Could you run tsan on an MPI (not AMPI) program that just calls MPI_Init() and MPI_Finalize() to compare?
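
(For reference, such a reproducer is only a few lines; a sketch, presumably close to the attached mpitsan.c:)

#include <mpi.h>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);
  MPI_Finalize();
  return 0;
}

Built with something like `mpicc -fsanitize=thread -g mpitsan.c -o mpitsan`.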

#20 Updated by Evan Ramos 6 months ago

This log seems to be the beginning portion of the tlsglobals log, before any Converse code is run.

#21 Updated by Sam White 6 months ago

Here are the AMPI-related things in the mpi-linux-x86_64-smp tsan log you posted above:

==================
WARNING: ThreadSanitizer: data race (pid=19329)
  Read of size 4 at 0x55d6c100702c by thread T3:
    #0 ampiProcInit /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/libs/ck-libs/ampi/ampi.C:957 (tlsglobals+0x000000228db4)
    #1 InitCallTable::enumerateInitCalls() /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/init.C:1037 (tlsglobals+0x0000002d8618)
    #2 _initCharm(int, char**) /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/init.C:1317 (tlsglobals+0x0000002dbae7)
    #3 ConverseRunPE /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-common-core.c:1534 (tlsglobals+0x00000048e058)
    #4 call_startfn /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-smp.c:414 (tlsglobals+0x000000487b41)
    #5 <null> <null> (libtsan.so.0+0x000000025aab)
  Previous write of size 4 at 0x55d6c100702c by main thread:
    #0 ampiProcInit /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/libs/ck-libs/ampi/ampi.C:957 (tlsglobals+0x000000228ddd)
    #1 InitCallTable::enumerateInitCalls() /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/init.C:1037 (tlsglobals+0x0000002d8618)
    #2 _initCharm(int, char**) /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/init.C:1317 (tlsglobals+0x0000002dbae7)
    #3 ConverseRunPE /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-common-core.c:1528 (tlsglobals+0x00000048dffe)
    #4 ConverseInit /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-common-core.c:1430 (tlsglobals+0x00000048d932)
    #5 main /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/main.C:9 (tlsglobals+0x0000002cb286)

  Location is global 'Csv_CtvOffsampiPtr_' of size 4 at 0x55d6c100702c (tlsglobals+0x0000008a702c)

  Thread T3 (tid=19339, running) created by main thread at:
    #0 pthread_create <null> (libtsan.so.0+0x0000000290c3)
    #1 CmiStartThreads /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-smp.c:506 (tlsglobals+0x000000487e4f)
    #2 ConverseInit /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-common-core.c:1428 (tlsglobals+0x00000048d928)
    #3 main /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/main.C:9 (tlsglobals+0x0000002cb286)
SUMMARY: ThreadSanitizer: data race /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/libs/ck-libs/ampi/ampi.C:957 in ampiProcInit
==================
==================
WARNING: ThreadSanitizer: data race (pid=19329)
  Read of size 4 at 0x55d6c1007030 by thread T3:
    #0 ampiProcInit /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/libs/ck-libs/ampi/ampi.C:958 (tlsglobals+0x000000228e1d)
    #1 InitCallTable::enumerateInitCalls() /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/init.C:1037 (tlsglobals+0x0000002d8618)
    #2 _initCharm(int, char**) /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/init.C:1317 (tlsglobals+0x0000002dbae7)
    #3 ConverseRunPE /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-common-core.c:1534 (tlsglobals+0x00000048e058)
    #4 call_startfn /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-smp.c:414 (tlsglobals+0x000000487b41)
    #5 <null> <null> (libtsan.so.0+0x000000025aab)
  Previous write of size 4 at 0x55d6c1007030 by main thread:
    #0 ampiProcInit /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/libs/ck-libs/ampi/ampi.C:958 (tlsglobals+0x000000228e46)
    #1 InitCallTable::enumerateInitCalls() /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/init.C:1037 (tlsglobals+0x0000002d8618)
    #2 _initCharm(int, char**) /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/init.C:1317 (tlsglobals+0x0000002dbae7)
    #3 ConverseRunPE /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-common-core.c:1528 (tlsglobals+0x00000048dffe)
    #4 ConverseInit /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-common-core.c:1430 (tlsglobals+0x00000048d932)
    #5 main /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/main.C:9 (tlsglobals+0x0000002cb286)

  Location is global 'Csv_CtvOffsampiInitDone_' of size 4 at 0x55d6c1007030 (tlsglobals+0x0000008a7030)

  Thread T3 (tid=19339, running) created by main thread at:
    #0 pthread_create <null> (libtsan.so.0+0x0000000290c3)
    #1 CmiStartThreads /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-smp.c:506 (tlsglobals+0x000000487e4f)
    #2 ConverseInit /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-common-core.c:1428 (tlsglobals+0x00000048d928)
    #3 main /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/main.C:9 (tlsglobals+0x0000002cb286)
SUMMARY: ThreadSanitizer: data race /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/libs/ck-libs/ampi/ampi.C:958 in ampiProcInit
==================
==================
WARNING: ThreadSanitizer: data race (pid=19329)
  Read of size 4 at 0x55d6c1007038 by thread T3:
    #0 ampiProcInit /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/libs/ck-libs/ampi/ampi.C:959 (tlsglobals+0x000000228e86)
    #1 InitCallTable::enumerateInitCalls() /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/init.C:1037 (tlsglobals+0x0000002d8618)
    #2 _initCharm(int, char**) /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/init.C:1317 (tlsglobals+0x0000002dbae7)
    #3 ConverseRunPE /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-common-core.c:1534 (tlsglobals+0x00000048e058)
    #4 call_startfn /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-smp.c:414 (tlsglobals+0x000000487b41)
    #5 <null> <null> (libtsan.so.0+0x000000025aab)
  Previous write of size 4 at 0x55d6c1007038 by main thread:
    #0 ampiProcInit /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/libs/ck-libs/ampi/ampi.C:959 (tlsglobals+0x000000228eaf)
    #1 InitCallTable::enumerateInitCalls() /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/init.C:1037 (tlsglobals+0x0000002d8618)
    #2 _initCharm(int, char**) /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/init.C:1317 (tlsglobals+0x0000002dbae7)
    #3 ConverseRunPE /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-common-core.c:1528 (tlsglobals+0x00000048dffe)
    #4 ConverseInit /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-common-core.c:1430 (tlsglobals+0x00000048d932)
    #5 main /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/main.C:9 (tlsglobals+0x0000002cb286)

  Location is global 'Csv_CtvOffsampiFinalized_' of size 4 at 0x55d6c1007038 (tlsglobals+0x0000008a7038)

  Thread T3 (tid=19339, running) created by main thread at:
    #0 pthread_create <null> (libtsan.so.0+0x0000000290c3)
    #1 CmiStartThreads /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-smp.c:506 (tlsglobals+0x000000487e4f)
    #2 ConverseInit /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-common-core.c:1428 (tlsglobals+0x00000048d928)
    #3 main /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/main.C:9 (tlsglobals+0x0000002cb286)
SUMMARY: ThreadSanitizer: data race /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/libs/ck-libs/ampi/ampi.C:959 in ampiProcInit
==================
==================
WARNING: ThreadSanitizer: data race (pid=19329)
  Read of size 4 at 0x55d6c1007034 by thread T3:
    #0 ampiProcInit /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/libs/ck-libs/ampi/ampi.C:960 (tlsglobals+0x000000228eef)
    #1 InitCallTable::enumerateInitCalls() /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/init.C:1037 (tlsglobals+0x0000002d8618)
    #2 _initCharm(int, char**) /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/init.C:1317 (tlsglobals+0x0000002dbae7)
    #3 ConverseRunPE /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-common-core.c:1534 (tlsglobals+0x00000048e058)
    #4 call_startfn /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-smp.c:414 (tlsglobals+0x000000487b41)
    #5 <null> <null> (libtsan.so.0+0x000000025aab)
  Previous write of size 4 at 0x55d6c1007034 by main thread:
    #0 ampiProcInit /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/libs/ck-libs/ampi/ampi.C:960 (tlsglobals+0x000000228f18)
    #1 InitCallTable::enumerateInitCalls() /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/init.C:1037 (tlsglobals+0x0000002d8618)
    #2 _initCharm(int, char**) /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/init.C:1317 (tlsglobals+0x0000002dbae7)
    #3 ConverseRunPE /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-common-core.c:1528 (tlsglobals+0x00000048dffe)
    #4 ConverseInit /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-common-core.c:1430 (tlsglobals+0x00000048d932)
    #5 main /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/main.C:9 (tlsglobals+0x0000002cb286)

  Location is global 'Csv_CtvOffsstackBottom_' of size 4 at 0x55d6c1007034 (tlsglobals+0x0000008a7034)

  Thread T3 (tid=19339, running) created by main thread at:
    #0 pthread_create <null> (libtsan.so.0+0x0000000290c3)
    #1 CmiStartThreads /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-smp.c:506 (tlsglobals+0x000000487e4f)
    #2 ConverseInit /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-common-core.c:1428 (tlsglobals+0x00000048d928)
    #3 main /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/main.C:9 (tlsglobals+0x0000002cb286)
SUMMARY: ThreadSanitizer: data race /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/libs/ck-libs/ampi/ampi.C:960 in ampiProcInit

Those 4 variables are Ctv variables that are currently initialized in AMPI's [initproc] routine in ampi.C:

 static void ampiProcInit(void){
   CtvInitialize(ampiParent*, ampiPtr);
   CtvInitialize(bool, ampiInitDone);
   CtvInitialize(bool, ampiFinalized);
   CtvInitialize(void*, stackBottom);
   ...

I'm not sure where else to initialize them though...
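
(For illustration, the textbook shape of a fix for this kind of one-time-init race is to funnel the registration through a single synchronized path. A self-contained sketch using std::call_once, with registerCtvs() as a hypothetical stand-in for the CtvInitialize calls; Charm++ would more likely use a CmiNodeLock or a node-level init hook.)

#include <cstdio>
#include <mutex>

static std::once_flag procInitOnce;

// Stand-in for the CtvInitialize(...) registrations quoted above.
static void registerCtvs(void) { std::puts("Ctv offsets registered once"); }

static void ampiProcInitSketch(void) {
  // Every PE thread may call this concurrently; only one runs the body,
  // and the others synchronize with it.
  std::call_once(procInitOnce, registerCtvs);
}

int main(void) {
  ampiProcInitSketch();
  ampiProcInitSketch();  // second call is a synchronized no-op
}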

#22 Updated by Sam White 6 months ago

And here's the rest of the Charm++/Converse-related tsan output. I think we should really fix all of these, but the 'CmiThreadIs_flag' one is the most relevant to -tlsglobals:

WARNING: ThreadSanitizer: data race (pid=19329)
  Read of size 4 at 0x55d6c101c624 by main thread:
    #0 CthBaseInit /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/libthreads-default-tls.c:569 (tlsglobals+0x000000484420)
    #1 CthInit /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/libthreads-default-tls.c:1645 (tlsglobals+0x000000485643)
    #2 ConverseRunPE /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-common-core.c:1487 (tlsglobals+0x00000048dd75)
    #3 ConverseInit /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-common-core.c:1430 (tlsglobals+0x00000048d932)
    #4 main /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/main.C:9 (tlsglobals+0x0000002cb286)
  Previous write of size 4 at 0x55d6c101c624 by thread T3:
    #0 CthBaseInit /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/libthreads-default-tls.c:569 (tlsglobals+0x000000484437)
    #1 CthInit /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/libthreads-default-tls.c:1645 (tlsglobals+0x000000485643)
    #2 ConverseRunPE /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-common-core.c:1487 (tlsglobals+0x00000048dd75)
    #3 call_startfn /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-smp.c:414 (tlsglobals+0x000000487b41)
    #4 <null> <null> (libtsan.so.0+0x000000025aab)

  Location is global 'CmiThreadIs_flag' of size 4 at 0x55d6c101c624 (tlsglobals+0x0000008bc624)

  Thread T3 (tid=19339, running) created by main thread at:
    #0 pthread_create <null> (libtsan.so.0+0x0000000290c3)
    #1 CmiStartThreads /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-smp.c:506 (tlsglobals+0x000000487e4f)
    #2 ConverseInit /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-common-core.c:1428 (tlsglobals+0x00000048d928)
    #3 main /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/main.C:9 (tlsglobals+0x0000002cb286)
SUMMARY: ThreadSanitizer: data race /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/libthreads-default-tls.c:569 in CthBaseInit
==================
==================
WARNING: ThreadSanitizer: data race (pid=19329)
  Write of size 4 at 0x55d6c101d4d0 by main thread (mutexes: write M327):
    #0 CmiArgInit /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/convcore.c:359 (tlsglobals+0x000000492da9)
    #1 ConverseCommonInit /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/convcore.c:3781 (tlsglobals+0x00000049fa68)
    #2 ConverseRunPE /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-common-core.c:1497 (tlsglobals+0x00000048df50)
    #3 ConverseInit /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-common-core.c:1430 (tlsglobals+0x00000048d932)
    #4 main /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/main.C:9 (tlsglobals+0x0000002cb286)
  Previous read of size 4 at 0x55d6c101d4d0 by thread T3:
    #0 CmiAddCLA /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/convcore.c:298 (tlsglobals+0x000000492880)
    #1 CmiGetArgFlagDesc /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/convcore.c:549 (tlsglobals+0x000000493932)
    #2 CmiTimerInit /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/convcore.c:1013 (tlsglobals+0x000000495d41)
    #3 ConverseCommonInit /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/convcore.c:3796 (tlsglobals+0x00000049fc76)
    #4 ConverseRunPE /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-common-core.c:1497 (tlsglobals+0x00000048df50)
    #5 call_startfn /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-smp.c:414 (tlsglobals+0x000000487b41)
    #6 <null> <null> (libtsan.so.0+0x000000025aab)

  Location is global 'usageChecked' of size 4 at 0x55d6c101d4d0 (tlsglobals+0x0000008bd4d0)

  Mutex M327 (0x7b0c0000b850) created at:
    #0 pthread_mutex_init <null> (libtsan.so.0+0x0000000299de)
    #1 LrtsCreateLock /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-common-core.c:1910 (tlsglobals+0x00000048ea73)
    #2 CmiStartThreads /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-smp.c:455 (tlsglobals+0x000000487c15)
    #3 ConverseInit /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-common-core.c:1428 (tlsglobals+0x00000048d928)
    #4 main /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/main.C:9 (tlsglobals+0x0000002cb286)
  Thread T3 (tid=19339, running) created by main thread at:
    #0 pthread_create <null> (libtsan.so.0+0x0000000290c3)
    #1 CmiStartThreads /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-smp.c:506 (tlsglobals+0x000000487e4f)
    #2 ConverseInit /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-common-core.c:1428 (tlsglobals+0x00000048d928)
    #3 main /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/main.C:9 (tlsglobals+0x0000002cb286)
SUMMARY: ThreadSanitizer: data race /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/convcore.c:359 in CmiArgInit
==================
==================
WARNING: ThreadSanitizer: data race (pid=19329)
  Read of size 8 at 0x55d6c1008c48 by thread T3:
    #0 CmiWallTimer /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/convcore.c:1068 (tlsglobals+0x000000496111)
    #1 CcdModuleInit /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/conv-conds.c:430 (tlsglobals+0x0000004a2b3e)
    #2 ConverseCommonInit /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/convcore.c:3800 (tlsglobals+0x00000049fc93)
    #3 ConverseRunPE /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-common-core.c:1497 (tlsglobals+0x00000048df50)
    #4 call_startfn /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-smp.c:414 (tlsglobals+0x000000487b41)
    #5 <null> <null> (libtsan.so.0+0x000000025aab)
  Previous write of size 8 at 0x55d6c1008c48 by main thread:
    #0 CmiWallTimer /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/convcore.c:1071 (tlsglobals+0x000000496163)
    #1 initTraceCore /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/traceCoreCommon.C:44 (tlsglobals+0x0000004b882a)
    #2 traceInit /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/trace-common.C:499 (tlsglobals+0x0000002cebd2)
    #3 ConverseCommonInit /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/convcore.c:3824 (tlsglobals+0x00000049fde5)
    #4 ConverseRunPE /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-common-core.c:1497 (tlsglobals+0x00000048df50)
    #5 ConverseInit /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-common-core.c:1430 (tlsglobals+0x00000048d932)
    #6 main /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/main.C:9 (tlsglobals+0x0000002cb286)

  Location is global 'lastT' of size 8 at 0x55d6c1008c48 (tlsglobals+0x0000008a8c48)

  Thread T3 (tid=19339, running) created by main thread at:
    #0 pthread_create <null> (libtsan.so.0+0x0000000290c3)
    #1 CmiStartThreads /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-smp.c:506 (tlsglobals+0x000000487e4f)
    #2 ConverseInit /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-common-core.c:1428 (tlsglobals+0x00000048d928)
    #3 main /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/main.C:9 (tlsglobals+0x0000002cb286)
SUMMARY: ThreadSanitizer: data race /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/convcore.c:1068 in CmiWallTimer
==================
==================
WARNING: ThreadSanitizer: data race (pid=19329)
  Write of size 4 at 0x55d6c101e18c by thread T3:
    #0 ConverseCommonInit /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/convcore.c:3842 (tlsglobals+0x00000049fe22)
    #1 ConverseRunPE /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-common-core.c:1497 (tlsglobals+0x00000048df50)
    #2 call_startfn /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-smp.c:414 (tlsglobals+0x000000487b41)
    #3 <null> <null> (libtsan.so.0+0x000000025aab)
  Previous write of size 4 at 0x55d6c101e18c by main thread:
    #0 ConverseCommonInit /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/convcore.c:3842 (tlsglobals+0x00000049fe22)
    #1 ConverseRunPE /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-common-core.c:1497 (tlsglobals+0x00000048df50)
    #2 ConverseInit /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-common-core.c:1430 (tlsglobals+0x00000048d932)
    #3 main /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/main.C:9 (tlsglobals+0x0000002cb286)

  Location is global 'ccsRunning' of size 4 at 0x55d6c101e18c (tlsglobals+0x0000008be18c)

  Thread T3 (tid=19339, running) created by main thread at:
    #0 pthread_create <null> (libtsan.so.0+0x0000000290c3)
    #1 CmiStartThreads /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-smp.c:506 (tlsglobals+0x000000487e4f)
    #2 ConverseInit /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-common-core.c:1428 (tlsglobals+0x00000048d928)
    #3 main /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/main.C:9 (tlsglobals+0x0000002cb286)
SUMMARY: ThreadSanitizer: data race /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/convcore.c:3842 in ConverseCommonInit
==================
==================
WARNING: ThreadSanitizer: data race (pid=19329)
  Write of size 4 at 0x55d6c101b480 by main thread:
    #0 _loadbalancerInit() /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/LBDatabase.C:186 (tlsglobals+0x0000003e9209)
    #1 _initCharm(int, char**) /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/init.C:1192 (tlsglobals+0x0000002db6f4)
    #2 ConverseRunPE /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-common-core.c:1528 (tlsglobals+0x00000048dffe)
    #3 ConverseInit /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-common-core.c:1430 (tlsglobals+0x00000048d932)
    #4 main /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/main.C:9 (tlsglobals+0x0000002cb286)
  Previous write of size 4 at 0x55d6c101b480 by thread T3:
    #0 _loadbalancerInit() /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/LBDatabase.C:186 (tlsglobals+0x0000003e9209)
    #1 _initCharm(int, char**) /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/init.C:1192 (tlsglobals+0x0000002db6f4)
    #2 ConverseRunPE /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-common-core.c:1534 (tlsglobals+0x00000048e058)
    #3 call_startfn /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-smp.c:414 (tlsglobals+0x000000487b41)
    #4 <null> <null> (libtsan.so.0+0x000000025aab)

  Location is global '_lb_args' of size 104 at 0x55d6c101b460 (tlsglobals+0x0000008bb480)

  Thread T3 (tid=19339, running) created by main thread at:
    #0 pthread_create <null> (libtsan.so.0+0x0000000290c3)
    #1 CmiStartThreads /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-smp.c:506 (tlsglobals+0x000000487e4f)
    #2 ConverseInit /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/machine-common-core.c:1428 (tlsglobals+0x00000048d928)
    #3 main /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/main.C:9 (tlsglobals+0x0000002cb286)
SUMMARY: ThreadSanitizer: data race /home/evan/charm/mpi-linux-x86_64-smp-tsan/tmp/LBDatabase.C:186 in _loadbalancerInit()

There are a bunch more races on _lb_args inside _loadbalancerInit() too.

#23 Updated by Sam White 6 months ago

CmiThreadIs_flag looks to be a static global variable, which all ranks read and write without any synchronization. It should be written to the same value by all threads, so I think it's actually benign, but tsan can't tell that?
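
(Worth noting: even an "every thread stores the same value" pattern is formally a data race under the C/C++ memory model. A sketch of the conventional cleanup, with a hypothetical stand-in variable: make the flag a relaxed atomic, so the behavior is unchanged but the race, and the tsan report, go away.)

#include <atomic>
#include <cstdio>

// Stand-in for the static global; relaxed ordering suffices because
// every writer stores the same value.
static std::atomic<int> threadIsFlagSketch{0};

static void cthBaseInitSketch(void) {
  threadIsFlagSketch.store(1, std::memory_order_relaxed);
}

int main(void) {
  cthBaseInitSketch();  // in the real code, called once per rank/thread
  std::printf("%d\n", threadIsFlagSketch.load(std::memory_order_relaxed));
}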

#24 Updated by Evan Ramos 6 months ago

I tried running with LLDB instead of GDB, which provided useful information:

$ lldb -- ./tlsglobals
(lldb) target create "./tlsglobals" 
Current executable set to './tlsglobals' (x86_64).
(lldb) r
Process 24193 launched: './tlsglobals' (x86_64)
Charm++> Running on MPI version: 3.1
Charm++> level of thread support used: MPI_THREAD_FUNNELED (desired: MPI_THREAD_FUNNELED)
Charm++> Running in SMP mode: numNodes 1,  1 worker threads per process
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: v6.8.2-620-gc5f37d228
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (12-way SMP).
Charm++> cpu topology info is gathered in 0.000 seconds.
CharmLB> RotateLB created.
Charm++> -tlsglobals enabled for privatization of thread-local variables.
Process 24193 stopped
* thread #4, name = 'tlsglobals', stop reason = signal SIGSEGV: invalid address (fault address: 0x0)
    frame #0: tlsglobals`CthFixData(t=0x00007fffdc166dd0) at libthreads-default-tls.c:350
   347     #endif
   348     static void CthFixData(CthThread t)
   349     {
-> 350       size_t newsize = CthCpvAccess(CthDatasize);
   351       size_t oldsize = B(t)->datasize;
   352       if (oldsize < newsize) {
   353         newsize = 2*newsize;
(lldb) p CthCpvAccess(CthDatasize)
error: Couldn't materialize: couldn't get the value of variable Cpv_CthDatasize_: No TLS data currently exists for this thread.
error: errored out in DoExecute, couldn't PrepareToExecuteJITExpression

#25 Updated by Sam White 6 months ago

Note that the verbs autobuilds also failed last night in tests/ampi/privatization.

#26 Updated by Evan Ramos 6 months ago

They failed in swapglobals, which means charmc must not have detected a problem scenario in the toolchain.

#27 Updated by Evan Ramos 6 months ago

I suspect the issue has something to do with the MPI build using more threads than other targets, and also with the use of pthread-specific thread-local storage.

mpi-linux-x86_64-smp:

(lldb) thread list
Process 24193 stopped
  thread #1: tid = 24193, 0x00007fffe9cebbcf mca_pml_ob1.so`mca_pml_ob1_recv_req_start + 1263, name = 'tlsglobals'
  thread #2: tid = 24201, 0x00007ffff68de951 libc.so.6`__GI___poll(fds=0x00007fffec000b10, nfds=2, timeout=3599733) at poll.c:29, name = 'tlsglobals'
  thread #3: tid = 24202, 0x00007ffff68eacd7 libc.so.6`__GI_epoll_pwait(epfd=12, events=0x0000555555d32720, maxevents=32, timeout=-1, set=0x0000000000000000) at epoll_pwait.c:42, name = 'tlsglobals'
* thread #4: tid = 24203, 0x00005555558f6876 tlsglobals`CthFixData(t=0x00007fffdc166dd0) at libthreads-default-tls.c:350, name = 'tlsglobals', stop reason = signal SIGSEGV: invalid address (fault address: 0x0)

netlrts-linux-x86_64-smp

(lldb) thread list
Process 9223 stopped
* thread #1: tid = 9223, 0x000055555574a8c6 tlsglobals`::test_privatization(rank=0, my_wth=0, global=0x00000004012ff170) at test.C:51, name = 'tlsglobals', stop reason = breakpoint 1.1
  thread #2: tid = 9226, 0x00005555558dd59c tlsglobals`PCQueuePop(Q=0x0000555555c9ed50) at pcqueue.h:226, name = 'tlsglobals'

#28 Updated by Evan Ramos 6 months ago

There's an interesting comment in the source here:

$ lldb -- ./tlsglobals
(lldb) target create "./tlsglobals" 
Current executable set to './tlsglobals' (x86_64).
(lldb) r 
Process 1460 launched: './tlsglobals' (x86_64)
Charm++> Running on MPI version: 3.1
Charm++> level of thread support used: MPI_THREAD_FUNNELED (desired: MPI_THREAD_FUNNELED)
Charm++> Running in SMP mode: numNodes 1,  1 worker threads per process
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: v6.8.2-621-g9b61967fa
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (12-way SMP).
Charm++> cpu topology info is gathered in 0.000 seconds.
CharmLB> RotateLB created.
Charm++> -tlsglobals enabled for privatization of thread-local variables.
Process 1460 stopped
* thread #4, name = 'tlsglobals', stop reason = signal SIGSEGV: invalid address (fault address: 0x0)
    frame #0: tlsglobals`CthFixData(t=0x00007fffdc167b00) at libthreads-default-tls.c:350
   347     #endif
   348     static void CthFixData(CthThread t)
   349     {
-> 350       size_t newsize = CthCpvAccess(CthDatasize);
   351       size_t oldsize = B(t)->datasize;
   352       if (oldsize < newsize) {
   353         newsize = 2*newsize;
(lldb) bt
* thread #4, name = 'tlsglobals', stop reason = signal SIGSEGV: invalid address (fault address: 0x0)
  * frame #0: tlsglobals`CthFixData(t=0x00007fffdc167b00) at libthreads-default-tls.c:350
    frame #1: tlsglobals`CthBaseResume(t=0x00007fffdc167b00) at libthreads-default-tls.c:775
    frame #2: tlsglobals`CthResume(t=0x00007fffdc167b00) at libthreads-default-tls.c:1732
    frame #3: tlsglobals`CthResumeNormalThread(token=0x00007fffdc167c00) at convcore.c:2110
    frame #4: tlsglobals`CmiHandleMessage(msg=0x00007fffdc167c00) at convcore.c:1654
    frame #5: tlsglobals`CsdScheduleForever at convcore.c:1902
    frame #6: tlsglobals`CsdScheduler(maxmsgs=-1) at convcore.c:1838
    frame #7: tlsglobals`ConverseRunPE(everReturn=0) at machine-common-core.c:1535
    frame #8: tlsglobals`call_startfn(vindex=0x0000000000000000) at machine-smp.c:414
    frame #9: 0x00007ffff79bb7fc libpthread.so.0`start_thread + 220
    frame #10: libc.so.6`__GI___clone at clone.S:95
(lldb) f 1
frame #1: tlsglobals`CthBaseResume(t=0x00007fffdc167b00) at libthreads-default-tls.c:775
   772       for(l=B(t)->listener;l!=NULL;l=l->next){
   773         if (l->resume) l->resume(l);
   774       }
-> 775       CthFixData(t); /*Thread-local storage may have changed in other thread.*/
   776       CthCpvAccess(CthCurrent) = t;
   777       CthCpvAccess(CthData) = B(t)->data;
   778       CthAliasEnable(B(t));

#29 Updated by Evan Ramos 6 months ago

The failure occurs on the first CthResume that shows up in any backtrace from switchTLS.

mpi-linux-x86_64-smp:

$ gdb ./tlsglobals 
GNU gdb (Ubuntu 8.0.1-0ubuntu1) 8.0.1
Reading symbols from ./tlsglobals...done.
(gdb) br switchTLS
Breakpoint 1 at 0x3d073f: file cmitls.c, line 110.
(gdb) commands
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
>bt
>c
>end
(gdb) r
Starting program: /home/evan/charm/mpi-linux-x86_64-smp/tests/ampi/privatization/tlsglobals 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff41b0700 (LWP 24317)]
[New Thread 0x7ffff39af700 (LWP 24318)]
Charm++> Running on MPI version: 3.1
Charm++> level of thread support used: MPI_THREAD_FUNNELED (desired: MPI_THREAD_FUNNELED)
Charm++> Running in SMP mode: numNodes 1,  1 worker threads per process
Charm++> The comm. thread both sends and receives messages
[New Thread 0x7fffe9098700 (LWP 24319)]
Converse/Charm++ Commit ID: v6.8.2-621-g9b61967fa
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (12-way SMP).
Charm++> cpu topology info is gathered in 0.000 seconds.
CharmLB> RotateLB created.
[Switching to Thread 0x7fffe9098700 (LWP 24319)]

Thread 4 "tlsglobals" hit Breakpoint 1, switchTLS (cur=0x7fffe9097348, next=0x7fffdc001660) at cmitls.c:110
110                    : "r"(next->memseg));
#0  switchTLS (cur=0x7fffe9097348, next=0x7fffdc001660) at cmitls.c:110
#1  0x00005555558f6e78 in CtgInstallMainThreadTLS (cur=0x7fffe9097348) at libthreads-default-tls.c:528
#2  0x000055555575fe0d in TCharmAPIRoutine::TCharmAPIRoutine (this=0x7fffe9097340, routineName=0x5555559929ee "TCHARM_Get_num_chunks", libraryName=0x55555599298d "tcharm") at tcharm_impl.h:337
#3  0x000055555575a1c7 in TCHARM_Get_num_chunks () at tcharm.C:677
#4  0x000055555576e770 in ampiCreateMain (mainFn=0x55555576de40 <AMPI_Fallback_Main(int, char**)>, name=0x555555995a18 "default", nameLen=7) at ampi.C:1057
#5  0x000055555576dec2 in AMPI_Setup_Switch () at ampi.C:841
#6  0x000055555575a194 in TCHARM_Call_fallback_setup () at tcharm.C:664
#7  0x0000555555991f5b in TCHARM_User_setup () at compat_us.c:4
#8  0x0000555555991f48 in tcharm_user_setup_ () at compat_fus.c:4
#9  0x00005555557dd6c5 in TCharmMain::TCharmMain (this=0x7fffdc1658e0, msg=0x7fffdc08b580) at tcharmmain.C:43
#10 0x00005555557dd3a4 in CkIndex_TCharmMain::_call_TCharmMain_CkArgMsg (impl_msg=0x7fffdc08b580, impl_obj_void=0x7fffdc1658e0) at tcharmmain.def.h:53
#11 0x00005555557e7c7c in _initCharm (unused_argc=1, argv=0x7fffffffe468) at init.C:1538
#12 0x00005555558fce2f in ConverseRunPE (everReturn=0) at machine-common-core.c:1534
#13 0x00005555558f942e in call_startfn (vindex=0x0) at machine-smp.c:414
#14 0x00007ffff79bb7fc in start_thread (arg=0x7fffe9098700) at pthread_create.c:465
#15 0x00007ffff68eab5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 4 "tlsglobals" hit Breakpoint 1, switchTLS (cur=0x7fffe90972f0, next=0x7fffe9097348) at cmitls.c:110
110                    : "r"(next->memseg));
#0  switchTLS (cur=0x7fffe90972f0, next=0x7fffe9097348) at cmitls.c:110
#1  0x00005555558f6eb1 in CtgInstallCthTLS (cur=0x7fffe90972f0, next=0x7fffe9097348) at libthreads-default-tls.c:534
#2  0x000055555575febc in TCharmAPIRoutine::~TCharmAPIRoutine (this=0x7fffe9097340, __in_chrg=<optimized out>) at tcharm_impl.h:368
#3  0x000055555575a244 in TCHARM_Get_num_chunks () at tcharm.C:677
#4  0x000055555576e770 in ampiCreateMain (mainFn=0x55555576de40 <AMPI_Fallback_Main(int, char**)>, name=0x555555995a18 "default", nameLen=7) at ampi.C:1057
#5  0x000055555576dec2 in AMPI_Setup_Switch () at ampi.C:841
#6  0x000055555575a194 in TCHARM_Call_fallback_setup () at tcharm.C:664
#7  0x0000555555991f5b in TCHARM_User_setup () at compat_us.c:4
#8  0x0000555555991f48 in tcharm_user_setup_ () at compat_fus.c:4
#9  0x00005555557dd6c5 in TCharmMain::TCharmMain (this=0x7fffdc1658e0, msg=0x7fffdc08b580) at tcharmmain.C:43
#10 0x00005555557dd3a4 in CkIndex_TCharmMain::_call_TCharmMain_CkArgMsg (impl_msg=0x7fffdc08b580, impl_obj_void=0x7fffdc1658e0) at tcharmmain.def.h:53
#11 0x00005555557e7c7c in _initCharm (unused_argc=1, argv=0x7fffffffe468) at init.C:1538
#12 0x00005555558fce2f in ConverseRunPE (everReturn=0) at machine-common-core.c:1534
#13 0x00005555558f942e in call_startfn (vindex=0x0) at machine-smp.c:414
#14 0x00007ffff79bb7fc in start_thread (arg=0x7fffe9098700) at pthread_create.c:465
#15 0x00007ffff68eab5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 4 "tlsglobals" hit Breakpoint 1, switchTLS (cur=0x7fffe9097328, next=0x7fffdc001660) at cmitls.c:110
110                    : "r"(next->memseg));
#0  switchTLS (cur=0x7fffe9097328, next=0x7fffdc001660) at cmitls.c:110
#1  0x00005555558f6e78 in CtgInstallMainThreadTLS (cur=0x7fffe9097328) at libthreads-default-tls.c:528
#2  0x000055555575fe0d in TCharmAPIRoutine::TCharmAPIRoutine (this=0x7fffe9097320, routineName=0x555555992a9f "TCHARM_Create_data", libraryName=0x55555599298d "tcharm") at tcharm_impl.h:337
#3  0x000055555575a3eb in TCHARM_Create_data (nThreads=1, threadFn=1, threadData=0x7fffdc165910, threadDataLen=8) at tcharm.C:735
#4  0x000055555576e7d4 in ampiCreateMain (mainFn=0x55555576de40 <AMPI_Fallback_Main(int, char**)>, name=0x555555995a18 "default", nameLen=7) at ampi.C:1061
#5  0x000055555576dec2 in AMPI_Setup_Switch () at ampi.C:841
#6  0x000055555575a194 in TCHARM_Call_fallback_setup () at tcharm.C:664
#7  0x0000555555991f5b in TCHARM_User_setup () at compat_us.c:4
#8  0x0000555555991f48 in tcharm_user_setup_ () at compat_fus.c:4
#9  0x00005555557dd6c5 in TCharmMain::TCharmMain (this=0x7fffdc1658e0, msg=0x7fffdc08b580) at tcharmmain.C:43
#10 0x00005555557dd3a4 in CkIndex_TCharmMain::_call_TCharmMain_CkArgMsg (impl_msg=0x7fffdc08b580, impl_obj_void=0x7fffdc1658e0) at tcharmmain.def.h:53
#11 0x00005555557e7c7c in _initCharm (unused_argc=1, argv=0x7fffffffe468) at init.C:1538
#12 0x00005555558fce2f in ConverseRunPE (everReturn=0) at machine-common-core.c:1534
#13 0x00005555558f942e in call_startfn (vindex=0x0) at machine-smp.c:414
#14 0x00007ffff79bb7fc in start_thread (arg=0x7fffe9098700) at pthread_create.c:465
#15 0x00007ffff68eab5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Charm++> -tlsglobals enabled for privatization of thread-local variables.

Thread 4 "tlsglobals" hit Breakpoint 1, switchTLS (cur=0x7fffe90972c0, next=0x7fffe9097328) at cmitls.c:110
110                    : "r"(next->memseg));
#0  switchTLS (cur=0x7fffe90972c0, next=0x7fffe9097328) at cmitls.c:110
#1  0x00005555558f6eb1 in CtgInstallCthTLS (cur=0x7fffe90972c0, next=0x7fffe9097328) at libthreads-default-tls.c:534
#2  0x000055555575febc in TCharmAPIRoutine::~TCharmAPIRoutine (this=0x7fffe9097320, __in_chrg=<optimized out>) at tcharm_impl.h:368
#3  0x000055555575a496 in TCHARM_Create_data (nThreads=1, threadFn=1, threadData=0x7fffdc165910, threadDataLen=8) at tcharm.C:735
#4  0x000055555576e7d4 in ampiCreateMain (mainFn=0x55555576de40 <AMPI_Fallback_Main(int, char**)>, name=0x555555995a18 "default", nameLen=7) at ampi.C:1061
#5  0x000055555576dec2 in AMPI_Setup_Switch () at ampi.C:841
#6  0x000055555575a194 in TCHARM_Call_fallback_setup () at tcharm.C:664
#7  0x0000555555991f5b in TCHARM_User_setup () at compat_us.c:4
#8  0x0000555555991f48 in tcharm_user_setup_ () at compat_fus.c:4
#9  0x00005555557dd6c5 in TCharmMain::TCharmMain (this=0x7fffdc1658e0, msg=0x7fffdc08b580) at tcharmmain.C:43
#10 0x00005555557dd3a4 in CkIndex_TCharmMain::_call_TCharmMain_CkArgMsg (impl_msg=0x7fffdc08b580, impl_obj_void=0x7fffdc1658e0) at tcharmmain.def.h:53
#11 0x00005555557e7c7c in _initCharm (unused_argc=1, argv=0x7fffffffe468) at init.C:1538
#12 0x00005555558fce2f in ConverseRunPE (everReturn=0) at machine-common-core.c:1534
#13 0x00005555558f942e in call_startfn (vindex=0x0) at machine-smp.c:414
#14 0x00007ffff79bb7fc in start_thread (arg=0x7fffe9098700) at pthread_create.c:465
#15 0x00007ffff68eab5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 4 "tlsglobals" hit Breakpoint 1, switchTLS (cur=0x7fffdc0016f8, next=0x7fffdc166f48) at cmitls.c:110
110                    : "r"(next->memseg));
#0  switchTLS (cur=0x7fffdc0016f8, next=0x7fffdc166f48) at cmitls.c:110
#1  0x00005555558f80df in CthResume (t=0x7fffdc166ed0) at libthreads-default-tls.c:1728
#2  0x00005555559025f7 in CthResumeNormalThread (token=0x7fffdc166fd0) at convcore.c:2110
#3  0x0000555555901d5e in CmiHandleMessage (msg=0x7fffdc166fd0) at convcore.c:1654
#4  0x000055555590215a in CsdScheduleForever () at convcore.c:1902
#5  0x0000555555902075 in CsdScheduler (maxmsgs=-1) at convcore.c:1838
#6  0x00005555558fce43 in ConverseRunPE (everReturn=0) at machine-common-core.c:1535
#7  0x00005555558f942e in call_startfn (vindex=0x0) at machine-smp.c:414
#8  0x00007ffff79bb7fc in start_thread (arg=0x7fffe9098700) at pthread_create.c:465
#9  0x00007ffff68eab5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 4 "tlsglobals" received signal SIGSEGV, Segmentation fault.
0x00005555558f6876 in CthFixData (t=0x7fffdc166ed0) at libthreads-default-tls.c:350
350      size_t newsize = CthCpvAccess(CthDatasize);
(gdb) 

For comparison, netlrts-linux-x86_64-smp:

$ gdb ./tlsglobals
GNU gdb (Ubuntu 8.0.1-0ubuntu1) 8.0.1
Reading symbols from ./tlsglobals...done.
(gdb) br switchTLS
Breakpoint 1 at 0x3b1bea: file cmitls.c, line 110.
(gdb) commands
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
>bt
>c
>end
(gdb) r
Starting program: /home/evan/charm/netlrts-linux-x86_64-smp/tests/ampi/privatization/tlsglobals 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Charm++> No provisioning arguments specified. Running with a single PE.
         Use +auto-provision to fully subscribe resources or +p1 to silence this message.
Charm++: standalone mode (not using charmrun)
Charm++> Running in SMP mode: numNodes 1,  1 worker threads per process
Charm++> The comm. thread both sends and receives messages
[New Thread 0x7ffff6ce0700 (LWP 21742)]
Converse/Charm++ Commit ID: v6.8.2-621-g9b61967fa
Charm++> scheduler running in netpoll mode.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (12-way SMP).
Charm++> cpu topology info is gathered in 0.000 seconds.
CharmLB> RotateLB created.

Thread 1 "tlsglobals" hit Breakpoint 1, switchTLS (cur=0x7fffffffddf8, next=0x555555c9c130) at cmitls.c:110
110                    : "r"(next->memseg));
#0  switchTLS (cur=0x7fffffffddf8, next=0x555555c9c130) at cmitls.c:110
#1  0x00005555558dbe80 in CtgInstallMainThreadTLS (cur=0x7fffffffddf8) at libthreads-default-tls.c:528
#2  0x0000555555752bd7 in TCharmAPIRoutine::TCharmAPIRoutine (this=0x7fffffffddf0, routineName=0x555555973aae "TCHARM_Get_num_chunks", libraryName=0x555555973a4d "tcharm") at tcharm_impl.h:337
#3  0x000055555574d195 in TCHARM_Get_num_chunks () at tcharm.C:677
#4  0x00005555557612d2 in ampiCreateMain (mainFn=0x555555760ae4 <AMPI_Fallback_Main(int, char**)>, name=0x555555976a90 "default", nameLen=7) at ampi.C:1057
#5  0x0000555555760b5d in AMPI_Setup_Switch () at ampi.C:841
#6  0x000055555574d162 in TCHARM_Call_fallback_setup () at tcharm.C:664
#7  0x0000555555972ff3 in TCHARM_User_setup () at compat_us.c:4
#8  0x0000555555972fe0 in tcharm_user_setup_ () at compat_fus.c:4
#9  0x00005555557cdffb in TCharmMain::TCharmMain (this=0x555555d2a430, msg=0x555555d29d90) at tcharmmain.C:43
#10 0x00005555557cdd04 in CkIndex_TCharmMain::_call_TCharmMain_CkArgMsg (impl_msg=0x555555d29d90, impl_obj_void=0x555555d2a430) at tcharmmain.def.h:53
#11 0x00005555557d72e2 in _initCharm (unused_argc=1, argv=0x7fffffffe478) at init.C:1538
#12 0x00005555558e0d28 in ConverseRunPE (everReturn=0) at machine-common-core.c:1534
#13 0x00005555558e09e2 in ConverseInit (argc=1, argv=0x7fffffffe478, fn=0x5555557d52ff <_initCharm(int, char**)>, usched=0, initret=0) at machine-common-core.c:1430
#14 0x00005555557ce34e in main (argc=1, argv=0x7fffffffe478) at main.C:9

Thread 1 "tlsglobals" hit Breakpoint 1, switchTLS (cur=0x7fffffffdda0, next=0x7fffffffddf8) at cmitls.c:110
110                    : "r"(next->memseg));
#0  switchTLS (cur=0x7fffffffdda0, next=0x7fffffffddf8) at cmitls.c:110
#1  0x00005555558dbeb9 in CtgInstallCthTLS (cur=0x7fffffffdda0, next=0x7fffffffddf8) at libthreads-default-tls.c:534
#2  0x0000555555752c7b in TCharmAPIRoutine::~TCharmAPIRoutine (this=0x7fffffffddf0, __in_chrg=<optimized out>) at tcharm_impl.h:368
#3  0x000055555574d212 in TCHARM_Get_num_chunks () at tcharm.C:677
#4  0x00005555557612d2 in ampiCreateMain (mainFn=0x555555760ae4 <AMPI_Fallback_Main(int, char**)>, name=0x555555976a90 "default", nameLen=7) at ampi.C:1057
#5  0x0000555555760b5d in AMPI_Setup_Switch () at ampi.C:841
#6  0x000055555574d162 in TCHARM_Call_fallback_setup () at tcharm.C:664
#7  0x0000555555972ff3 in TCHARM_User_setup () at compat_us.c:4
#8  0x0000555555972fe0 in tcharm_user_setup_ () at compat_fus.c:4
#9  0x00005555557cdffb in TCharmMain::TCharmMain (this=0x555555d2a430, msg=0x555555d29d90) at tcharmmain.C:43
#10 0x00005555557cdd04 in CkIndex_TCharmMain::_call_TCharmMain_CkArgMsg (impl_msg=0x555555d29d90, impl_obj_void=0x555555d2a430) at tcharmmain.def.h:53
#11 0x00005555557d72e2 in _initCharm (unused_argc=1, argv=0x7fffffffe478) at init.C:1538
#12 0x00005555558e0d28 in ConverseRunPE (everReturn=0) at machine-common-core.c:1534
#13 0x00005555558e09e2 in ConverseInit (argc=1, argv=0x7fffffffe478, fn=0x5555557d52ff <_initCharm(int, char**)>, usched=0, initret=0) at machine-common-core.c:1430
#14 0x00005555557ce34e in main (argc=1, argv=0x7fffffffe478) at main.C:9

Thread 1 "tlsglobals" hit Breakpoint 1, switchTLS (cur=0x7fffffffddd8, next=0x555555c9c130) at cmitls.c:110
110                    : "r"(next->memseg));
#0  switchTLS (cur=0x7fffffffddd8, next=0x555555c9c130) at cmitls.c:110
#1  0x00005555558dbe80 in CtgInstallMainThreadTLS (cur=0x7fffffffddd8) at libthreads-default-tls.c:528
#2  0x0000555555752bd7 in TCharmAPIRoutine::TCharmAPIRoutine (this=0x7fffffffddd0, routineName=0x555555973b5f "TCHARM_Create_data", libraryName=0x555555973a4d "tcharm") at tcharm_impl.h:337
#3  0x000055555574d3b3 in TCHARM_Create_data (nThreads=1, threadFn=1, threadData=0x555555e03dd0, threadDataLen=8) at tcharm.C:735
#4  0x0000555555761336 in ampiCreateMain (mainFn=0x555555760ae4 <AMPI_Fallback_Main(int, char**)>, name=0x555555976a90 "default", nameLen=7) at ampi.C:1061
#5  0x0000555555760b5d in AMPI_Setup_Switch () at ampi.C:841
#6  0x000055555574d162 in TCHARM_Call_fallback_setup () at tcharm.C:664
#7  0x0000555555972ff3 in TCHARM_User_setup () at compat_us.c:4
#8  0x0000555555972fe0 in tcharm_user_setup_ () at compat_fus.c:4
#9  0x00005555557cdffb in TCharmMain::TCharmMain (this=0x555555d2a430, msg=0x555555d29d90) at tcharmmain.C:43
#10 0x00005555557cdd04 in CkIndex_TCharmMain::_call_TCharmMain_CkArgMsg (impl_msg=0x555555d29d90, impl_obj_void=0x555555d2a430) at tcharmmain.def.h:53
#11 0x00005555557d72e2 in _initCharm (unused_argc=1, argv=0x7fffffffe478) at init.C:1538
#12 0x00005555558e0d28 in ConverseRunPE (everReturn=0) at machine-common-core.c:1534
#13 0x00005555558e09e2 in ConverseInit (argc=1, argv=0x7fffffffe478, fn=0x5555557d52ff <_initCharm(int, char**)>, usched=0, initret=0) at machine-common-core.c:1430
#14 0x00005555557ce34e in main (argc=1, argv=0x7fffffffe478) at main.C:9
Charm++> -tlsglobals enabled for privatization of thread-local variables.

Thread 1 "tlsglobals" hit Breakpoint 1, switchTLS (cur=0x7fffffffdd70, next=0x7fffffffddd8) at cmitls.c:110
110                    : "r"(next->memseg));
#0  switchTLS (cur=0x7fffffffdd70, next=0x7fffffffddd8) at cmitls.c:110
#1  0x00005555558dbeb9 in CtgInstallCthTLS (cur=0x7fffffffdd70, next=0x7fffffffddd8) at libthreads-default-tls.c:534
#2  0x0000555555752c7b in TCharmAPIRoutine::~TCharmAPIRoutine (this=0x7fffffffddd0, __in_chrg=<optimized out>) at tcharm_impl.h:368
#3  0x000055555574d45e in TCHARM_Create_data (nThreads=1, threadFn=1, threadData=0x555555e03dd0, threadDataLen=8) at tcharm.C:735
#4  0x0000555555761336 in ampiCreateMain (mainFn=0x555555760ae4 <AMPI_Fallback_Main(int, char**)>, name=0x555555976a90 "default", nameLen=7) at ampi.C:1061
#5  0x0000555555760b5d in AMPI_Setup_Switch () at ampi.C:841
#6  0x000055555574d162 in TCHARM_Call_fallback_setup () at tcharm.C:664
#7  0x0000555555972ff3 in TCHARM_User_setup () at compat_us.c:4
#8  0x0000555555972fe0 in tcharm_user_setup_ () at compat_fus.c:4
#9  0x00005555557cdffb in TCharmMain::TCharmMain (this=0x555555d2a430, msg=0x555555d29d90) at tcharmmain.C:43
#10 0x00005555557cdd04 in CkIndex_TCharmMain::_call_TCharmMain_CkArgMsg (impl_msg=0x555555d29d90, impl_obj_void=0x555555d2a430) at tcharmmain.def.h:53
#11 0x00005555557d72e2 in _initCharm (unused_argc=1, argv=0x7fffffffe478) at init.C:1538
#12 0x00005555558e0d28 in ConverseRunPE (everReturn=0) at machine-common-core.c:1534
#13 0x00005555558e09e2 in ConverseInit (argc=1, argv=0x7fffffffe478, fn=0x5555557d52ff <_initCharm(int, char**)>, usched=0, initret=0) at machine-common-core.c:1430
#14 0x00005555557ce34e in main (argc=1, argv=0x7fffffffe478) at main.C:9

Thread 1 "tlsglobals" hit Breakpoint 1, switchTLS (cur=0x555555c9ecc8, next=0x555555ca0cd8) at cmitls.c:110
110                    : "r"(next->memseg));
#0  switchTLS (cur=0x555555c9ecc8, next=0x555555ca0cd8) at cmitls.c:110
#1  0x00005555558dcec1 in CthResume (t=0x555555ca0c60) at libthreads-default-tls.c:1728
#2  0x00005555558e904b in CthResumeNormalThread (token=0x555555e05270) at convcore.c:2110
#3  0x00005555558e88ba in CmiHandleMessage (msg=0x555555e05270) at convcore.c:1654
#4  0x00005555558e8c50 in CsdScheduleForever () at convcore.c:1902
#5  0x00005555558e8b96 in CsdScheduler (maxmsgs=-1) at convcore.c:1838
#6  0x00005555558e0d3c in ConverseRunPE (everReturn=0) at machine-common-core.c:1535
#7  0x00005555558e09e2 in ConverseInit (argc=1, argv=0x7fffffffe478, fn=0x5555557d52ff <_initCharm(int, char**)>, usched=0, initret=0) at machine-common-core.c:1430
#8  0x00005555557ce34e in main (argc=1, argv=0x7fffffffe478) at main.C:9

Thread 1 "tlsglobals" hit Breakpoint 1, switchTLS (cur=0x4010fff98, next=0x555555c9c130) at cmitls.c:110
110                    : "r"(next->memseg));
#0  switchTLS (cur=0x4010fff98, next=0x555555c9c130) at cmitls.c:110
#1  0x00005555558dbe80 in CtgInstallMainThreadTLS (cur=0x4010fff98) at libthreads-default-tls.c:528
#2  0x0000555555752bd7 in TCharmAPIRoutine::TCharmAPIRoutine (this=0x4010fff90, routineName=0x555555975390 "AMPI_Init", libraryName=0x5555559777c6 "ampi") at tcharm_impl.h:337
#3  0x000055555576b3cc in MPI_Init (p_argc=0x4010ffffc, p_argv=0x4010ffff0) at ampi.C:3618
#4  0x000055555574ab9b in AMPI_Main_cpp (argc=1, argv=0x555555d0c500) at test.C:200
#5  0x0000555555760b09 in AMPI_Fallback_Main (argc=1, argv=0x555555d0c500) at ampi.C:825
#6  0x00005555557a2e45 in MPI_threadstart_t::start (this=0x401100098) at ampi.C:1031
#7  0x00005555557612b6 in AMPI_threadstart (data=0x555555e04c10) at ampi.C:1051
#8  0x000055555574b661 in startTCharmThread (msg=0x555555e04bf0) at tcharm.C:175
#9  0x00005555558dcf8b in CthStartThread (arg=...) at libthreads-default-tls.c:1770
#10 0x00005555558dd41f in make_fcontext () at make_x86_64_sysv_elf_gas.S:70
#11 0x0000000000000000 in ?? ()

(and much more afterward)

#30 Updated by Evan Ramos 6 months ago

My current hypothesis is that the crash is related to the fact that the MPI machine layer uses the first thread as the comm thread, unlike other machine layers. This pushes the initial setup of our custom TLS functionality to thread 2, and I suspect this somehow prevents the normal setup procedure from completing successfully (see the sketch after the thread listings below).

mpi-linux-x86_64-smp: (using MPICH, because OpenMPI uses additional threads internally)

(gdb) info threads
  Id   Target Id         Frame 
  1    Thread 0x7ffff7fd4dc0 (LWP 22660) "tlsglobals" 0x00007ffff6936951 in __GI___poll (fds=0x555555cb5080, nfds=1, timeout=0) at ../sysdeps/unix/sysv/linux/poll.c:29
* 2    Thread 0x7ffff5b74700 (LWP 22661) "tlsglobals" allocNewTLSSeg (t=0x7ffff0166fc8, th=0x7ffff0166f50) at cmitls.c:52

netlrts-linux-x86_64-smp:

(gdb) info threads
  Id   Target Id         Frame 
* 1    Thread 0x7ffff7fd6d80 (LWP 22664) "tlsglobals" allocNewTLSSeg (t=0x555555ca0cd8, th=0x555555ca0c60) at cmitls.c:52
  2    Thread 0x7ffff6ce0700 (LWP 22668) "tlsglobals" PCQueuePop (Q=0x555555c9d110) at pcqueue.h:226
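
To make the layout difference concrete, here is a minimal standalone sketch (not Charm++ source; the names are invented for illustration) of why the first worker PE appears as gdb thread 2 under the MPI layer but thread 1 under netlrts. Compile with -pthread, adding -DMPI_LAYER_STYLE for the MPI-style layout:

#include <pthread.h>
#include <stdio.h>

/* Stand-in for per-PE startup work such as TLS segment setup. */
static void *worker_pe(void *arg) {
    printf("worker PE 0 initializing on this thread\n");
    return NULL;
}

int main(void) {
#ifdef MPI_LAYER_STYLE
    /* MPI layer: the original thread keeps the comm-thread role,
       so worker PE 0 starts on a spawned pthread (gdb thread 2). */
    pthread_t w;
    pthread_create(&w, NULL, worker_pe, NULL);
    /* ...the comm thread would poll MPI here, on the main thread... */
    pthread_join(w, NULL);
#else
    /* netlrts layer: the original thread itself becomes worker PE 0,
       so the same setup runs on gdb thread 1. */
    worker_pe(NULL);
#endif
    return 0;
}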

#31 Updated by Evan Ramos 6 months ago

I tried the following hack:

diff --git a/src/conv-core/threads.c b/src/conv-core/threads.c
index 8713a286c..af9b770ac 100644
--- a/src/conv-core/threads.c
+++ b/src/conv-core/threads.c
@@ -1691,6 +1691,16 @@ void CthInit(char **argv)
     CmiThreadIs_flag |= CMI_THREAD_IS_ALIAS;
 #endif
  }
+
+#if CMK_CONVERSE_MPI && CMK_THREADS_BUILD_TLS
+  {
+    tlsseg_t temp;
+    allocNewTLSSeg(&B(t)->tlsseg,t);
+    switchTLS(&temp, &B(t)->tlsseg);
+    CthCpvAccess(CthCurrent)=t; // crashes here, "No TLS data currently exists for this thread." 
+    switchTLS(&B(t)->tlsseg, &temp);
+  }
+#endif
 }

 static void CthThreadFree(CthThread t)

The added block runs on both threads and crashes on both.

#32 Updated by Sam White 6 months ago

My current hypothesis is that the crash is related to the fact that the MPI machine layer uses the first thread as the comm thread, unlike other machine layers.

Yeah, the way we call MPI_Init_thread (with MPI_THREAD_FUNNELED) means that every MPI call we make must come from the main thread of the process (thread 0); otherwise the MPI standard says the behavior is undefined.
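
For reference, a minimal sketch of that contract using only the standard MPI API (not Charm++ code):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int provided;
    /* Request FUNNELED: after this call, only the thread that called
       MPI_Init_thread may make MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    if (provided < MPI_THREAD_FUNNELED) {
        fprintf(stderr, "insufficient MPI thread support\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    /* All subsequent MPI calls must stay on this thread. */
    MPI_Finalize();
    return 0;
}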

#33 Updated by Evan Ramos 6 months ago

Switching the machine layer to use MPI_THREAD_SERIALIZED instead (which lets any thread make MPI calls, as long as no two calls are concurrent) disproves the hypothesis: it still crashes.

(gdb) info threads
  Id   Target Id         Frame 
* 1    Thread 0x7ffff7fd4dc0 (LWP 1386) "tlsglobals" 0x00005555558e8646 in CthFixData (t=0x555555cae100) at libthreads-default-tls.c:350
  2    Thread 0x7ffff5b74700 (LWP 1390) "tlsglobals" 0x00007ffff6936951 in __GI___poll (fds=0x555555cb5080, nfds=1, timeout=0) at ../sysdeps/unix/sysv/linux/poll.c:29
diff --git a/src/arch/mpi/machine.c b/src/arch/mpi/machine.c
index 4646e9c74..5264cb70b 100644
--- a/src/arch/mpi/machine.c
+++ b/src/arch/mpi/machine.c
@@ -1431,7 +1431,7 @@ void LrtsInit(int *argc, char ***argv, int *numNodes, int *myNodeID) {
 #if CMK_MPI_INIT_THREAD
 #if CMK_SMP
     if (Cmi_smp_mode_setting == COMM_THREAD_SEND_RECV)
-        thread_level = MPI_THREAD_FUNNELED;
+        thread_level = MPI_THREAD_SERIALIZED;
       else
         thread_level = MPI_THREAD_MULTIPLE;
 #else
diff --git a/src/arch/util/machine-smp.c b/src/arch/util/machine-smp.c
index 255561e58..8b229be7e 100644
--- a/src/arch/util/machine-smp.c
+++ b/src/arch/util/machine-smp.c
@@ -472,7 +472,7 @@ static void CmiStartThreads(char **argv)
   Cmi_state_vector = (CmiState *)calloc(_Cmi_mynodesize+1, sizeof(CmiState));
 #if CMK_CONVERSE_MPI
       /* main thread is communication thread */
-  if(!CharmLibInterOperate) {
+  if(!CharmLibInterOperate && _thread_provided == MPI_THREAD_FUNNELED) {
     CmiStateInit(_Cmi_mynode+CmiNumPes(), _Cmi_mynodesize, &Cmi_mystate);
     Cmi_state_vector[_Cmi_mynodesize] = &Cmi_mystate;
   } else 
@@ -491,7 +491,7 @@ static void CmiStartThreads(char **argv)
 #endif
   tocreate = _Cmi_mynodesize;
 #if CMK_CONVERSE_MPI
-  if(!CharmLibInterOperate) {
+  if(!CharmLibInterOperate && _thread_provided == MPI_THREAD_FUNNELED) {
     start = 0;
     end = tocreate - 1;                    /* skip comm thread */
   } else 
@@ -512,7 +512,7 @@ static void CmiStartThreads(char **argv)
   }
 #if ! (CMK_HAS_TLS_VARIABLES && !CMK_NOT_USE_TLS_THREAD)
 #if CMK_CONVERSE_MPI
-  if(!CharmLibInterOperate)
+  if(!CharmLibInterOperate && _thread_provided == MPI_THREAD_FUNNELED)
     pthread_setspecific(Cmi_state_key, Cmi_state_vector+_Cmi_mynodesize);
   else 
 #endif

#34 Updated by Sam White 6 months ago

Semi-related: I was just looking at -tlsglobals support in charmc, and it looks like we add "-Wl,--allow-multiple-definition" to the linker flags. Do we know if this is actually necessary? We sometimes have to add that flag when using Isomalloc, but other times it causes problems.
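
For context, a minimal illustration of what that GNU ld flag permits (hypothetical files; this is standard ld behavior, where the first definition wins):

$ cat a.c
int x = 1;
int main(void) { return x; }
$ cat b.c
int x = 2;   /* second definition of the same symbol */
$ cc a.c b.c                                  # link fails: multiple definition of `x'
$ cc a.c b.c -Wl,--allow-multiple-definition  # links; x resolves to 1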

#35 Updated by Evan Ramos 6 months ago

I was able to resolve the issue with the following change:

diff --git a/src/arch/mpi-linux-x86_64/conv-mach-smp.h b/src/arch/mpi-linux-x86_64/conv-mach-smp.h
index f72decc97..b1fd3ccae 100644
--- a/src/arch/mpi-linux-x86_64/conv-mach-smp.h
+++ b/src/arch/mpi-linux-x86_64/conv-mach-smp.h
@@ -10,3 +10,7 @@
 #undef CMK_TIMER_USE_SPECIAL
 #define CMK_TIMER_USE_GETRUSAGE                            1
 #define CMK_TIMER_USE_SPECIAL                              0
+
+#undef CMK_NOT_USE_TLS_THREAD
+#define CMK_NOT_USE_TLS_THREAD   1
+

I am unsure if this is a proper solution, since it is not needed for this purpose on any other platform. Would this change come at any performance cost?
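
For anyone following along: as I understand it, CMK_NOT_USE_TLS_THREAD selects how the runtime locates its per-thread CmiState. A rough sketch of the two access paths (illustrative only, not the actual source; apart from the CMK_* macros and Cmi_state_key, the names here are invented):

#include <pthread.h>

struct CmiStateStruct;           /* opaque per-thread runtime state */
extern pthread_key_t Cmi_state_key;

#if CMK_HAS_TLS_VARIABLES && !CMK_NOT_USE_TLS_THREAD
/* Fast path: a compiler-managed __thread variable. -tlsglobals swaps
   the TLS segment on user-level thread switches, so the runtime's own
   state would get swapped out along with the user's globals. */
static __thread struct CmiStateStruct *Cmi_mystate_ptr;
#define CmiGetStateSketch() (Cmi_mystate_ptr)
#else
/* With CMK_NOT_USE_TLS_THREAD set: a pthread key lookup, unaffected by
   TLS segment switching, at the cost of a pthread_getspecific() call
   on every access. */
#define CmiGetStateSketch() \
  ((struct CmiStateStruct *)pthread_getspecific(Cmi_state_key))
#endif

If that reading is right, the cost of the change is a pthread_getspecific() lookup in place of a direct TLS access, and the fix works because the runtime's state no longer lives in the segment that -tlsglobals swaps.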

#37 Updated by Evan Ramos 6 months ago

  • Assignee changed from Sam White to Evan Ramos
  • Status changed from In Progress to Implemented

It looks like non-SMP mpi-linux-x86_64 already defines CMK_NOT_USE_TLS_THREAD to 1. I'm not sure why this did not carry over to SMP builds, but at least it makes me more confident that the change is okay.

Disregard this; I was looking at mpi-darwin-x86_64.

#38 Updated by Sam White 6 months ago

Can you try out an mpi-crayxc build too, to see if that should also set this? We might need all MPI builds to do it...

#39 Updated by Evan Ramos 6 months ago

Sam White wrote:

Can you try out an mpi-crayxc build too, to see if that should also set this? We might need all MPI builds to do it...

mpi-crayxc and mpi-crayxc-smp do not exhibit the issue.

#40 Updated by Sam White 6 months ago

  • Status changed from Implemented to Merged
