Bug #796

TAU crashes on simple Charm++ program

Added by Michael Robson almost 4 years ago. Updated 3 months ago.

Status: New
Priority: Low
Assignee: -
Category: -
Target version: -
Start date: 07/29/2015
Due date:
% Done: 0%


Description

From Antti-Pekka @ ORNL's email:

I compiled Charm++ (6.6.1) with TAU (2.24.0) on Titan using:

./build charm++ gemini_gni-crayxe-persistent-smp -optimize
./build Tau gemini_gni-crayxe-persistent-smp --tau-makefile=$TAU_MAKEFILE --no-build-shared -optimize

I then compiled the "simplearrayhello" example from the Charm++ (6.6.1) library using:
make OPTS='-tracemode Tau'

When I run the example with more than one thread it crashes:

--------------------------------------------------------------
hynninen@titan-login7:/lustre/atlas/scratch/hynninen/stf006> aprun -n1 -N1 -d16 ./hello +ppn 2
Charm++> Running on Gemini (GNI) with 1 processes
Charm++> static SMSG
Charm++> memory pool init block size: 8MB, total memory pool limit 0MB (0 means no limit)
Charm++> memory pool registered memory limit: 200000MB, send limit: 100000MB
Charm++> only comm thread send/recv messages
Charm++> Cray TLB page size: 8192K
Charm++> Running in SMP mode: numNodes 1,  2 worker threads per process
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: v6.6.1-0-g74a2cc5-namd-charm-6.6.1-build-2015-Mar-15-209687
Trace: traceroot: ./hello
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (16-way SMP).
Running Hello on 2 processors for 5 elements
Hello 0 created
Hello 1 created
Hello 2 created
Hello 3 created
Hello 4 created
Hi[17] from element 0
Hi[18] from element 1
[19071:0-0] TAU: Runtime overlap: found Hello::SayHi(int hiNo)::155 (0x2aab741ae790) on the stack, but stop called on Idle (0x100000a43b0)
Hi[19] from element 2
./hello() [0x201c126b]
./hello(Tau_stop_timer+0x196) [0x201c4796]
./hello(traceCommonEndIdle+0x57) [0x200c5237]
./hello(CcdRaiseCondition+0xf5) [0x20197b35]
./hello(CsdScheduleForever+0xf2) [0x20190ff2]
./hello(CsdScheduler+0x2d) [0x201912ed]
./hello() [0x2018f282]
./hello() [0x2018f705]
/lib64/libpthread.so.0(+0x7806) [0x2aaaaaeea806]
/lib64/libc.so.6(clone+0x6d) [0x2aaaafcb964d]
TAU: signal 6 on 0 - calling TAU_PROFILE_EXIT()...
TAU: done.
Application 8873155 exit codes: 1
Application 8873155 resources: utime ~0s, stime ~1s, Rss ~15900, inblocks ~11825, outblocks ~26705
--------------------------------------------------------------

The crash happens at program exit, before the trace/profile is written. Trace data is written for thread 0 but not for the other threads. The example runs fine with a single thread, and it also runs fine when not profiled with TAU.

Similar behavior is observed with all other Charm++ programs I have tried (e.g. NAMD and a hand-written "hello world").

--
Antti-Pekka Hynninen
email: hynninena@ornl.gov
phone: 865-241-6123
Scientific Computing
Oak Ridge National Laboratory
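
Note on the "Runtime overlap" message in the log: it reports that a stop was requested for the Idle timer while Hello::SayHi was the timer found on the stack. That is consistent with the timers on a thread being expected to nest strictly. The Python sketch below is a toy model of that kind of check only. TimerStack, TimerOverlapError, and the order of events in overlapping() are hypothetical names and assumptions made for illustration; they are not TAU or Charm++ code, and the report does not establish which sequence of calls actually occurred.

```python
class TimerOverlapError(RuntimeError):
    """Raised when stop() names a timer that is not on top of the stack."""


class TimerStack:
    """Toy per-thread stack of strictly nested timers (hypothetical, not TAU code)."""

    def __init__(self):
        self._stack = []

    def start(self, name):
        # Starting a timer pushes it on top of whatever is already running.
        self._stack.append(name)

    def stop(self, name):
        # Only the most recently started timer may be stopped.
        if not self._stack:
            raise TimerOverlapError(f"stop called on {name}, but no timer is running")
        top = self._stack[-1]
        if top != name:
            raise TimerOverlapError(
                f"found {top} on the stack, but stop called on {name}"
            )
        self._stack.pop()


def properly_nested():
    """Idle ends before the entry method starts: every stop matches the top."""
    timers = TimerStack()
    timers.start("Idle")
    timers.stop("Idle")
    timers.start("Hello::SayHi")
    timers.stop("Hello::SayHi")
    return "ok"


def overlapping():
    """Assumed ordering: the entry method starts before Idle has been stopped."""
    timers = TimerStack()
    timers.start("Idle")
    timers.start("Hello::SayHi")
    try:
        timers.stop("Idle")  # Hello::SayHi is still on top of the stack
    except TimerOverlapError as err:
        return str(err)
    return "ok"


if __name__ == "__main__":
    print(properly_nested())
    print(overlapping())
```

In this model, stopping timers in the reverse order of their starts passes, while stopping Idle with Hello::SayHi still running produces a message of the same shape as the one in the log.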

History

#1 Updated by Ronak Buch over 3 years ago

  • Assignee changed from Ronak Buch to Seonmyeong Bak

#2 Updated by Eric Bohm almost 3 years ago

  • Priority changed from Normal to Low

#3 Updated by Sam White over 1 year ago

  • Assignee deleted (Seonmyeong Bak)

#4 Updated by Matthias Diener 3 months ago

  • Project changed from Projections to Charm++
