Bug #1887

Custom array indices segfault in CkVec inside of LB framework

Added by Eric Mikida 8 months ago. Updated 4 months ago.

Status: New
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 04/25/2018
Due date:
% Done: 0%


Description

This bug was reported to us by James Bordner from the Enzo-P/Cello team. Depending on the order of the bit fields in their custom array indices, their code can segfault when the load balancing framework passes an out-of-bounds index to CkVec. Below is the email from James, which contains a stack trace and instructions for reproduction.

Below is a stack trace from gdb. In frame 4, which triggers the assertion error, len = 5664 and n = 1080033280. In frame 5, hash = -3645, so index = (0 + -3645) % 20117 = -3645.
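To illustrate the arithmetic above: in C++ the % operator keeps the sign of the dividend, so a negative hash produces a negative index. The sketch below is not taken from BaseLB.C; rawIndex and wrappedIndex are hypothetical helper names used only for illustration, and the trace alone does not show how the negative index leads to the n seen in frame 4.

```cpp
#include <cassert>

// Hypothetical helper, not Charm++ code: the plain computation quoted in
// the email. C++ integer division truncates toward zero, so the remainder
// takes the sign of the dividend.
int rawIndex(int offset, int hash, int tableSize) {
  return (offset + hash) % tableSize;
}

// Hypothetical helper, not Charm++ code: the same computation wrapped
// into the range [0, tableSize) so that it is a valid table slot.
int wrappedIndex(int offset, int hash, int tableSize) {
  assert(tableSize > 0);
  int r = (offset + hash) % tableSize;
  return r < 0 ? r + tableSize : r;
}
```

With the values from the trace, rawIndex(0, -3645, 20117) stays at -3645, while wrappedIndex(0, -3645, 20117) gives 16472. Whether wrapping is the right fix here, as opposed to making sure the hash is never negative, is a separate question.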

If you want to try running the problem yourself, first get Enzo-P installed and running (see Getting started using Enzo-P). Then edit src/Cello/mesh_Index.hpp by moving the line "unsigned array : INDEX_BITS_ARRAY;" above the line "unsigned tree : INDEX_BITS_TREE;" and recompile. The problem I'm running is "charmrun +p2 bin/enzo-p input/load-balance-4.in +balancer RotateLB". I built Charm 6.8.2 using "./build charm++ netlrts-linux-x86_64".

Thanks!
James

#0 0x00000000007f75a7 in LrtsAbort (
message=0xa0cf40 "Assertion \"n<len\" failed in file cklists.h line 221.")
at machine.c:554
#1 0x00000000007f7036 in CmiAbortHelper (source=0xa34f61 "Called CmiAbort",
message=0xa0cf40 "Assertion \"n<len\" failed in file cklists.h line 221.",
suggestion=0x0, tellDebugger=1, framesToSkip=0)
at machine-common-core.c:1454
#2 0x00000000007f7066 in CmiAbort (
message=0xa0cf40 "Assertion \"n<len\" failed in file cklists.h line 221.")
at machine-common-core.c:1458
#3 0x0000000000800f1b in __cmi_assert (
errmsg=0xa0cf40 "Assertion \"n<len\" failed in file cklists.h line 221.")
at convcore.c:3820
#4 0x000000000069fc42 in CkVec<LDObjData>::operator[] (this=0x108d830,
n=1080033280) at cklists.h:221
#5 0x00000000007a1013 in BaseLB::LDStats::getHash (this=0x108d818, oid=...,
mid=...) at BaseLB.C:189
#6 0x00000000007a1101 in BaseLB::LDStats::getHash (this=0x108d818, objKey=...)
at BaseLB.C:202
#7 0x00000000007a1137 in BaseLB::LDStats::getSendHash (this=0x108d818,
cData=...) at BaseLB.C:208
#8 0x00000000007a929e in CentralLB::removeCommDataOfDeletedObjs (
this=0x111ca10, stats=0x108d818) at CentralLB.C:1464
#9 0x00000000007a6fb4 in CentralLB::LoadBalance (this=0x111ca10)
at CentralLB.C:713
#10 0x00000000007b04d2 in CkIndex_CentralLB::_call_LoadBalance_void (
impl_msg=0x3c86310, impl_obj_void=0x111ca10) at CentralLB.def.h:1848
#11 0x00000000006e1825 in CkDeliverMessageFree (epIdx=260, msg=0x3c86310,
obj=0x111ca10) at ck.C:597
#12 0x00000000006e1971 in _invokeEntryNoTrace (epIdx=260, env=0x3c862c0,
obj=0x111ca10) at ck.C:641
#13 0x00000000006e1a8d in _invokeEntry (epIdx=260, env=0x3c862c0,
obj=0x111ca10) at ck.C:652
#14 0x00000000006e31d1 in _deliverForBocMsg (ck=0x1019838, epIdx=260,
env=0x3c862c0, obj=0x111ca10) at ck.C:1083
#15 0x00000000006e32dc in _processForBocMsg (ck=0x1019838, env=0x3c862c0)
at ck.C:1101
#16 0x00000000006e387d in _processHandler (converseMsg=0x3c862c0, ck=0x1019838)
at ck.C:1263
#17 0x00000000007fdc93 in CmiHandleMessage (msg=0x3c862c0) at convcore.c:1619
#18 0x00000000007fdf17 in CsdScheduleForever () at convcore.c:1856
#19 0x00000000007fde48 in CsdScheduler (maxmsgs=-1) at convcore.c:1792
#20 0x00000000007f6e31 in ConverseRunPE (everReturn=0)
at machine-common-core.c:1298
#21 0x00000000007f6d38 in ConverseInit (argc=5, argv=0x7fffffffe988, fn=
0x6d5ba5 <_initCharm(int, char**)>, usched=0, initret=0)
at machine-common-core.c:1199
#22 0x00000000006d3ebd in main (argc=5, argv=0x7fffffffe988) at main.C:18

History

#1 Updated by Eric Mikida 8 months ago

This seems to have been narrowed down, at least partially, to a problem with the custom array index documentation. Currently, the data for an array index as seen by the runtime (specifically by things like LB and tracing) is stored in CkArrayIndexBase::index and accessed via CkArrayIndex::data() (see ckarrayindex.h). However, if custom array indices are written the way the Charm++ manual describes, this underlying data is never set; it is only zeroed out by the default CkArrayIndex constructor. From what I can tell, it is never set to anything related to the custom array index.

#2 Updated by Eric Mikida 8 months ago

Our example in examples/charm++/hello/fancyarray seems to be aware of this issue and correctly allocates its data using placement new. I'll talk to James and see if this fixes his issue, but the documentation should be updated to describe this. The only remaining problem is the case where the data for the custom array index exceeds the size of the index data in the base class.
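A minimal, self-contained sketch of the placement-new pattern described above. This is not the real Charm++ code: MockArrayIndex, MockCustomIndex, MyKey, and the capacity kMaxInts are all hypothetical stand-ins, and the actual CkArrayIndex in ckarrayindex.h may differ in layout and size. The static_assert covers the case where the custom data would not fit in the base class storage.

```cpp
#include <cstring>
#include <new>

// Hypothetical stand-in for the runtime's base index class.
struct MockArrayIndex {
  static constexpr int kMaxInts = 3;  // assumed capacity, for illustration only
  int nInts;                          // number of ints of index data in use
  int index[kMaxInts];                // storage the runtime reads via data()

  MockArrayIndex() : nInts(0) { std::memset(index, 0, sizeof(index)); }
  const int *data() const { return index; }
};

// Hypothetical user-defined index payload.
struct MyKey {
  int a;
  int b;
};

// Custom index that constructs its payload inside the base class storage,
// so the runtime sees the real key instead of zeroed data.
class MockCustomIndex : public MockArrayIndex {
 public:
  explicit MockCustomIndex(const MyKey &in) {
    static_assert(sizeof(MyKey) <= sizeof(int) * kMaxInts,
                  "custom index data does not fit in the base class storage");
    static_assert(sizeof(MyKey) % sizeof(int) == 0,
                  "custom index data must be a whole number of ints");
    static_assert(alignof(MyKey) <= alignof(int),
                  "custom index data needs stricter alignment than int");
    new (index) MyKey(in);  // placement new into the base class buffer
    nInts = static_cast<int>(sizeof(MyKey) / sizeof(int));
  }

  // Copy the key back out of the shared storage.
  MyKey key() const {
    MyKey out;
    std::memcpy(&out, index, sizeof(out));
    return out;
  }
};
```

In this sketch, a default-constructed MockArrayIndex exposes only zeros through data(), which mirrors the behavior described in the previous comment, while MockCustomIndex(MyKey{7, 42}) exposes 7 and 42.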

#3 Updated by Sam White 8 months ago

Any update from James on if this fixed the issue?

#4 Updated by Eric Mikida 7 months ago

No. Haven't heard from him.

#5 Updated by Ronak Buch 7 months ago

  • Assignee changed from Ronak Buch to Eric Mikida

#6 Updated by Sam White 4 months ago

Please close this issue if it's not a real problem.
