Bug #1933

charm++/jacobi3d crashes for netlrts-linux-x86_64-syncft build during nightly build tests

Added by Nitin Bhat 9 days ago. Updated 7 days ago.

Nightly build output: http://ppl-jenkins:8080/job/Nightly-Build/label=xenial,platform=netlrts-linux-x86_64-syncft/1591/console

Start of iteration 470 at 33.343957
Start of iteration 480 at 34.175657
Start of iteration 490 at 34.706154
Charmrun> error on request socket to node 3 'localhost'--
Socket closed before recv.
Socket 5 failed 
charmrun says process 3 failed (on host localhost)
crashed_node 3 reconnected fd 5  
Charmrun finished launching new process in 1.409824s
Charmrun> continue node: 3
[3] Restarting after crash 
[3] I am restarting  cur_restart_phase:2 at time: 0.000850
[3] I am restarting  cur_restart_phase:2 discard charm message at time: 0.000879
[4] askProcDataHandler called with '3' cur_restart_phase:2 at time 36.434695.
[4] askProcDataHandler called with '3' cur_restart_phase:2 done at time 36.434971.
[3] ----- recoverProcDataHandler  cur_restart_phase:2 at time: 0.024316
[3] Assertion "type>=NewChareMsg && type<=ForArrayEltMsg" failed in file ./../include/envelope.h line 326.
------------- Processor 3 Exiting: Called CmiAbort ------------
Reason: Assertion "type>=NewChareMsg && type<=ForArrayEltMsg" failed in file ./../include/envelope.h line 326.
[3] Stack Traceback:
  [3:0] _Z14CmiAbortHelperPKcS0_S0_ii+0x54  [0x572d94]
  [3:1]   [0x572dfb]
  [3:2]   [0x57e5d1]
  [3:3] _Z12CkPupMessageRN3PUP2erEPPvi+0x28a  [0x56b40a]
  [3:4] _ZN6CkMsgQI14CkReductionMsgE3pupERN3PUP2erE+0x91  [0x5123f1]
  [3:5] _ZN14CkReductionMgr3pupERN3PUP2erE+0x124  [0x50f184]
  [3:6] _ZN7CkArray3pupERN3PUP2erE+0x21  [0x4fcc31]
  [3:7]   [0x52bdb3]
  [3:8]   [0x53ed57]
  [3:9] CsdScheduleForever+0x50  [0x57d940]
  [3:10] CsdScheduler+0x2d  [0x57dc0d]
  [3:11] ConverseInit+0x7ca  [0x57ba4a]
  [3:12] charm_main+0x27  [0x4939e7]
  [3:13] __libc_start_main+0xf0  [0x7ffff718c830]
  [3:14] _start+0x29  [0x4893a9]
Makefile:21: recipe for target 'syncfttest' failed
make[3]: Leaving directory '/scratch/jenkins/builds/Nightly-Build/label=xenial,platform=netlrts-linux-x86_64-syncft@1591/charm/netlrts-linux-x86_64-syncft/tests/charm++/jacobi3d'
Fatal error on PE 3> Assertion "type>=NewChareMsg && type<=ForArrayEltMsg" failed in file ./../include/envelope.h line 326.
make[3]: *** [syncfttest] Error 1
Makefile:45: recipe for target 'syncfttest' failed


#2 Updated by Evan Ramos 8 days ago

  • Assignee set to Evan Ramos

#3 Updated by Evan Ramos 8 days ago

It looks like this issue hasn't shown up in autobuild because kill_02.txt requests a kill after 35 seconds, but the program execution completes after about 18 seconds, so that kill is never triggered.
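
The timing argument can be sketched as a small check. This is only an illustration: killFiresDuringRun is a hypothetical helper, not a Charm++ function, and the 40-second figure is an invented value for a slower run. The 35-second and 18-second figures are the ones quoted in this comment.

```cpp
#include <cassert>

// Hypothetical helper (not part of Charm++): returns true only if a kill
// requested killAtSec seconds into the run lands before the run finishes.
constexpr bool killFiresDuringRun(double killAtSec, double runDurationSec) {
    return killAtSec >= 0.0 && killAtSec < runDurationSec;
}

// Figures from this comment: kill requested at 35 s, run completes in ~18 s,
// so the kill is scheduled after the program has already exited.
static_assert(!killFiresDuringRun(35.0, 18.0),
              "a kill scheduled after the run ends never fires");

// Illustrative value only: a run that lasts 40 s is still active at 35 s,
// so the kill would fire and the restart code would run.
static_assert(killFiresDuringRun(35.0, 40.0),
              "a kill scheduled inside the run does fire");
```

Under these assumptions, only a run that lasts longer than 35 seconds would reach the restart code at all, which may be why the failure shows up only where the run is slower.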

#5 Updated by Evan Ramos 8 days ago

(gdb) bt
#0  LrtsAbort (message=0x5eb018 "Assertion \"type>=NewChareMsg && type<=ForArrayEltMsg\" failed in file ./../include/envelope.h line 309.") at machine.c:557
#1  0x0000000000577c5a in envelope::alloc (groupDepNum=0, prio=0, size=<optimized out>, type=0 '\000') at ./../include/envelope.h:309
#2  _allocEnv (groupDepNum=<optimized out>, prio=<optimized out>, size=<optimized out>, msgtype=0) at ./../include/envelope.h:497
#3  CkPupMessage (p=..., atMsg=atMsg@entry=0x7fffffffe2b8, pack_mode=pack_mode@entry=1) at debug-message.C:56
#4  0x000000000051ada1 in CkMsgQ<CkReductionMsg>::pup (this=this@entry=0x9e5600, p=...) at charm++.h:180
#5  0x00000000005184e8 in operator| (v=..., p=...) at charm++.h:184
#6  CkReductionMgr::pup (this=this@entry=0x9e54f0, p=...) at ckreduction.C:1019
#7  0x00000000004ffc01 in CkArray::pup (this=0x9e54f0, p=...) at ckarray.C:924
#8  0x0000000000535a51 in CkPupPerPlaceData (p=..., idTable=<optimized out>, objectTable=0x8ed0c0, numObjects=<optimized out>, constructionMsgType=<optimized out>, creationFn=<optimized out>) at ckcheckpoint.C:512
#9  0x0000000000548997 in _handleProcData (p=...) at ckmemcheckpoint.C:659
#10 recoverProcDataHandler (msg=<optimized out>) at ckmemcheckpoint.C:1375
#11 0x0000000000586270 in CsdScheduleForever () at convcore.c:1894
#12 0x000000000058653d in CsdScheduler (maxmsgs=maxmsgs@entry=-1) at convcore.c:1830
#13 0x00000000005842aa in ConverseRunPE (everReturn=0) at machine-common-core.c:1527
#14 ConverseInit (argc=15, argv=0x7fffffffe8d8, fn=0x496440 <_initCharm(int, char**)>, usched=usched@entry=0, initret=initret@entry=0) at machine-common-core.c:1422
#15 0x000000000048ba97 in main (argc=<optimized out>, argv=<optimized out>) at main.C:9
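
For reference, the failing assertion is a range check on the message type, and frame #1 shows it being reached with type=0. A minimal sketch of that check follows. The names NewChareMsg and ForArrayEltMsg come from the assertion text, but the enum values, the middle entry, and isValidMsgType are assumptions made for illustration, not the definitions in envelope.h.

```cpp
#include <cassert>

// Stand-in enum: only the two boundary names come from the assertion text.
// The numeric values and the middle entry are assumptions for illustration.
enum MsgTypeSketch : unsigned char {
    NewChareMsg = 1,     // assumed smallest valid message type
    PlaceholderMsg = 2,  // hypothetical entry between the two bounds
    ForArrayEltMsg = 3   // assumed largest valid message type
};

// Hypothetical helper mirroring the asserted condition
// "type>=NewChareMsg && type<=ForArrayEltMsg".
constexpr bool isValidMsgType(unsigned char type) {
    return type >= NewChareMsg && type <= ForArrayEltMsg;
}

// type=0, as seen in frame #1 of the backtrace, falls below the lower bound.
static_assert(!isValidMsgType(0), "type 0 is outside the valid range");

// Both boundary values pass the check.
static_assert(isValidMsgType(NewChareMsg), "lower bound is valid");
static_assert(isValidMsgType(ForArrayEltMsg), "upper bound is valid");
```

If the real enum also starts above zero, a zero type is what a zero-initialized or not-yet-filled envelope would carry when it reaches CkPupMessage during unpacking, though the backtrace alone does not confirm where the zero comes from.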

#6 Updated by Evan Ramos 8 days ago

  • Status changed from New to Implemented

#7 Updated by Nitin Bhat 7 days ago

  • Status changed from Implemented to Merged
