Bug #1819

Bug #1828: Infinite recursion inside malloc_info in CmiMemoryUsage

bigsim failing lb_test inside CmiMemoryUsage()

Added by Sam White 3 months ago. Updated 3 months ago.

Status: Merged
Priority: Normal
Assignee: -
Category: -
Target version:
Start date: 03/06/2018
Due date:
% Done: 0%


Description

The bigsim build of lb_test (tests/charm++/load_balancing/lb_test) has been failing consistently in autobuild and Jenkins over the past week or so. It fails with either a segfault or a generic "Socket closed before recv" error message from charmrun.

History

#1 Updated by Sam White 3 months ago

../../../../bin/testrun  +p4 ./lb_test 100 100 10 40 10 1000 ring +balancer CommLB +LBDebug 1 +x2 +y2 +z1 +cth1 +wth1  ++local ++no-va-randomization
Charmrun> scalable start enabled. 
Charmrun> started all node programs in 0.012 seconds.
Charm++> Running in non-SMP mode: numPes 4
Converse/Charm++ Commit ID: 7068717
Charm++> scheduler running in netpoll mode.
BG info> Simulating 2x2x1 nodes with 1 comm + 1 work threads each.
BG info> Network type: bluegene.
alpha: 1.000000e-07    packetsize: 1024    CYCLE_TIME_FACTOR:1.000000e-03.
CYCLES_PER_HOP: 5    CYCLES_PER_CORNER: 75.
BG info> cpufactor is 1.000000.
BG info> floating point factor is 0.000000.
BG info> Using WallTimer for timing method. 
CharmLB> Verbose level 1, load balancing period: 0.02 seconds
CharmLB> Load balancer ignores processor background load.
CharmLB> Load balancer assumes all CPUs are same.
Trace: traceroot: ./lb_test
CharmLB> CommLB created.
Running lb_test on 4 processors with 100 elements
Print every 10 steps
Sync every 40 steps
First node busywaits 10 usec; last node busywaits 1000 usec

Selecting Topology Ring
Generating topology 0 for 100 elements
[0] Total work/step = 0.020908 sec
calibrated iterations 22302726
TIME PER STEP    10    0.394511    0.071291
TIME PER STEP    20    0.468282    0.073771
TIME PER STEP    30    0.546267    0.077985

CharmLB> CommLB: PE [0] step 0 starting at 0.599159 Memory: 0.847656 MB
Charmrun> error on request socket to node 0 '127.0.0.1'--
Socket closed before recv.
Makefile:38: recipe for target 'bgtest' failed

#2 Updated by Sam White 3 months ago

This test passes on netlrts-darwin-x86_64

#3 Updated by Sam White 3 months ago

  • Parent task set to #1828
  • Target version set to 6.9.0
  • Subject changed from bigsim failing in tests/charm++/load_balancing/lb_test to bigsim failing lb_test inside CmiMemoryUsage()

The issue here is actually infinite recursion in std::regex inside CmiMemoryUsage(), which makes this a duplicate of issue #1828.

To reproduce, do the following:

./build charm++ netlrts-linux-x86_64 bigsim -j16 -g
cd netlrts-linux-x86_64-bigsim/tests/charm++/load_balancing/lb_test/
make OPTS=-g
gdb --args ./lb_test 100 100 10 40 10 1000 ring +balancer CommLB +LBDebug 1 +x2 +y2 +z1 +cth1 +wth1

This is on the PPL lab machine "beauty", which has GCC v5.4.0.
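
For illustration only, here is a minimal sketch of how a malloc_info-style XML report could be scanned without std::regex, so that no recursive matcher runs in the memory-usage path. This is not the Charm++ implementation: the function names sumCurrentSystemBytes and selfCheck are hypothetical, and the report layout and values in the sample are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not Charm++ code): sum the size="N" values of every
// <system type="current" .../> element in a malloc_info-style XML report.
// It scans with std::string::find, so no regex engine is involved.
std::uint64_t sumCurrentSystemBytes(const std::string& xml) {
  const std::string tag = "<system type=\"current\"";
  const std::string attr = "size=\"";
  std::uint64_t total = 0;
  std::size_t pos = 0;
  while ((pos = xml.find(tag, pos)) != std::string::npos) {
    const std::size_t end = xml.find('>', pos);  // end of this element
    if (end == std::string::npos) break;         // truncated report, stop
    const std::size_t a = xml.find(attr, pos + tag.size());
    if (a != std::string::npos && a < end) {
      // Parse the decimal digits that follow size=" by hand.
      std::uint64_t value = 0;
      for (std::size_t i = a + attr.size();
           i < end && xml[i] >= '0' && xml[i] <= '9'; ++i) {
        value = value * 10 + static_cast<std::uint64_t>(xml[i] - '0');
      }
      total += value;
    }
    pos = end + 1;  // continue scanning after this element
  }
  return total;
}

// Usage sketch with a hand-written sample report (values are made up).
void selfCheck() {
  const std::string sample =
      "<malloc version=\"1\">\n"
      "<heap nr=\"0\">\n"
      "<system type=\"current\" size=\"4096\"/>\n"
      "<system type=\"max\" size=\"9999\"/>\n"
      "</heap>\n"
      "<heap nr=\"1\">\n"
      "<system type=\"current\" size=\"8192\"/>\n"
      "</heap>\n"
      "</malloc>\n";
  assert(sumCurrentSystemBytes(sample) == 4096 + 8192);
}
```

The scan is a single forward pass with constant stack depth, which is the property that matters here. Whether it matches the fields CmiMemoryUsage() actually needs would have to be checked against the real malloc_info output.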

#4 Updated by Sam White 3 months ago

  • Status changed from New to In Progress

#5 Updated by Sam White 3 months ago

We reverted the malloc_info patch for now.

#6 Updated by Sam White 3 months ago

  • Status changed from In Progress to Merged
