Bug #1951

Isomalloc failure on netlrts-darwin-x86_64

Added by Sam White 12 months ago. Updated 12 months ago.

Status:
Merged
Priority:
Normal
Assignee:
Category:
AMPI
Target version:
Start date:
07/25/2018
Due date:
% Done:
0%

Tags:

Description

This happens only on a non-SMP darwin build with '-memory isomalloc'. Running './charmrun +p2 ./jacobi.iso 2 2 2 40 +vp8 +balancer RotateLB +LBDebug 1 ++local ++debug' gives the following:

Process 75789 stopped
* thread #1: tid = 0x2e1723, 0x00000001001bc6fe jacobi.iso`::mspace_malloc(msp=<unavailable>, bytes=<unavailable>) + 1294 at memory-gnu-internal.c:4949, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x2005383e3)
    frame #0: 0x00000001001bc6fe jacobi.iso`::mspace_malloc(msp=<unavailable>, bytes=<unavailable>) + 1294 at memory-gnu-internal.c:4949 [opt]
   4946       if (rsize >= MIN_CHUNK_SIZE) { /* split dv */
   4947         mchunkptr r = ms->dv = chunk_plus_offset(p, nb);
   4948         ms->dvsize = rsize;
-> 4949         set_size_and_pinuse_of_free_chunk(r, rsize);
   4950         set_size_and_pinuse_of_inuse_chunk(ms, p, nb);
   4951       }
   4952       else { /* exhaust dv */
(lldb) bt
* thread #1: tid = 0x2e1723, 0x00000001001bc6fe jacobi.iso`::mspace_malloc(msp=<unavailable>, bytes=<unavailable>) + 1294 at memory-gnu-internal.c:4949, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x2005383e3)
  * frame #0: 0x00000001001bc6fe jacobi.iso`::mspace_malloc(msp=<unavailable>, bytes=<unavailable>) + 1294 at memory-gnu-internal.c:4949 [opt]
    frame #1: 0x00000001001bf18e jacobi.iso`::mm_malloc(bytes=2040) + 126 at memory-gnu.c:805 [opt]
    frame #2: 0x00000001001c0460 jacobi.iso`::malloc(size_t) [inlined] meta_malloc(size=2032) + 74 at memory-isomalloc.c:105 [opt]
    frame #3: 0x00000001001c0416 jacobi.iso`::malloc(size=2032) + 86 at libmemory-isomalloc.C:726 [opt]
    frame #4: 0x00000001001f1eb4 jacobi.iso`cmi_hwloc_topology_init(topologyp=0x00007fff5fbff898) + 20 at topology.c:2765 [opt]
    frame #5: 0x00000001001dbe74 jacobi.iso`CmiInitHwlocTopology + 20 at cpuaffinity.c:43 [opt]
    frame #6: 0x00000001001c378c jacobi.iso`::ConverseInit(argc=<unavailable>, argv=0x00007fff5fbff960, fn=(jacobi.iso`_initCharm(int, char**) at init.C:1154), usched=0, initret=0) + 92 at machine-common-core.C:1204 [opt]
    frame #7: 0x00000001000b703e jacobi.iso`::charm_main(argc=<unavailable>, argv=<unavailable>) + 46 at init.C:1701 [opt]
    frame #8: 0x00000001000014a4 jacobi.iso`start + 52

History

#1 Updated by Evan Ramos 12 months ago

This is a weird one. The crash happens at the first call to meta_malloc, even though a small number of meta_realloc calls succeed beforehand and mm_realloc calls mm_malloc internally. I'm seeing some of the same corruption / noise in internal ptmalloc3 data structures as in the strdup issue, but here the crash happens during malloc, not free.

#2 Updated by Evan Ramos 12 months ago

./jacobi.iso 1 1 1 1 is sufficient to reproduce the crash.

#3 Updated by Evan Ramos 12 months ago

It gets stranger: I compiled with ASan, and in that case execution continues past the troublesome malloc call and eventually crashes at the end of getNodeTopoTreeEdges, when the std::vector<> destructor calls free internally.

For comparison, I compiled the normally-working SMP build with ASan, and that crashed at the beginning of Chare::Chare() (ck.C:61) when trying to access the vtable.

#4 Updated by Evan Ramos 12 months ago

I turned on the allocator's debug checks with this patch:

diff --git a/src/conv-core/memory-gnu-internal.c b/src/conv-core/memory-gnu-internal.c
index 3d9fcbea8..bb00749e9 100644
--- a/src/conv-core/memory-gnu-internal.c
+++ b/src/conv-core/memory-gnu-internal.c
@@ -481,6 +481,8 @@ MAX_RELEASE_CHECK_RATE   default: 255 unless not HAVE_MMAP
   improvement at the expense of carrying around more memory.
 */

+#define DEBUG 1
+
 #ifndef WIN32
 #ifdef _WIN32
 #define WIN32 1
diff --git a/src/conv-core/memory-gnu.c b/src/conv-core/memory-gnu.c
index ac9b17d4c..4b8db7c76 100644
--- a/src/conv-core/memory-gnu.c
+++ b/src/conv-core/memory-gnu.c
@@ -112,7 +112,7 @@ the chunk to the user, if necessary.  */
 #endif

 #ifndef MALLOC_DEBUG
-# define MALLOC_DEBUG 0
+# define MALLOC_DEBUG 1
 #endif

 #define my_powerof2(x) ((((x)-1)&(x))==0)

and it resulted in this assertion failure: Assertion failed: (mspace == arena_to_mspace(&main_arena)), function ptmalloc_init, file ./memory-gnu.c, line 746.

#5 Updated by Evan Ramos 12 months ago

  void *m2 = arena_to_mspace(&main_arena);
  assert(mspace == m2);
(lldb) p mspace
(void *) $0 = 0x000000010050b580
(lldb) p m2
(void *) $1 = 0x000000010050b578
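
To put numbers on the two pointers above, here is a minimal sketch in C that computes the byte distance between two addresses and their remainder modulo an alignment. The helper names addr_delta and misalignment are hypothetical and not part of Charm++ or ptmalloc3, and the choice of 16 bytes as the alignment to test is an assumption made for illustration. Whether alignment is the actual cause of the assertion failure is not established in this report.

```c
#include <stdint.h>

/* Hypothetical helper: signed byte distance from b to a. */
static int64_t addr_delta(uint64_t a, uint64_t b) {
    return (int64_t)(a - b);
}

/* Hypothetical helper: remainder of addr modulo align.
 * align must be a power of two (16 is assumed in the note below). */
static uint64_t misalignment(uint64_t addr, uint64_t align) {
    return addr & (align - 1);
}
```

With the values printed by lldb, addr_delta(0x10050b580, 0x10050b578) is 8, misalignment(0x10050b580, 16) is 0, and misalignment(0x10050b578, 16) is 8. In other words, mspace and m2 differ by exactly 8 bytes, and only mspace falls on a 16-byte boundary.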

#6 Updated by Evan Ramos 12 months ago

  • Status changed from New to Implemented

#7 Updated by Sam White 12 months ago

  • Status changed from Implemented to Merged
  • Target version set to 6.9.0
