Bug #1793

Isomalloc breaks in hwloc on multicore-darwin-x86_64

Added by Sam White over 1 year ago. Updated over 1 year ago.

Status: Merged
Priority: Normal
Assignee:
Category: AMPI
Target version:
Start date: 02/06/2018
Due date:
% Done: 0%


Description

This reproduces on a multicore-darwin-x86_64 build, in tests/ampi/megampi/. To get it to build at all, you first need to remove "-Wl,--allow-multiple-definition" from the Isomalloc link line.

$ lldb -- ./jacobi.iso 2 2 2 40 +vp8 +balancer RotateLB
Current executable set to './jacobi.iso' (x86_64).
(lldb) settings set -- target.run-args  "2" "2" "2" "40" "+vp8" "+balancer" "RotateLB" "+LBDebug" "1" "+p3"
(lldb) r
Process 82047 launched: './jacobi.iso' (x86_64)
Process 82047 stopped
* thread #1: tid = 0x62e07, 0x00007fff91882f06 libsystem_kernel.dylib`__pthread_kill + 10, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
    frame #0: 0x00007fff91882f06 libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill:
->  0x7fff91882f06 <+10>: jae    0x7fff91882f10            ; <+20>
    0x7fff91882f08 <+12>: movq   %rax, %rdi
    0x7fff91882f0b <+15>: jmp    0x7fff9187d7cd            ; cerror_nocancel
    0x7fff91882f10 <+20>: retq   
(lldb) bt
warning: could not load any Objective-C class information. This will significantly reduce the quality of type information available.
* thread #1: tid = 0x62e07, 0x00007fff91882f06 libsystem_kernel.dylib`__pthread_kill + 10, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x00007fff91882f06 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff8466b4ec libsystem_pthread.dylib`pthread_kill + 90
    frame #2: 0x00007fff92a6b6df libsystem_c.dylib`abort + 129
    frame #3: 0x00000001001ae22b jacobi.iso`mspace_free(msp=0x00000001002fb4a0, mem=<unavailable>) + 1531 at memory-gnu-internal.c:5069 [opt]
    frame #4: 0x00000001001afa20 jacobi.iso`mm_free(mem=<unavailable>) + 208 at memory-gnu.c:840 [opt]
    frame #5: 0x00000001001abc1c jacobi.iso`free [inlined] meta_free(mem=0x00000001007000a0) + 124 at memory-isomalloc.c:116 [opt]
    frame #6: 0x00000001001abbda jacobi.iso`free(mem=0x00000001007000a0) + 58 at libmemory-isomalloc.c:643 [opt]
    frame #7: 0x00000001001e74d9 jacobi.iso`hwloc__free_object_contents [inlined] cmi_hwloc__free_infos(infos=<unavailable>, count=<unavailable>) + 29 at topology.c:289 [opt]
    frame #8: 0x00000001001e74bc jacobi.iso`hwloc__free_object_contents(obj=<unavailable>) + 28 at topology.c:385 [opt]
    frame #9: 0x00000001001eaca9 jacobi.iso`hwloc_topology_clear_tree [inlined] cmi_hwloc_free_unlinked_object(obj=0x00000001005a4a90) + 8 at topology.c:404 [opt]
    frame #10: 0x00000001001eaca1 jacobi.iso`hwloc_topology_clear_tree(topology=<unavailable>, root=<unavailable>) + 49 at topology.c:2948 [opt]
    frame #11: 0x00000001001eabfc jacobi.iso`cmi_hwloc_topology_clear(topology=0x00000001005a41d0) + 28 at topology.c:2955 [opt]
    frame #12: 0x00000001001e91de jacobi.iso`cmi_hwloc_topology_destroy(topology=0x00000001005a41d0) + 30 at topology.c:2971 [opt]
    frame #13: 0x00000001001d1210 jacobi.iso`CmiInitHwlocTopology + 176 at cpuaffinity.c:59 [opt]
    frame #14: 0x00000001001b4f28 jacobi.iso`ConverseInit(argc=<unavailable>, argv=0x00007fff5fbff9b0, fn=(jacobi.iso`_initCharm(int, char**) at init.C:1070), usched=0, initret=0) + 104 at machine-common-core.c:1103 [opt]
    frame #15: 0x000000010009d83e jacobi.iso`main(argc=<unavailable>, argv=<unavailable>) + 46 at main.C:9 [opt]
    frame #16: 0x0000000100001064 jacobi.iso`start + 52

History

#1 Updated by Sam White over 1 year ago

I suspect the ordering of initialization between the Converse memory modules and the hwloc process launch code was inverted by one of the recent hwloc changes.
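
To illustrate the kind of failure an inverted initialization order can cause, here is a toy model (not Charm++ or hwloc code): a block is allocated while one allocator is active and freed after another has been installed, so the second allocator's ownership check fails. All class and function names below are hypothetical, and whether this is the actual mechanism behind the abort in `mspace_free` is only a guess.

```python
class ToyAllocator:
    """Toy heap that only knows about blocks it handed out itself."""

    def __init__(self, name, base):
        self.name = name
        self._next = base      # next fake address to hand out
        self._live = set()     # addresses currently owned by this heap

    def malloc(self):
        addr = self._next
        self._next += 16
        self._live.add(addr)
        return addr

    def free(self, addr):
        if addr not in self._live:
            # A real allocator's integrity check would typically abort() here.
            raise RuntimeError(f"{self.name}: free() of foreign pointer {addr:#x}")
        self._live.remove(addr)


class Process:
    """Routes malloc/free to whichever allocator is installed right now."""

    def __init__(self):
        self.system = ToyAllocator("system", 0x1000)
        self.custom = ToyAllocator("custom", 0x9000)
        self.active = self.system

    def install_custom_allocator(self):
        self.active = self.custom


def topology_probe(proc, install_between):
    """Allocate, optionally switch allocators, then free the same block."""
    block = proc.active.malloc()
    if install_between:
        proc.install_custom_allocator()
    proc.active.free(block)
    return "ok"
```

Calling `topology_probe(Process(), install_between=False)` completes normally, while `install_between=True` raises, because the block is freed by an allocator that never handed it out.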

#2 Updated by Sam White over 1 year ago

'-memory os-isomalloc' still works. charmc already changes isomalloc to os-isomalloc on Clang non-SMP, so I think we can just do that for all Clang builds? In the past I tried making all builds just use os-isomalloc, but uth-linux-x86_64 and cray builds failed on that.

#3 Updated by Sam White over 1 year ago

This doesn't happen on multicore-linux-x86_64{-clang}. It does reproduce on multicore-darwin-x86_64-gfortran-gcc.

#4 Updated by Evan Ramos over 1 year ago

It sounds like we should always change isomalloc to os-isomalloc on Darwin.
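
As a rough sketch of the selection rule being discussed (this is not charmc's actual logic, and the function and parameter names are made up for illustration), the existing Clang non-SMP substitution from comment #2 and the proposed Darwin substitution could be combined like this:

```python
def effective_memory_module(requested, os_name, compiler, smp):
    """Return the memory module a build would link under this hypothetical rule."""
    if requested != "isomalloc":
        # Only a plain isomalloc request is ever rewritten.
        return requested
    # Existing behaviour described in comment #2: Clang non-SMP builds.
    if compiler == "clang" and not smp:
        return "os-isomalloc"
    # Proposed in this comment: every Darwin build, whatever the compiler.
    if os_name == "darwin":
        return "os-isomalloc"
    return "isomalloc"
```

Under this rule a Darwin build with gcc gets os-isomalloc, while a Linux gcc build keeps isomalloc, which matches where the crash does and does not reproduce.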

#5 Updated by Sam White over 1 year ago

Yeah, I'll submit a patch to do that. I'm doing a git bisect now to see what broke it too.

#6 Updated by Sam White over 1 year ago

This is the offending commit: https://charm.cs.illinois.edu/gerrit/#/c/3144/

I'll see if there's an easy fix and otherwise post the workaround fix to use os-isomalloc.

#7 Updated by Evan Ramos over 1 year ago

Does this patch help? https://charm.cs.illinois.edu/gerrit/3606

EDIT: Probably not, looking again.

#8 Updated by Sam White over 1 year ago

It doesn't help.

#9 Updated by Sam White over 1 year ago

  • Status changed from New to In Progress

#10 Updated by Sam White over 1 year ago

  • Assignee set to Sam White
  • Status changed from In Progress to Merged

Even though the underlying problem is still there, I'm going to mark this 'Merged' for practical purposes. We'd like to eventually get os-isomalloc to work everywhere, and there's already an open issue for that.
