Feature #1921

Make Isomalloc/mempool not use the pool for large allocations

Added by Sam White 30 days ago. Updated 17 days ago.

Status: New
Priority: Normal
Assignee: Evan Ramos
Category: AMPI
Target version: -
Start date: 05/22/2018
Due date:
% Done: 0%
Tags: isomalloc

Description

Currently, Isomalloc uses the mempool by default, and the mempool only tries to serve requests from pooled memory.
For large allocations (above some threshold), we should probably skip the pool and allocate them dynamically.
A patch in Gerrit purports to do this already, but it needs cleanup to make the threshold a build option and to verify its correctness:
https://charm.cs.illinois.edu/gerrit/#/c/charm/+/1874/
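
For illustration only, here is a minimal sketch of the kind of dispatch described above, with the threshold exposed as a compile-time option. All names in it (pool_malloc, pool_alloc_from_bins, pool_alloc_untracked, POOL_LARGE_THRESHOLD) are hypothetical stand-ins, not the actual mempool API or the approach taken in the Gerrit patch:

    /* Hypothetical sketch only; not the Charm++ mempool API or the Gerrit patch. */
    #include <stddef.h>
    #include <stdlib.h>

    /* Threshold as a build option, in the spirit of the description; the default here is arbitrary. */
    #ifndef POOL_LARGE_THRESHOLD
    #define POOL_LARGE_THRESHOLD ((size_t)1 << 20) /* 1 MiB */
    #endif

    /* Stand-in for serving a request from pooled memory. */
    static void *pool_alloc_from_bins(size_t size) { return malloc(size); }

    /* Stand-in for a dynamic allocation made outside the pool. */
    static void *pool_alloc_untracked(size_t size) { return malloc(size); }

    static void *pool_malloc(size_t size)
    {
        /* Large requests bypass the pool instead of forcing it to grow. */
        if (size >= POOL_LARGE_THRESHOLD)
            return pool_alloc_untracked(size);
        return pool_alloc_from_bins(size);
    }

    int main(void)
    {
        void *small = pool_malloc(64);                /* served from the pool */
        void *large = pool_malloc((size_t)2 << 20);   /* 2 MiB: takes the bypass path */
        free(small);
        free(large);
        return 0;
    }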


Related issues

Related to Charm++ - Bug #1922: Isomalloc fails with large memory footprints (New, 05/22/2018)

History

#1 Updated by Sam White 30 days ago

  • Tags set to isomalloc

#2 Updated by Sam White 29 days ago

  • Assignee set to Evan Ramos

#3 Updated by Evan Ramos 22 days ago

With my fixes to https://charm.cs.illinois.edu/gerrit/1874, I was able to complete a 256x256x256 MiniMD run, but a 360x360x360 run ended with this:

$ lldb -o r -- ./miniMD_ampi +vp8 +p8
(lldb) target create "./miniMD_ampi" 
Current executable set to './miniMD_ampi' (x86_64).
(lldb) settings set -- target.run-args  "+vp8" "+p8" 
(lldb) r
Charm++: standalone mode (not using charmrun)
Charm++> Running in Multicore mode: 8 threads (PEs)
Warning> Using Isomalloc in SMP mode, you may need to run with '+isomalloc_sync'.
Converse/Charm++ Commit ID: v6.8.2-731-g907c4b094
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 hosts (1 sockets x 6 cores x 2 PUs = 12-way SMP)
Charm++> cpu topology info is gathered in 0.000 seconds.
# Create System:
# Done .... 
# miniMD-Reference 2.0 (MPI+OpenMP) output ...
# Run Settings: 
    # MPI processes: 8
    # OpenMP threads: 1
    # Inputfile: in.lj.miniMD
    # Datafile: None
# Physics Settings: 
    # ForceStyle: LJ
    # Force Parameters: 1.00 1.00
    # Units: LJ
    # Atoms: 186624000
    # Atom types: 4
    # System size: 604.65 604.65 604.65 (unit cells: 360 360 360)
    # Density: 0.844200
    # Force cutoff: 2.500000
    # Timestep size: 0.005000
# Technical Settings: 
    # Neigh cutoff: 2.800000
    # Half neighborlists: 1
    # Neighbor bins: 300 300 300
    # Neighbor frequency: 20
    # Sorting frequency: 20
    # Thermo frequency: 100
    # Ghost Newton: 1
    # Use intrinsics: 0
    # Do safe exchange: 0
    # Size of float: 8

Not enough address space left on processor 3 to isomalloc 1296237920 bytes!
------------- Processor 3 Exiting: Called CmiAbort ------------
Reason: Out of virtual address space for isomalloc
[3] Stack Traceback:
  [3:0] _Z14CmiAbortHelperPKcS0_S0_ii+0xc4  [0x5555559ee4de]
  [3:1] CmiGetNonLocal+0  [0x5555559ee519]
  [3:2] isomallocfn+0x7b  [0x555555a0dd4a]
  [3:3] mempool_large_malloc+0x42  [0x555555a26d50]
  [3:4] mempool_malloc+0x5d  [0x555555a269b7]
  [3:5] CmiIsomalloc+0x179  [0x555555a0f6b2]
  [3:6] CmiIsomallocBlockListMalloc+0x25  [0x555555a101fa]
  [3:7] +0x494783  [0x5555559e8783]
  [3:8] malloc+0x5b  [0x5555559e8ba4]
  [3:9] _ZN8Neighbor5buildER4Atom+0x74  [0x5555557d19c4]
  [3:10] _Z13AMPI_Main_cppiPPc+0xa13  [0x5555557ca4c3]
  [3:11] AMPI_Fallback_Main+0x25  [0x555555837645]
  [3:12] _ZN17MPI_threadstart_t5startEv+0x91  [0x55555587aa65]
  [3:13] AMPI_threadstart+0x37  [0x555555837df8]
  [3:14] +0x284af5  [0x5555557d8af5]
  [3:15] CthStartThread+0x58  [0x5555559eac67]
  [3:16] make_fcontext+0x2f  [0x5555559eb0df]
Charm++ fatal error:
Out of virtual address space for isomalloc
[3] Stack Traceback:
  [3:0] +0x49b4a9  [0x5555559ef4a9]
  [3:1] _Z9LrtsAbortPKc+0x71  [0x5555559eee8c]
  [3:2] CmiAbort+0  [0x5555559ee4ea]
  [3:3] CmiGetNonLocal+0  [0x5555559ee519]
  [3:4] isomallocfn+0x7b  [0x555555a0dd4a]
  [3:5] mempool_large_malloc+0x42  [0x555555a26d50]
  [3:6] mempool_malloc+0x5d  [0x555555a269b7]
  [3:7] CmiIsomalloc+0x179  [0x555555a0f6b2]
  [3:8] CmiIsomallocBlockListMalloc+0x25  [0x555555a101fa]
  [3:9] +0x494783  [0x5555559e8783]
  [3:10] malloc+0x5b  [0x5555559e8ba4]
  [3:11] _ZN8Neighbor5buildER4Atom+0x74  [0x5555557d19c4]
  [3:12] _Z13AMPI_Main_cppiPPc+0xa13  [0x5555557ca4c3]
  [3:13] AMPI_Fallback_Main+0x25  [0x555555837645]
  [3:14] _ZN17MPI_threadstart_t5startEv+0x91  [0x55555587aa65]
  [3:15] AMPI_threadstart+0x37  [0x555555837df8]
  [3:16] +0x284af5  [0x5555557d8af5]
  [3:17] CthStartThread+0x58  [0x5555559eac67]
  [3:18] make_fcontext+0x2f  [0x5555559eb0df]
Process 9557 stopped
* thread #2, name = 'miniMD_ampi', stop reason = signal SIGABRT
    frame #0: libc.so.6`__GI_raise(sig=2) at raise.c:51

#4 Updated by Evan Ramos 22 days ago

  • Related to Bug #1922: Isomalloc fails with large memory footprints added

#5 Updated by Sam White 22 days ago

Does a malloc of size 1296237920 bytes work on its own with Isomalloc (ignoring other allocations that MiniMD is doing)?

#6 Updated by Evan Ramos 22 days ago

Sam White wrote:

Does a malloc of size 1296237920 bytes work on its own with Isomalloc (ignoring other allocations that MiniMD is doing)?

Yes

#7 Updated by Sam White 22 days ago

K, we should verify that mallocs of size > INT_MAX bytes work correctly, if you haven't already, and then try to figure out more exactly why we're getting the "Out of virtual address space" message: are we actually exhausting the address space, or is there some limitation in Isomalloc's implementation? When you run with +isomalloc_sync, I think the amount of virtual memory per rank is printed during startup.
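
As a concrete way to check the first point, a minimal standalone test along these lines could be used. It assumes a build in which Isomalloc interposes malloc (as in the traceback above) and a 64-bit size_t; the program and the exact sizes are illustrative, not part of any existing test:

    /* Illustrative check that single allocations above INT_MAX and UINT_MAX succeed. */
    #include <limits.h>
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static int try_alloc(size_t bytes)
    {
        char *p = malloc(bytes);
        if (p == NULL) {
            printf("malloc(%zu) failed\n", bytes);
            return 1;
        }
        memset(p, 0xab, bytes); /* touch the memory so it is really backed */
        free(p);
        printf("malloc(%zu) ok\n", bytes);
        return 0;
    }

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rc = 0;
        rc |= try_alloc((size_t)INT_MAX + 1);  /* just above INT_MAX  */
        rc |= try_alloc((size_t)UINT_MAX + 1); /* just above UINT_MAX */
        MPI_Finalize();
        return rc;
    }

Building this with ampicc and running with +isomalloc_sync should also print the per-rank virtual memory figure mentioned above during startup.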

#8 Updated by Evan Ramos 17 days ago

Sam White wrote:

K, we should verify that mallocs of size > INT_MAX bytes work correctly,

Sizes > INT_MAX and > UINT_MAX are verified to work, using https://charm.cs.illinois.edu/gerrit/1874
