Bug #1922

Isomalloc fails with large memory footprints

Added by Sam White 3 months ago. Updated 3 months ago.

Status: New
Priority: Normal
Assignee:
Category: AMPI
Target version: -
Start date: 05/22/2018
Due date:
% Done: 0%
Tags:

Description

Isomalloc has not been stress-tested for efficiency in applications with large memory footprints.
What is the memory overhead of using Isomalloc and its mempool implementation?
How close can we get to the memory limit of a node with and without Isomalloc?

I've seen XPACC's PlasComCM crash with out-of-memory errors when using Isomalloc before, but had no easy way to debug this or to say how much extra memory Isomalloc was using.
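
A minimal sketch of the kind of stress test this calls for (assumptions: a plain MPI/AMPI C program; the chunk size and limits are arbitrary, and under Isomalloc an out-of-memory condition may abort the run rather than return NULL). Building it once with mpicc and once with ampicc against an Isomalloc-enabled build, then comparing the per-rank totals, would give a rough estimate of the overhead:

/* Hypothetical stress test: allocate fixed-size chunks until malloc fails,
 * then report how much memory each rank obtained. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_BYTES (64UL * 1024 * 1024)  /* 64 MB per allocation */
#define MAX_CHUNKS  4096

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  void *chunks[MAX_CHUNKS];
  unsigned long total = 0;
  int n;
  for (n = 0; n < MAX_CHUNKS; n++) {
    chunks[n] = malloc(CHUNK_BYTES);
    if (chunks[n] == NULL) break;
    memset(chunks[n], 1, CHUNK_BYTES);  /* touch pages so they are really backed */
    total += CHUNK_BYTES;
  }
  printf("rank %d: allocated %lu MB before failure\n", rank, total >> 20);

  for (int i = 0; i < n; i++) free(chunks[i]);
  MPI_Finalize();
  return 0;
}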


Related issues

Related to Charm++ - Feature #1921: Make Isomalloc/mempool not use the pool for large allocations (Implemented, 05/22/2018)

History

#1 Updated by Sam White 3 months ago

  • Tags set to isomalloc

#2 Updated by Sam White 3 months ago

From Yidong Xia and his postdoc at Idaho National Lab, who are currently running MiniMD on AMPI:

When I run miniMD_ampi for a small system (in.lj.miniMD), there is no problem. However, when I increased the system size from 36*36*36 to 360*360*360 (natoms ~ 130M), miniMD_ampi gave me the following error message:

$ ./obj_ampi/charmrun +p8 ./miniMD_ampi +vp8
Running command: ./miniMD_ampi +p8 +vp8
Charm++: standalone mode (not using charmrun)
Charm++> Running in Multicore mode:  8 threads
Warning> Using Isomalloc in SMP mode, you may need to run with '+isomalloc_sync'.
Converse/Charm++ Commit ID: v6.8.2-682-ga8b455cfe
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (24-way SMP).
Charm++> cpu topology info is gathered in 0.000 seconds.
# Create System:
------------- Processor 4 Exiting: Called CmiAbort ------------
Reason: Mempool-requested slot is more than what mempool can provide as    one chunk, increase cutOffNum and cutoffPoints in mempool
Charm++ fatal error:
Mempool-requested slot is more than what mempool can provide as    one chunk, increase cutOffNum and cutoffPoints in mempool
Abort trap: 6
-----------------------------------------------------------------

miniMD without ampi indeed works for the large system (360*360*360).

Note that this is without migrating anything, just from having Isomalloc turned on. We need to find out if it's one really large allocation, a bunch of small ones, or something in between. Is Isomalloc internally fragmented and wasting memory? We need to develop better debug modes for Isomalloc too...
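
One way to answer that without a new Isomalloc debug mode would be a small allocation-size histogram in the application. The sketch below is a hypothetical debugging aid, not an existing Isomalloc feature, and the names dbg_malloc/dbg_report are illustrative; it buckets request sizes by power of two so a single huge allocation stands out immediately:

/* Hypothetical debugging aid, not an existing Isomalloc debug mode.
 * Not thread-safe; intended only for a quick diagnosis. */
#include <stdio.h>
#include <stdlib.h>

static unsigned long size_hist[64];  /* bucket i counts requests in [2^i, 2^(i+1)) */

void *dbg_malloc(size_t size) {
  int bucket = 0;
  size_t s = size;
  while (s >>= 1) bucket++;
  if (bucket > 63) bucket = 63;
  size_hist[bucket]++;
  return malloc(size);
}

void dbg_report(void) {
  for (int i = 0; i < 64; i++)
    if (size_hist[i])
      fprintf(stderr, "requests of size [2^%d, 2^%d): %lu\n", i, i + 1, size_hist[i]);
}

int main(void) {
  void *a = dbg_malloc(100);
  void *b = dbg_malloc(300UL * 1024 * 1024);  /* a suspiciously large request */
  dbg_report();
  free(a);
  free(b);
  return 0;
}

In the application, calls to malloc could be redirected to the wrapper in the suspect files (e.g. with a #define) and dbg_report() called just before the failing phase.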

The version of MiniMD they are using is here: https://drive.google.com/file/d/1WZECeU3nH2j5o74UWA-pbnbmjC1wCsTg/view?usp=sharing

#3 Updated by Sam White 3 months ago

  • Tracker changed from Feature to Bug
  • Subject changed from Test Isomalloc with large memory footprints to Isomalloc fails with large memory footprints

#4 Updated by Eric Bohm 3 months ago

  • Assignee set to Evan Ramos

#5 Updated by Sam White 3 months ago

The problem here appears to be that the application tried to allocate a buffer of size >= 256 MB, which is the max size that the mempool supports. The mempool should be able to handle larger allocations for generality, but we probably don't want to actually pool such large allocations at all (related to issue #1921).
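
For illustration, the direction proposed in #1921 amounts to a size check in front of the pool: requests above the largest cutoff bypass the pool instead of aborting. The sketch below is not the actual Charm++ mempool code, and for Isomalloc the fallback would still have to come out of the per-VP virtual address range to stay migratable; pool_alloc, the function name, and the cutoff value are hypothetical stand-ins:

/* Illustration only: requests larger than the pool's biggest cutoff are
 * served outside the pool instead of triggering CmiAbort. */
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#define LARGE_ALLOC_CUTOFF (256UL * 1024 * 1024)  /* assumed pool limit */

/* Stand-in for the real pooled allocation path. */
static void *pool_alloc(size_t size) { return malloc(size); }

static void *mempool_malloc_sketch(size_t size) {
  if (size >= LARGE_ALLOC_CUTOFF) {
    /* Too large to pool: map it directly so the request still succeeds.
     * The matching free path would need to recognize and munmap such blocks. */
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return (p == MAP_FAILED) ? NULL : p;
  }
  return pool_alloc(size);
}

int main(void) {
  void *small = mempool_malloc_sketch(1024);                /* pooled path */
  void *big   = mempool_malloc_sketch(512UL * 1024 * 1024); /* bypass path */
  printf("small=%p big=%p\n", small, big);
  return 0;
}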

#6 Updated by Sam White 3 months ago

Related issue to make Isomalloc/mempool not use the pool for large allocations: https://charm.cs.illinois.edu/redmine/issues/1921

#7 Updated by Evan Ramos 3 months ago

  • Related to Feature #1921: Make Isomalloc/mempool not use the pool for large allocations added
