Bug #968

charm++ programs fail to run on BlueWaters due to craype-hugepages8M

Added by Michael Robson over 3 years ago. Updated over 2 years ago.

Status: Upstream
Priority: Low
Category: -
Target version: -
Start date: 02/04/2016
Due date:
% Done: 100%


Description

When running the tests/charm++/simplearrayhello program I would get the following output:

libhugetlbfslibhugetlbfslibhugetlbfslibhugetlbfs [nid11723:2677]libhugetlbfs: WARNING: Hugepage size 4096 unavailablelibhugetlbfslibhugetlbfs [nid11723:2678] [nid11723:2675] [nid11723:2676]libhugetlbfs: WARNING: Hugepage size 4096 unavailable [nid11723:2673]: WARNING: Hug
epage size 4096 unavailable: WARNING: Hugepage size 4096 unavailable: WARNING: Hugepage size 4096 unavailable [nid11723:2674] [nid11723:2680]: WARNING: Hugepage size 4096 unavailable [nid11723:2679]: WARNING: Hugepage size 4096 unavailable: WARNING: Hugepage size 4096 una
vailableCharm++> Running on Gemini (GNI) with 8 processes
Charm++> static SMSG
Charm++> SMSG memory: 39.5KB
Charm++> memory pool init block size: 8MB, total memory pool limit 0MB (0 means no limit)
Charm++> memory pool registered memory limit: 200000MB, send limit: 100000MB
Charm++> only comm thread send/recv messages
Charm++> Cray TLB page size: 4K
_pmiu_daemon(SIGCHLD): [NID 11723] [c20-4c0s5n1] [Fri Jan 29 15:44:48 2016] PE RANK 0 exit signal Segmentation fault
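
Because all ranks write at once, the warning text above is interleaved and hard to count by eye. As a rough sketch, assuming the job output was captured to a file, the helpers below count the libhugetlbfs warnings and check for the segfault line. The function names and the log file name are hypothetical, and `grep -o` is assumed to be available (it is in GNU and BSD grep).

```shell
#!/bin/sh
# Sketch: scan a captured job log for the symptoms shown in this report.
# ASSUMPTION: function names and the log file are illustrative only.
WARNING_TEXT="Hugepage size 4096 unavailable"

count_hugepage_warnings() {
    # grep -o prints one line per match, so several warnings that were
    # interleaved onto a single output line are each counted.
    grep -o "$WARNING_TEXT" "$1" | grep -c .
}

has_segfault() {
    # Succeeds if any rank reported a segmentation fault on exit.
    grep -q "exit signal Segmentation fault" "$1"
}

# Example use (job.out is a hypothetical file name):
#   count_hugepage_warnings job.out
#   has_segfault job.out && echo "segfault reported"
```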

The workaround was to swap the craype-hugepages8M module out for another hugepages module and then swap it back in.
A subsequent make clean && make test then ran without the error, so this seems to have fixed the issue.
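
The swap-and-rebuild workaround can be sketched as the script below. This is only an illustration under assumptions: the report does not say which module was swapped in, so craype-hugepages2M is a stand-in, and the `run` helper and `DRY_RUN` variable are hypothetical additions so the commands can be previewed before anything is changed. It also assumes the `module` shell function is already defined in the shell that runs it.

```shell
#!/bin/sh
# Sketch of the workaround: swap the hugepages module out and back in, then rebuild.
# ASSUMPTION: craype-hugepages2M is a placeholder; the report only says "another one".
HUGEPAGES_MODULE="craype-hugepages8M"
OTHER_MODULE="${OTHER_MODULE:-craype-hugepages2M}"

# Print each command; execute it only when DRY_RUN=0 (hypothetical helper).
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

reload_hugepages() {
    run module swap "$HUGEPAGES_MODULE" "$OTHER_MODULE"
    run module swap "$OTHER_MODULE" "$HUGEPAGES_MODULE"
}

rebuild_test() {
    # Run from tests/charm++/simplearrayhello (or your own test directory).
    run make clean && run make test
}

reload_hugepages
rebuild_test
```

By default the script only prints the commands. Sourcing it with DRY_RUN=0 from a login shell on the machine would actually perform the swap and the rebuild.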

This was also reported by Tim Haines, who hit it while running ChaNGa on BlueWaters this past week. His email:
----
I was getting the error " libhugetlbfs [nid16072:18061]: WARNING: Hugepage size 4096 unavailable" even when I was using the 8M hugepages. According to Robert Brunner, they removed the hugepages4M module last week. I have been able to compile several different versions of ChaNGa using CPU/GPU and SMP. I think it has been resolved, but I am waiting on the GPU jobs to start to make sure everything is working now.

He filed a report with BlueWaters to get this resolved.


Related issues

Related to Charm++ - Bug #1708: Charm++ programs hang with mpi-crayxc build on Edison when run on 2 hosts (Implemented, 10/06/2017)

History

#1 Updated by Michael Robson over 3 years ago

After talking with Blue Waters support, it appears that the craype-hugepages8M module has been having problems.

Also, I had a similar problem on an old branch of charm and rebasing it fixed it.

#2 Updated by Tim Haines over 3 years ago

Robert Brunner at the BW support center suggested I try the following. None of them helped my situation, but they may be of use in the future.

1. Run module unload altd and module unload darshan, and then try re-building your code from scratch

2. Try building your code within an interactive job. That will use the compiler configuration in the service-node's environment, which is different from the h2ologin nodes.
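
The two suggestions above might look roughly like the following when scripted. This is a sketch, not what support prescribed verbatim: `make clean` followed by `make` stands in for whatever a from-scratch rebuild means for your code, and the `run` helper and `DRY_RUN` variable are hypothetical, added so the commands can be previewed first. For suggestion 2, the same steps would be run from inside an interactive job (for example one started with `qsub -I`; the resource flags are site-specific and not given here).

```shell
#!/bin/sh
# Sketch of suggestion 1: unload altd and darshan, then rebuild from scratch.
# ASSUMPTION: "make clean" then "make" is a placeholder for your own full rebuild.

# Print each command; execute it only when DRY_RUN=0 (hypothetical helper).
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

unload_altd_darshan() {
    for m in altd darshan; do
        run module unload "$m"
    done
}

rebuild_from_scratch() {
    run make clean && run make
}

unload_altd_darshan
rebuild_from_scratch
```

As written, the script only echoes the commands. Setting DRY_RUN=0 in a shell where `module` is defined would execute them.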

My solution to the hugepages issue was to fast-forward the charm/nodeGPU branch to the latest charm head.

#3 Updated by Sam White over 2 years ago

Can this be closed?

#4 Updated by Phil Miller over 1 year ago

  • Related to Bug #1708: Charm++ programs hang with mpi-crayxc build on Edison when run on 2 hosts added
