Charm++ programs fail to run on BlueWaters due to craype-hugepages8M
When running the tests/charm++/simplearrayhello program, I got the following output:
libhugetlbfslibhugetlbfslibhugetlbfslibhugetlbfs [nid11723:2677]libhugetlbfs: WARNING: Hugepage size 4096 unavailablelibhugetlbfslibhugetlbfs [nid11723:2678] [nid11723:2675] [nid11723:2676]libhugetlbfs: WARNING: Hugepage size 4096 unavailable [nid11723:2673]: WARNING: Hugepage size 4096 unavailable: WARNING: Hugepage size 4096 unavailable: WARNING: Hugepage size 4096 unavailable [nid11723:2674] [nid11723:2680]: WARNING: Hugepage size 4096 unavailable [nid11723:2679]: WARNING: Hugepage size 4096 unavailable: WARNING: Hugepage size 4096 unavailable
Charm++> Running on Gemini (GNI) with 8 processes
Charm++> static SMSG
Charm++> SMSG memory: 39.5KB
Charm++> memory pool init block size: 8MB, total memory pool limit 0MB (0 means no limit)
Charm++> memory pool registered memory limit: 200000MB, send limit: 100000MB
Charm++> only comm thread send/recv messages
Charm++> Cray TLB page size: 4K
_pmiu_daemon(SIGCHLD): [NID 11723] [c20-4c0s5n1] [Fri Jan 29 15:44:48 2016] PE RANK 0 exit signal Segmentation fault
The solution was to swap out the craype-hugepages8M module for another one and then swap it back in. Running make clean && make test afterwards seems to have fixed the issue.
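For anyone who wants to repeat the workaround, here is a rough sketch of the steps as a shell function. The replacement module (craype-hugepages2M) is my assumption, since the report only says "another one", and the run and reload_hugepages helper names are made up for illustration. By default it only prints the commands; set DRY_RUN=0 to actually execute them.

```shell
# Sketch of the module swap workaround described above (not an official procedure).
# ASSUMPTION: craype-hugepages2M stands in for "another one"; use any hugepages
# module that is actually installed on your system.
HUGEPAGE_MODULE="craype-hugepages8M"
ALT_MODULE="${ALT_MODULE:-craype-hugepages2M}"

# Hypothetical helper: print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Swap the hugepages module out and back in, then rebuild and rerun the tests.
reload_hugepages() {
    run module swap "$HUGEPAGE_MODULE" "$ALT_MODULE" || return 1
    run module swap "$ALT_MODULE" "$HUGEPAGE_MODULE" || return 1
    run make clean || return 1
    run make test || return 1
}
```

Calling reload_hugepages from the test directory prints the four commands; DRY_RUN=0 reload_hugepages runs them in the current shell, which matters because module changes only affect the shell they are run in.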
This was also reported by Tim Haines while running ChaNGa on BlueWaters this past week. His email:
I was getting the error " libhugetlbfs [nid16072:18061]: WARNING: Hugepage size 4096 unavailable" even when I was using the 8M hugepages. According to Robert Brunner, they removed the hugepages4M module last week. I have been able to compile several different versions of ChaNGa using CPU/GPU and SMP. I think it has been resolved, but I am waiting on the GPU jobs to start to make sure everything is working now.
He filed a report with BlueWaters to get this resolved.
#2 Updated by Tim Haines over 3 years ago
Robert Brunner at the BW support center suggested I try the following. None of them helped my situation, but they may be of use in the future.
1. Run module unload altd and module unload darshan, and then try re-building your code from scratch
2. Try building your code within an interactive job. That will use the compiler configuration in the service-node's environment, which is different from the h2ologin nodes.
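In case it helps someone later, suggestion 1 could be scripted roughly as below. This is only a sketch: the run and rebuild_without_wrappers helper names are hypothetical, and using make clean followed by make as the "from scratch" rebuild is my assumption, so substitute your project's real build commands. The interactive-job request for suggestion 2 is site-specific and is not shown. As written it only prints the commands unless DRY_RUN=0 is set.

```shell
# Sketch of suggestion 1: unload altd and darshan, then rebuild from scratch.
# ASSUMPTION: "make clean" followed by "make" is the full rebuild for your code.

# Hypothetical helper: print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

rebuild_without_wrappers() {
    run module unload altd || return 1
    run module unload darshan || return 1
    run make clean || return 1
    run make || return 1
}
```

Run rebuild_without_wrappers from the top of your source tree to preview the steps, or DRY_RUN=0 rebuild_without_wrappers to execute them.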
My solution to the hugepages issue was to fast-forward the charm/nodeGPU branch to the latest charm head.