Support #2041

charmrun with mpirun instead of srun?

Added by Geoffrey Lovelace 3 months ago. Updated 25 days ago.

Status: Feedback
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date: 12/29/2018
Due date: -
% Done: 0%


Description

Hi,

I need to use Charm++ on a cluster that launches parallel jobs with mpirun. The cluster uses Slurm to submit jobs, but for technical reasons Slurm can't be built to be aware of the MPI distribution we need, so srun does not launch jobs in parallel correctly.

The trouble is that charmrun always seems to use srun, even when I pass it the ++mpiexec option. However, I can run jobs fine by hand by calling mpirun directly in a script that I pass to sbatch; a sketch of such a script is below.
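
This is only a minimal sketch; the resource requests and the binary name (./pgm) are placeholders for our actual setup:

    #!/bin/bash
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=16
    #SBATCH --time=01:00:00

    # Launch the binary with mpirun directly, bypassing srun.
    mpirun -np 32 ./pgm <application_args>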

Is there a way to be sure charmrun uses mpirun? Either an option when calling charmrun or when compiling charm?

History

#1 Updated by Sam White 3 months ago

Can you try with this:

./charmrun ++mpiexec ++remote-shell "mpirun <mpirun_args>" ./pgm <application_args>

#2 Updated by Geoffrey Lovelace 3 months ago

I tried this, and it still called srun. Might I need to build Charm++ differently? I used:

./build charm++ mpi-linux-x86_64 mpicxx smp -j16 --with-production

#3 Updated by Geoffrey Lovelace 3 months ago

I'm using Charm++ v6.8.

#4 Updated by Sam White 3 months ago

Is that v6.8.0 or v6.8.2? Can you try with v6.9.0? You shouldn't need to build Charm++ any differently.

#5 Updated by Geoffrey Lovelace 2 months ago

It's v6.8.0. We're working on adding support for v6.9.0 to our code base (https://github.com/sxs-collaboration/spectre), but that will take us a while. Is it expected that the ++mpiexec option does not work in v6.8.0?

#6 Updated by Sam White 2 months ago

No, it's not. What MPI library are you using? And could you post the output of what happens when you try the "++mpiexec ++remote-shell" command I posted above?

#7 Updated by Jim Phillips 2 months ago

For the mpi-* builds of Charm++ you can run the binary with mpirun/mpiexec/srun directly rather than via charmrun; in fact, for these builds charmrun is just a script. The charmrun binary and its ++mpiexec option are only needed for netlrts-, verbs-, and similar builds.
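
For example, launch commands along these lines should work for an mpi-* build (the binary name ./pgm and the process counts are just illustrative):

    # Launch the MPI machine layer binary directly; no charmrun needed:
    mpirun -np 8 ./pgm <application_args>

    # Or, on a Slurm cluster where srun works:
    srun -n 8 ./pgm <application_args>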

#8 Updated by Evan Ramos about 1 month ago

As Jim said, those ++mpiexec ++remote-shell options are only valid with the netlrts/verbs versions of charmrun. For the MPI machine layer, mpirun was moved ahead of srun in charmrun's precedence for the 6.9.0 release. You may want to cherry-pick commit 8aaa484c022306918988aa3114e578b4a9748662.
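
Something along these lines should work, assuming a clean checkout of the charm repository at the v6.8.0 tag:

    # Branch off the v6.8.0 release and apply the mpirun-precedence fix:
    git checkout -b v6.8.0-mpirun v6.8.0
    git cherry-pick 8aaa484c022306918988aa3114e578b4a9748662

    # Rebuild as before:
    ./build charm++ mpi-linux-x86_64 mpicxx smp -j16 --with-production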

#9 Updated by Eric Bohm 25 days ago

  • Assignee set to Evan Ramos

#10 Updated by Evan Ramos 25 days ago

  • Status changed from New to Feedback

Awaiting feedback from Geoffrey.

#11 Updated by Geoffrey Lovelace 25 days ago

Thank you for the feedback! Please accept my apologies for the delay, as I have been swamped with travel and with physically installing our new cluster (when I first posted, I was testing the cluster remotely before it was shipped).

Our particular application, spectre (http://github.com/sxs-collaboration/spectre), does not yet support Charm++ version 6.9, although some of our developers are working on adding that support. Once we move to v6.9, I plan to try this again to verify that charmrun works. Since I can run jobs with raw mpirun commands, this isn't a dealbreaker for now. Is there anything I can do with a charmrun call that I can't do with a raw mpirun call?

Meanwhile, I'd like to try a build of v6.8.0 with the commit you suggest cherry-picked, but I will probably need a couple of weeks to carve out the time. Thank you very much for keeping up with this issue, and for your patience with the delays in my replies! I appreciate your help.
