Replicas slower than separate jobs on GNI on Blue Waters
On Blue Waters, a 200-node non-SMP run of apoa1 with 50 replicas (4 nodes per replica) is uniformly and significantly slower than a separate 4-node run.
No idea why. Topology-aware partitioning seems to be working OK. Testing MPI on Bridges to see if it happens there too.
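For reference, the node arithmetic of the failing configuration can be sketched as below. This is only an illustration, assuming each replica gets an equal, contiguous block of node indices. The helper name `contiguous_partitions` is hypothetical and not part of Charm++ or NAMD, and the actual topology-aware partitioning may place replicas differently.

```python
def contiguous_partitions(num_nodes, num_replicas):
    """Split node indices 0..num_nodes-1 into equal contiguous blocks.

    Hypothetical helper for illustration only; it assumes a plain
    contiguous split and does not model topology-aware placement.
    """
    if num_replicas <= 0 or num_nodes % num_replicas != 0:
        raise ValueError("num_nodes must be a positive multiple of num_replicas")
    size = num_nodes // num_replicas
    # Replica r owns node indices [r * size, (r + 1) * size).
    return [list(range(r * size, (r + 1) * size)) for r in range(num_replicas)]


# Configuration from this report: 200 nodes, 50 replicas.
layout = contiguous_partitions(200, 50)
print(len(layout), "replicas,", len(layout[0]), "nodes each")
print("first replica:", layout[0])
print("last replica:", layout[-1])
```

Under that assumption each replica has the same 4-node footprint as the standalone 4-node run it is being compared against.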
What commit of charm are you using? We recently merged changes to make broadcasts and reductions topology-aware.
6.8.0 from Sept 5 (v6.8.0-0-ga36028e-namd-charm-6.8.0-build-2017-Sep-05-28093).
No observed performance difference on Bridges.
I see the exact same performance for v6.7.0-574-g7d61794-namd-charm-6.8.0-build-2017-Jan-23-80737 and v6.7.0-0-g46f867c-namd-charm-6.7.0-build-2015-Dec-21-45876.
Definitely not a recent change.
The bug does not affect the MPI layer on Blue Waters.
Still need to test verbs.
The verbs layer does not appear to be affected.
Just to be clear, the only context in which this bug has been observed is GNI on Blue Waters?
MPI there is unaffected, verbs is unaffected. MPI on Bridges is not affected, I think?
We may not hold the 6.8.1 release for this, since it's not any sort of recent issue or regression. We'll obviously try to get it dealt with quickly.
- Subject changed from "replicas slower than separate jobs" to "Replicas slower than separate jobs on GNI on Blue Waters"
Correct, as far as I know this is a GNI issue.
I've only tested on Blue Waters. It may or may not affect Titan, Eos, Edison, Cori, Theta, Piz Daint, etc.
- Assignee set to Karthik Senthil