Feature #1040

support multiple InfiniBand cards per node

Added by Jim Phillips about 3 years ago. Updated 3 months ago.

Status: New
Priority: Normal
Assignee:
Category: Machine Layers
Target version:
Start date: 04/19/2016
Due date:
% Done: 0%


Description

The Crest Power8/GPU cluster at OLCF has two InfiniBand cards per node:
https://www.olcf.ornl.gov/kb_articles/crest-user-information/

Charm++ currently attaches to the first interface of the first card it finds. For better performance, Charm++ processes (nodes) should distribute themselves across the active interfaces. Ideally each process would pick the card closest to the cores to which its PEs are bound, but in the meantime the user could specify which network interfaces to use with a runtime option like "+netmap 0,1". (Please don't use +devices, as NAMD uses this for GPUs and Xeon Phi coprocessors.)
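As a rough illustration of the interim "+netmap 0,1" idea, the sketch below parses such a list and spreads the processes on a node across the listed interfaces in round-robin order. "+netmap" is only the option proposed in this issue, not an existing Charm++ flag. The names parseNetmap, pickInterface, and localRank are hypothetical, and the sketch assumes each process knows its rank within the physical node.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: parse a comma-separated list such as "0,1" into
// interface indices. Rejects empty, negative, or non-numeric entries.
std::vector<int> parseNetmap(const std::string& arg) {
    std::vector<int> ifaces;
    std::stringstream ss(arg);
    std::string item;
    while (std::getline(ss, item, ',')) {
        if (item.empty()) throw std::invalid_argument("empty netmap entry");
        std::size_t pos = 0;
        int idx = std::stoi(item, &pos);  // throws on non-numeric input
        if (pos != item.size() || idx < 0)
            throw std::invalid_argument("bad netmap entry: " + item);
        ifaces.push_back(idx);
    }
    if (ifaces.empty()) throw std::invalid_argument("netmap list is empty");
    return ifaces;
}

// Hypothetical helper: assign interfaces round-robin by the process's rank
// within its physical node (localRank is assumed to be known to the caller).
int pickInterface(const std::vector<int>& netmap, int localRank) {
    assert(!netmap.empty() && localRank >= 0);
    return netmap[static_cast<std::size_t>(localRank) % netmap.size()];
}
```

With the list "0,1", local ranks 0, 1, 2, 3 would map to interfaces 0, 1, 0, 1. This does not account for which card is closest to a process's cores; it only balances the process count per interface.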

History

#1 Updated by Jim Phillips over 2 years ago

Target platform is now Summitdev:
https://www.olcf.ornl.gov/kb_articles/summitdev-quickstart/#Hardware

If we end up using PAMI on Summit this will not be necessary.

#2 Updated by Jim Phillips over 2 years ago

Confirming that both HCAs connect to a single network (rather than two parallel networks):

[jimp@summitdev-r0c0n13 ~/summitdev]$ ibtracert 40 122
From ca {0x248a07030047a6aa} portnum 1 lid 40-40 "summitdev-r0c0n13 HCA-2" 
[1] -> switch port {0x7cfe9003009600f0}[13] lid 3-3 "MF0;summitdev-ibleaf-r0c0b:MSB7700/U1" 
[28] -> switch port {0xe41d2d030051e9a0}[6] lid 19-19 "MF0;summitdev-ibcore3:MSB7700/U1" 
[19] -> switch port {0x7cfe9003009600b0}[31] lid 56-56 "MF0;summitdev-ibleaf-r0c1b:MSB7700/U1" 
[1] -> ca port {0x248a07030047a586}[1] lid 122-122 "summitdev-r0c1n01 HCA-2" 
To ca {0x248a07030047a586} portnum 1 lid 122-122 "summitdev-r0c1n01 HCA-2" 
[jimp@summitdev-r0c0n13 ~/summitdev]$ ibtracert 40 124
From ca {0x248a07030047a6aa} portnum 1 lid 40-40 "summitdev-r0c0n13 HCA-2" 
[1] -> switch port {0x7cfe9003009600f0}[13] lid 3-3 "MF0;summitdev-ibleaf-r0c0b:MSB7700/U1" 
[22] -> switch port {0x7cfe900300a44b50}[9] lid 17-17 "MF0;summitdev-ibcore1:MSB7700/U1" 
[12] -> switch port {0x7cfe9003009774e0}[20] lid 38-38 "MF0;summitdev-ibleaf-r0c1a:MSB7700/U1" 
[1] -> ca port {0x248a07030047a5b2}[1] lid 124-124 "summitdev-r0c1n01 HCA-1" 
To ca {0x248a07030047a5b2} portnum 1 lid 124-124 "summitdev-r0c1n01 HCA-1" 
[jimp@summitdev-r0c0n13 ~/summitdev]$ ibtracert 42 122
From ca {0x248a07030047a9ca} portnum 1 lid 42-42 "summitdev-r0c0n13 HCA-1" 
[1] -> switch port {0x7cfe900300a44b70}[13] lid 117-117 "MF0;summitdev-ibleaf-r0c0a:MSB7700/U1" 
[28] -> switch port {0xe41d2d030051e9a0}[1] lid 19-19 "MF0;summitdev-ibcore3:MSB7700/U1" 
[19] -> switch port {0x7cfe9003009600b0}[31] lid 56-56 "MF0;summitdev-ibleaf-r0c1b:MSB7700/U1" 
[1] -> ca port {0x248a07030047a586}[1] lid 122-122 "summitdev-r0c1n01 HCA-2" 
To ca {0x248a07030047a586} portnum 1 lid 122-122 "summitdev-r0c1n01 HCA-2" 
[jimp@summitdev-r0c0n13 ~/summitdev]$ ibtracert 42 120
From ca {0x248a07030047a9ca} portnum 1 lid 42-42 "summitdev-r0c0n13 HCA-1" 
[1] -> switch port {0x7cfe900300a44b70}[13] lid 117-117 "MF0;summitdev-ibleaf-r0c0a:MSB7700/U1" 
[29] -> switch port {0xe41d2d030051e9a0}[2] lid 19-19 "MF0;summitdev-ibcore3:MSB7700/U1" 
[6] -> switch port {0x7cfe9003009600f0}[28] lid 3-3 "MF0;summitdev-ibleaf-r0c0b:MSB7700/U1" 
[15] -> ca port {0x248a07030047a57e}[1] lid 120-120 "summitdev-r0c0n15 HCA-2" 
To ca {0x248a07030047a57e} portnum 1 lid 120-120 "summitdev-r0c0n15 HCA-2" 
[jimp@summitdev-r0c0n13 ~/summitdev]$ ibtracert 40 42
From ca {0x248a07030047a6aa} portnum 1 lid 40-40 "summitdev-r0c0n13 HCA-2" 
[1] -> switch port {0x7cfe9003009600f0}[13] lid 3-3 "MF0;summitdev-ibleaf-r0c0b:MSB7700/U1" 
[33] -> switch port {0x7cfe900300bcee50}[6] lid 7-7 "MF0;summitdev-ibcore4:MSB7700/U1" 
[2] -> switch port {0x7cfe900300a44b70}[34] lid 117-117 "MF0;summitdev-ibleaf-r0c0a:MSB7700/U1" 
[13] -> ca port {0x248a07030047a9ca}[1] lid 42-42 "summitdev-r0c0n13 HCA-1" 
To ca {0x248a07030047a9ca} portnum 1 lid 42-42 "summitdev-r0c0n13 HCA-1" 
[jimp@summitdev-r0c0n13 ~/summitdev]$ ibtracert 122 124
From ca {0x248a07030047a586} portnum 1 lid 122-122 "summitdev-r0c1n01 HCA-2" 
[1] -> switch port {0x7cfe9003009600b0}[1] lid 56-56 "MF0;summitdev-ibleaf-r0c1b:MSB7700/U1" 
[23] -> switch port {0x7cfe9003009601d0}[16] lid 5-5 "MF0;summitdev-ibcore2:MSB7700/U1" 
[15] -> switch port {0x7cfe9003009774e0}[27] lid 38-38 "MF0;summitdev-ibleaf-r0c1a:MSB7700/U1" 
[1] -> ca port {0x248a07030047a5b2}[1] lid 124-124 "summitdev-r0c1n01 HCA-1" 
To ca {0x248a07030047a5b2} portnum 1 lid 124-124 "summitdev-r0c1n01 HCA-1" 

#3 Updated by Eric Bohm over 2 years ago

  • Target version changed from 6.8.0 to 6.8.1

#4 Updated by Eric Bohm almost 2 years ago

  • Target version changed from 6.8.1 to 6.9.0

#5 Updated by Phil Miller almost 2 years ago

Nitin, please work out how critical and feasible this is, and thus whether it should be a target to complete for 6.9, with an intended preview release by SC.

#6 Updated by Jim Phillips over 1 year ago

If we're using PAMI then it's not so critical, unless we want to compare PAMI to verbs.

#7 Updated by Eric Bohm over 1 year ago

  • Assignee set to Nitin Bhat

#8 Updated by Eric Bohm over 1 year ago

  • Target version changed from 6.9.0 to 6.9.1

We expect to use pamilrts on Summitdev, so this is not urgent.

#9 Updated by Sam White 6 months ago

  • Target version changed from 6.9.1 to 6.10.0

#10 Updated by Evan Ramos 4 months ago

Jim Phillips wrote:

Charm++ currently attaches to the first interface of the first card it finds.

Since the change below was merged, Charm++ queries all devices, eliminates inactive ones, and uses the fastest active device. It still uses only one device. https://charm.cs.illinois.edu/gerrit/c/charm/+/4474
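The selection rule described in this comment (drop inactive devices, keep the fastest active one) can be sketched as follows. This is not the code from the Gerrit change. The PortInfo struct and its fields are hypothetical stand-ins for whatever the verbs layer actually reports, and the rate is assumed to be already converted to Gbit/s.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical summary of one device port; not a real verbs structure.
struct PortInfo {
    std::string device;  // device name as reported by the driver
    bool active;         // true if the port's link is up
    double rateGbps;     // assumed link rate in Gbit/s
};

// Return the index of the fastest active port, or -1 if none is active.
// Ties keep the earliest entry, so the result is deterministic.
int chooseFastestActive(const std::vector<PortInfo>& ports) {
    int best = -1;
    for (std::size_t i = 0; i < ports.size(); ++i) {
        if (!ports[i].active) continue;  // eliminate inactive ports
        if (best < 0 || ports[i].rateGbps > ports[best].rateGbps)
            best = static_cast<int>(i);
    }
    return best;
}
```

On a node with two equally fast active cards, a rule like this always returns the first one, which is why it alone does not spread processes across cards.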

#11 Updated by Nitin Bhat 4 months ago

From browsing the code of the PAMI{lrts} machine layers, I couldn't determine whether PAMI uses multiple InfiniBand cards internally. I'm checking with Bilge to see if she knows more about it.

#12 Updated by Nitin Bhat 3 months ago

  • Target version changed from 6.10.0 to 7 (Next Generation Charm++)

I contacted Hui-fang Wen from IBM and heard back that PAMI internally uses multiple InfiniBand cards, i.e., each process uses the card closest to its core. For that reason, I'm retargeting this issue. In the future, we can evaluate whether it is relevant for Verbs/UCX.
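If this is later evaluated for Verbs/UCX, a "closest card" policy could look roughly like the sketch below. It is not PAMI's implementation. HcaLocality, pickClosestHca, and the use of NUMA node ids as the measure of closeness are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one HCA: its index and the NUMA node it is
// attached to (assumed to be discoverable by the caller).
struct HcaLocality {
    int hcaIndex;
    int numaNode;
};

// Prefer the first HCA on the same NUMA node as the process's cores.
// If none matches, fall back to round-robin by rank within the node.
int pickClosestHca(const std::vector<HcaLocality>& hcas,
                   int processNumaNode, int localRank) {
    assert(!hcas.empty() && localRank >= 0);
    for (const HcaLocality& h : hcas) {
        if (h.numaNode == processNumaNode) return h.hcaIndex;
    }
    return hcas[static_cast<std::size_t>(localRank) % hcas.size()].hcaIndex;
}
```

The fallback keeps processes spread across cards even when locality information is missing or does not match any card.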

#13 Updated by Eric Bohm 3 months ago

  • Target version changed from 7 (Next Generation Charm++) to 6.11
