++oneWthPerSocket doesn't work on Darwin
$ ./charmrun ./jacobi 2 2 2 40 +vp8 +balancer RotateLB +LBDebug 1 ++local ++np 1 ++oneWthPerSocket
Charmrun> scalable start enabled.
Charmrun> Error: Invalid request for 0 PEs among 1 processes per host.
This should do the same thing as ++oneWthPerHost and use 1 PE.
#7 Updated by Sam White over 1 year ago
But if my MacBook has a single socket with 8 PUs, why shouldn't "++processPerSocket 1 ++oneWthPerSocket" launch 1 process with one worker thread and one comm thread in it, for a netlrts-darwin-x86_64-smp build?
If I did "++processPerPU 1 ++oneWthPerPU", then we should warn the user (or abort? Need to check what the behavior was before the hwloc launch commands...) that they have oversubscribed the hardware with threads.
#8 Updated by Evan Ramos over 1 year ago
./charmrun ++local ++processPerPU 1 ++oneWthPerPU ./hello
Charmrun> scalable start enabled.
Charmrun> Error: Invalid request for 0 PEs among 8 processes per host.
I agree that the error message should be clearer. Beyond that, I'm not sure what the best course of action is here. I currently special-case ++oneWthPerHost to allocate one worker thread and one comm thread, otherwise the option would be entirely useless. Maybe the same thing needs to be done for the other options.
#9 Updated by Sam White over 1 year ago
Here's my understanding of the current hwloc-based command line arguments. For a concrete example, let's say we have 1 host with 2 sockets, each socket has 8 cores and each core has 2 PUs. Then the host has in total 16 cores or 32 PUs.
Non-SMP mode is mostly straightforward, but there are a couple of open questions about placement:
Non-SMP Mode:
++processPerHost 1 = 1 process
++processPerHost 2 = 2 processes (first 2 PUs or first 2 cores or one per socket?)
++processPerHost 16 = 16 processes (first 16 PUs or one per core?)
++processPerHost 32 = 32 processes, one per PU
++processPerSocket 1 = 2 processes, one on each of the sockets
++processPerSocket 8 = 16 processes (one per core or first 8 PUs of each socket?)
++processPerSocket 16 = 32 processes, one on each of the PUs
++processPerCore 1 = 16 processes, one on each of the cores
++processPerCore 2 = 32 processes, one on each of the PUs
++processPerPU 1 = 32 processes, one on each of the PUs
++processPerHost >32 = error
++processPerSocket >16 = error
++processPerCore >2 = error
++processPerPU >1 = error
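For what it's worth, the non-SMP mapping above reduces to one rule. Here is a minimal sketch of it; the PUS_PER table and function name are my own modeling of the example 2-socket / 8-core-per-socket / 2-PU-per-core machine, not charmrun internals:

```python
# Counts for the example machine: 1 host, 2 sockets, 8 cores per socket,
# 2 PUs per core -> 16 cores, 32 PUs. All sizes measured in PUs.
PUS_PER = {"host": 32, "socket": 16, "core": 2, "pu": 1}

def nonsmp_processes(unit, n):
    """Total processes launched by ++processPer<Unit> n in non-SMP mode."""
    if n > PUS_PER[unit]:
        # more processes requested than PUs available in one <unit>
        raise ValueError("error: oversubscribed " + unit)
    units_on_host = PUS_PER["host"] // PUS_PER[unit]
    return units_on_host * n
```

For example, nonsmp_processes("socket", 1) gives 2 and nonsmp_processes("core", 2) gives 32, matching the list above.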
SMP Mode:
++processPerHost 1 ++oneWthPerHost = error (currently special-cased to work with 1 wth + 1 commth)
++processPerHost 1 ++oneWthPerSocket = 1 process with 1 wth + 1 commth, each on a different socket
++processPerHost 1 ++oneWthPerCore = 1 process with 15 wth, each on a different core, + 1 commth
++processPerHost 1 ++oneWthPerPU = 1 process with 31 wth + 1 commth
++processPerSocket 1 ++oneWthPerHost = error
++processPerSocket 1 ++oneWthPerSocket = error (could be special-cased?)
++processPerSocket 1 ++oneWthPerCore = 2 processes, 1 per socket, with 7 wth + 1 commth each
++processPerSocket 1 ++oneWthPerPU = 2 processes, 1 per socket, with 15 wth + 1 commth each
++processPerCore 1 ++oneWthPerHost = error
++processPerCore 1 ++oneWthPerSocket = error
++processPerCore 1 ++oneWthPerCore = error (could be special-cased?)
++processPerCore 1 ++oneWthPerPU = 16 processes, 1 per core, with 1 wth + 1 commth each
++processPerPU 1 ++oneWthPer* = error
++processPerSocket 2 ++oneWthPerCore = 4 processes, 2 per socket, with 3 wth + 1 commth each
++processPerSocket 2 ++oneWthPerPU = 4 processes, 2 per socket, with 7 wth + 1 commth each
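The SMP cases with one process per unit also seem to follow a single rule: worker threads fill the process's domain at the requested granularity, minus one slot given up to the comm thread, and it's an error when nothing is left. A sketch of that rule, again my own modeling on the example machine and not charmrun code (it deliberately ignores the ++oneWthPerHost special case):

```python
# Sizes in PUs for the example machine: 2 sockets x 8 cores x 2 PUs.
PUS_PER = {"host": 32, "socket": 16, "core": 2, "pu": 1}

def smp_layout(proc_unit, wth_unit):
    """(processes per host, worker threads per process) for
    ++processPer<ProcUnit> 1 ++oneWthPer<WthUnit> in SMP mode."""
    procs = PUS_PER["host"] // PUS_PER[proc_unit]
    # one wth per <wth_unit> inside the process's domain,
    # with one slot reserved for the comm thread
    wth = PUS_PER[proc_unit] // PUS_PER[wth_unit] - 1
    if wth < 1:
        raise ValueError("Invalid request for 0 PEs")
    return procs, wth
```

This reproduces the table: smp_layout("host", "core") gives (1, 15), smp_layout("socket", "pu") gives (2, 15), and smp_layout("socket", "socket") raises the "0 PEs" error, which is exactly the case the special-casing question is about.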
Is this right? Obviously there are more combinations than those listed here.