Bug #1789

++oneWthPerSocket doesn't work on Darwin

Added by Sam White 24 days ago. Updated 16 days ago.

Status: Implemented
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 02/01/2018
Due date:
% Done: 0%


Description

$ ./charmrun ./jacobi 2 2 2 40 +vp8 +balancer RotateLB +LBDebug 1 ++local ++np 1 ++oneWthPerSocket
Charmrun> scalable start enabled. 
Charmrun> Error: Invalid request for 0 PEs among 1 processes per host.

This should do the same thing as ++oneWthPerHost and use 1 PE.

History

#1 Updated by Sam White 24 days ago

Same thing in standalone mode. This is all on a netlrts-darwin-x86_64-smp build.

#2 Updated by Sam White 24 days ago

I'm seeing the same thing on Linux, with a netlrts-linux-x86_64-smp build.

#3 Updated by Evan Ramos 24 days ago

What happens if you change `++np 1` to `++processPerSocket 1`?

#4 Updated by Evan Ramos 23 days ago

I see the same problem on Darwin. I can't reproduce it on Linux.

I thought this patch would have prevented this error, but apparently not. https://charm.cs.illinois.edu/gerrit/3564

#5 Updated by Sam White 23 days ago

$ ./charmrun ./jacobi 2 2 2 40 +vp8 +balancer RotateLB +LBDebug 1 ++local ++processPerSocket 1 ++oneWthPerSocket
Charmrun> scalable start enabled. 
Charmrun> Error: Invalid request for 0 PEs among 1 processes per host.

#6 Updated by Evan Ramos 23 days ago

I can see it on Linux with the correct arguments.

What is happening is that one thread per socket is not enough to give each process both a worker thread and a comm thread.
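
A minimal sketch of that arithmetic, assuming (not checked against the charmrun source) that each process gets one thread slot per requested unit and that an SMP build reserves one of those slots for the comm thread. The function name `worker_threads` is hypothetical.

```python
def worker_threads(slots_per_process: int, comm_threads: int = 1) -> int:
    """Worker threads (PEs) left in a process after reserving the comm thread.

    Assumption: an SMP process needs `comm_threads` slots before any
    worker thread can be placed.
    """
    return max(slots_per_process - comm_threads, 0)


# One process per socket with one thread slot per socket: a single slot,
# which the comm thread consumes.
print(worker_threads(1))  # 0 PEs, as in the reported error message
```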

#7 Updated by Sam White 23 days ago

But if my MacBook has a single socket with 8 PUs, why shouldn't "++processPerSocket 1 ++oneWthPerSocket" launch 1 process with one worker thread and one comm thread in it, for a netlrts-darwin-x86_64-smp build?

If I did "++processPerPU 1 ++oneWthPerPU", then we should warn the user (or abort? I need to check what the behavior was before the hwloc-based launch commands) that they have oversubscribed the hardware with threads.

#8 Updated by Evan Ramos 23 days ago

$ ./charmrun ++local ++processPerPU 1 ++oneWthPerPU ./hello
Charmrun> scalable start enabled. 
Charmrun> Error: Invalid request for 0 PEs among 8 processes per host.

I agree that the error message should be clearer. Beyond that, I'm not sure what the best course of action is here. I currently special-case ++oneWthPerHost to allocate one worker thread and one comm thread; otherwise the option would be entirely useless. Maybe the same thing needs to be done for the other options.

#9 Updated by Sam White 23 days ago

Here's my understanding of the current hwloc-based command line arguments. For a concrete example, let's say we have 1 host with 2 sockets, where each socket has 8 cores and each core has 2 PUs. The host then has a total of 16 cores, or 32 PUs.

Non-SMP mode is mostly straightforward, but there are a couple of questions about the placement of processes:

Non-SMP Mode:

++processPerHost 1    = 1  process
++processPerHost 2    = 2  processes (first 2 PUs or first 2 cores or one per socket?)
++processPerHost 16   = 16 processes (first 16 PUs or one per core?)
++processPerHost 32   = 32 processes, one per PU
++processPerSocket 1  = 2  processes, one on each of the sockets
++processPerSocket 8  = 16 processes (one per core or first 8 PUs of each socket?)
++processPerSocket 16 = 32 processes, one on each of the PUs
++processPerCore 1    = 16 processes, one on each of the cores
++processPerCore 2    = 32 processes, one on each of the PUs
++processPerPU 1      = 32 processes, one on each of the PUs

++processPerHost >32   = error
++processPerSocket >16 = error
++processPerCore >2    = error
++processPerPU >1      = error
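
The process counts above can be reproduced with a short sketch. It assumes the example topology (1 host, 2 sockets, 8 cores per socket, 2 PUs per core) and treats any request for more processes than PUs as an error. The names `UNITS_PER_HOST` and `process_count` are hypothetical, and the sketch says nothing about where the processes are placed.

```python
# Example topology from this comment: 1 host, 2 sockets, 16 cores, 32 PUs.
UNITS_PER_HOST = {"host": 1, "socket": 2, "core": 16, "pu": 32}


def process_count(unit: str, per_unit: int) -> int:
    """Total processes on the host for ++processPer<unit> <per_unit>."""
    total = UNITS_PER_HOST[unit] * per_unit
    if total > UNITS_PER_HOST["pu"]:
        # Assumption: oversubscribing PUs is rejected, as in the list above.
        raise ValueError(
            f"{total} processes requested, only {UNITS_PER_HOST['pu']} PUs"
        )
    return total


for unit, n in [("host", 2), ("socket", 8), ("core", 1), ("pu", 1)]:
    print(f"processPer{unit} {n}: {process_count(unit, n)} processes")
```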

SMP Mode:

++processPerHost 1  ++oneWthPerHost          = error (currently special-cased to work with 1 wth + 1 commth)
++processPerHost 1  ++oneWthPerSocket        = 1 process with 1 wth + 1 commth, each on a different socket
++processPerHost 1  ++oneWthPerCore          = 1 process with 15 wth, each on a different core, + 1 commth
++processPerHost 1  ++oneWthPerPU            = 1 process with 31 wth + 1 commth

++processPerSocket 1  ++oneWthPerHost        = error
++processPerSocket 1  ++oneWthPerSocket      = error (could be special-cased?)
++processPerSocket 1  ++oneWthPerCore        = 2 processes, 1 per socket, with 7 wth + 1 commth each
++processPerSocket 1  ++oneWthPerPU          = 2 processes, 1 per socket, with 15 wth + 1 commth each

++processPerCore 1  ++oneWthPerHost          = error
++processPerCore 1  ++oneWthPerSocket        = error
++processPerCore 1  ++oneWthPerCore          = error (could be special-cased?)
++processPerCore 1  ++oneWthPerPU            = 16 processes, 1 per core, with 1 wth + 1 commth each

++processPerPU 1  ++oneWthPer*               = error

++processPerSocket 2  ++oneWthPerCore        = 4 processes, 2 per socket, with 3 wth + 1 commth each
++processPerSocket 2  ++oneWthPerPU          = 4 processes, 2 per socket, with 7 wth + 1 commth each

Is this right? Obviously there are more combinations that can be used.
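
As a rough check of the SMP list above, here is a sketch of the model it implies: the units named by ++oneWthPer* are divided evenly among the processes, and each process gives up one of its slots to the comm thread. This is an assumption about the intended semantics, not the charmrun implementation; `UNITS_PER_HOST` and `smp_layout` are hypothetical names, and the ++oneWthPerHost special case is deliberately left out.

```python
# Example topology from this comment: 1 host, 2 sockets, 16 cores, 32 PUs.
UNITS_PER_HOST = {"host": 1, "socket": 2, "core": 16, "pu": 32}


def smp_layout(proc_unit: str, procs_per_unit: int, wth_unit: str):
    """Return (processes, worker threads per process) for
    ++processPer<proc_unit> <procs_per_unit> ++oneWthPer<wth_unit>."""
    processes = UNITS_PER_HOST[proc_unit] * procs_per_unit
    # Thread slots each process receives at the requested granularity.
    slots = UNITS_PER_HOST[wth_unit] // processes
    if slots < 2:
        # Not enough room for one worker thread plus one comm thread.
        raise ValueError(f"{slots} thread slot(s) per process, need at least 2")
    return processes, slots - 1


print(smp_layout("socket", 1, "core"))  # 2 processes with 7 wth each
print(smp_layout("socket", 2, "pu"))    # 4 processes with 7 wth each
```

Under this model, the error rows are exactly the combinations in which a process ends up with fewer than two slots.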

#10 Updated by Evan Ramos 16 days ago

  • Status changed from New to Implemented
