Bug #1827

+showcpuaffinity doesn't show info about remote comm thread (comm thread not on node 0)

Added by Nitin Bhat over 1 year ago. Updated over 1 year ago.

Status: Merged
Priority: Normal
Assignee: Evan Ramos
Category: -
Target version: 6.9.0
Start date: 03/09/2018
Due date: -
% Done: 0%

Description

[nbhat4@r277 leanmd]$ ../../../bin/testrun  +p4 ./leanmd 3 3 3 20 ++ppn 2 +pemap 0,1 +commap 2,3 +showcpuaffinity

Running on 2 processors:  ./leanmd 3 3 3 20 +ppn 2 +pemap 0,1 +commap 2,3 +showcpuaffinity
charmrun>  /usr/bin/setarch x86_64 -R  mpirun -np 2  ./leanmd 3 3 3 20 +ppn 2 +pemap 0,1 +commap 2,3 +showcpuaffinity
Charm++>ofi> provider: psm2
Charm++>ofi> control progress: 2
Charm++>ofi> data progress: 2
Charm++>ofi> maximum inject message size: 64
Charm++>ofi> eager maximum message size: 65536 (maximum header size: 40)
Charm++>ofi> cq entries count: 8
Charm++>ofi> use inject: 1
Charm++>ofi> maximum rma size: 4294967295
Charm++>ofi> mr mode: 0x2
Charm++>ofi> use memory pool: 0
Charm++>ofi> use request cache: 0
Charm++>ofi> number of pre-allocated recvs: 8
Charm++>ofi> exchanging addresses over OFI
Charm++> Running in SMP mode: numNodes 2,  2 worker threads per process
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: v6.8.2-424-g091c6e5
Warning> Randomization of virtual memory (ASLR) is turned on in the kernel, thread migration may not work! Run 'echo 0 > /proc/sys/kernel/randomize_va_space' as root to disable it, or try running with '+isomalloc_sync'.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> cpu affinity enabled.
Charm++> cpuaffinity PE-core map : 0,1
Charm++> set PE 0 on node 0 to core #0
Charm++> set PE 2 on node 1 to core #0
Charm++> set PE 3 on node 1 to core #1
Charm++> set PE 1 on node 0 to core #1
Charm++> set comm 0 on node 0 to core #2
[4] thread CPU affinity mask is 0x00000004
[5] thread CPU affinity mask is 0x00000008
Charm++> Running on 2 unique compute nodes (28-way SMP).
Charm++> cpu topology info is gathered in 0.019 seconds.

LENNARD JONES MOLECULAR DYNAMICS START UP ...

Input Parameters...
Cell Array Dimension X:3 Y:3 Z:3 of size 16 16 16
Final Step Count:20

Cells: 3 X 3 X 3 .... created
Starting simulation ....

Simulation completed
[Partition 0][Node 0] End of program

In the example, the runtime should also display the binding of node 1's comm thread, i.e. Charm++> set comm 1 on node 1 to core #3. Note that the binding itself does take effect (the [5] affinity mask 0x00000008 corresponds to core 3, as requested by +commap 2,3); only the message is missing.
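
For reference, here is a small self-contained C sketch of how the maps appear to be applied in this run, inferred from the log above rather than taken from the Charm++ source: PEs wrap around the +pemap list, each node's comm thread indexes +commap by its node number, and the printed affinity mask is simply 1 << core.

#include <stdio.h>

int main(void) {
    const int pemap[]  = {0, 1};  /* +pemap 0,1  */
    const int commap[] = {2, 3};  /* +commap 2,3 */
    const int npes = 4, ppn = 2, nnodes = 2;

    /* worker threads (PEs): global PE rank wraps around the pemap list */
    for (int pe = 0; pe < npes; ++pe)
        printf("set PE %d on node %d to core #%d\n",
               pe, pe / ppn, pemap[pe % 2]);

    /* one comm thread per node: node number indexes the commap list */
    for (int node = 0; node < nnodes; ++node)
        printf("set comm %d on node %d to core #%d (mask 0x%08x)\n",
               node, node, commap[node % 2],
               1u << commap[node % 2]);
    return 0;
}

This reproduces every "set PE" line in the log plus both comm-thread bindings, of which the runtime only reports the one on node 0.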

History

#1 Updated by Evan Ramos over 1 year ago

  • Assignee set to Evan Ramos
  • Status changed from New to Implemented

Potential fix: https://charm.cs.illinois.edu/gerrit/3840
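
For illustration only, a minimal sketch of the idea (assuming glibc's pthread_setaffinity_np; this is not the actual change, which is in the Gerrit link above): every node's comm thread binds itself and reports its own binding, instead of the message being emitted only for node 0. Here mynode, core, and show_affinity are hypothetical stand-ins for state the runtime already has.

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

/* Bind the calling comm thread to `core` and, if +showcpuaffinity was
 * given, print the same message on every node, not just node 0. */
static void bind_and_report_comm_thread(int mynode, int core, int show_affinity)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);

    /* pthread_setaffinity_np returns an error number, not -1/errno */
    int rc = pthread_setaffinity_np(pthread_self(), sizeof set, &set);
    if (rc != 0) {
        fprintf(stderr, "pthread_setaffinity_np failed: %d\n", rc);
        return;
    }
    if (show_affinity)
        /* Before the fix, this line was effectively printed only for the
         * comm thread on node 0; other nodes' bindings happened silently. */
        printf("Charm++> set comm %d on node %d to core #%d\n",
               mynode, mynode, core);
}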

Test output:

$ ./charmrun ++nodelist ~/nodelist +p4 ++ppn 2 ./hello +showcpuaffinity 20 +pemap 0,1 +commap 2,2
Charmrun> scalable start enabled. 
Charmrun> started all node programs in 1.329 seconds.
Charm++> Running in SMP mode: numNodes 2,  2 worker threads per process
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: v6.8.2-456-g3b46e82e3
Charm++> scheduler running in netpoll mode.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> cpu affinity enabled. 
Charm++> cpuaffinity PE-core map : 0,1
Charm++> set PE 0 on node 0 to core #0
Charm++> set PE 2 on node 1 to core #0
Charm++> set PE 3 on node 1 to core #1
Charm++> set PE 1 on node 0 to core #1
HWLOC> [2] Thread 0x7ffff7fc8d00 bound to cpu: 0 cpuset: 0x00000001
HWLOC> [3] Thread 0x7ffff6f4e700 bound to cpu: 1 cpuset: 0x00000002
Charm++> set comm 1 on node 1 to core #2
HWLOC> [5] Thread 0x7ffff674d700 bound to cpu: 2 cpuset: 0x00000004
[5] thread CPU affinity mask is 0x00000004
HWLOC> [1] Thread 0x7ffff6f4e700 bound to cpu: 1 cpuset: 0x00000002
HWLOC> [0] Thread 0x7ffff7fc8d00 bound to cpu: 0 cpuset: 0x00000001
Charm++> set comm 0 on node 0 to core #2
HWLOC> [4] Thread 0x7ffff674d700 bound to cpu: 2 cpuset: 0x00000004
[4] thread CPU affinity mask is 0x00000004
Charm++> Running on 2 unique compute nodes (8-way SMP).
Charm++> cpu topology info is gathered in 0.030 seconds.
Running Hello on 4 processors for 20 elements
[0] Hello 0 created
[0] Hello 1 created
[0] Hello 2 created
[0] Hello 3 created
[0] Hello 4 created
[1] Hello 5 created
[1] Hello 6 created
[1] Hello 7 created
[1] Hello 8 created
[1] Hello 9 created
[0] Hi[17] from element 0
[0] Hi[18] from element 1
[0] Hi[19] from element 2
[0] Hi[20] from element 3
[0] Hi[21] from element 4
[1] Hi[22] from element 5
[1] Hi[23] from element 6
[1] Hi[24] from element 7
[1] Hi[25] from element 8
[1] Hi[26] from element 9
[2] Hello 10 created
All done
[3] Hello 15 created
[2] Hello 11 created
[2] Hello 12 created
[3] Hello 16 created
[2] Hello 13 created
[3] Hello 17 created
[2] Hello 14 created
[3] Hello 18 created
[3] Hello 19 created
[2] Hi[27] from element 10
[2] Hi[28] from element 11
[2] Hi[29] from element 12
[2] Hi[30] from element 13
[2] Hi[31] from element 14
[3] Hi[32] from element 15
[3] Hi[33] from element 16
[3] Hi[34] from element 17
[3] Hi[35] from element 18
[3] Hi[36] from element 19
[Partition 0][Node 0] End of program

#2 Updated by Sam White over 1 year ago

  • Target version set to 6.9.0
  • Status changed from Implemented to Merged
