Support #1842

Make Jenkins SMP builds run faster

Added by Sam White over 1 year ago. Updated over 1 year ago.

Status: Closed
Priority: Normal
Assignee:
Category: Build & Test Automation
Target version:
Start date: 04/03/2018
Due date:
% Done: 0%

Description

We are seeing a lot of timeouts in SMP builds for Jenkins commit-triggered builds, because tests/charm++/pingpong/ takes an extremely long time to run with +p2.
I think this is because we are not using +pemap or the +processPer* / +oneWthPer* arguments, so the threads contend for the same cores.

Would adding +CmiSleepOnIdle to TESTOPTS take care of the problem? (A sketch of both options follows the output below.)

../../../bin/testrun  ./pgm +p1  
Charmrun> scalable start enabled. 
Charmrun> started all node programs in 1.438 seconds.
Charm++> Running in SMP mode: numNodes 1,  1 worker threads per process
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: d56bd536
Charm++> scheduler running in netpoll mode.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (8-way SMP).
Charm++> cpu topology info is gathered in 0.000 seconds.
Pingpong with payload: 100 iterations: 1000
Roundtrip time for 1D Arrays is 0.452995 us
Roundtrip time for 1D threaded Arrays is 1.855850 us
Roundtrip time for 1D Arrays (zero copy message send api) is 1.228094 us
Roundtrip time for 1D Arrays Marshalled is 0.860929 us
Roundtrip time for 2D Arrays is 0.474930 us
Roundtrip time for 3D Arrays is 0.482082 us
Roundtrip time for Fancy Arrays is 0.482082 us
Roundtrip time for Chares (reuse msgs) is 0.253201 us
Roundtrip time for Chares (new/del msgs) is 0.528097 us
Roundtrip time for threaded Chares (reuse) is 1.569986 us
Roundtrip time for Chares (zero copy message send api) is 1.096010 us
Roundtrip time for Groups is 0.306129 us
Roundtrip time for Groups (zero copy message send api) is 1.097679 us
Roundtrip time for Groups (1 KB pipe, no memcpy, no allocs) is 0.333786 us
Roundtrip time for Groups (1 KB pipe, no memcpy, w/ allocs) is 0.627995 us
Roundtrip time for Groups (1 KB pipe, w/ memcpy, w/ allocs) is 0.877857 us
Roundtrip time for NodeGroups is 0.458241 us
Roundtrip time for NodeGroups (zero copy message send api) is 1.268387 us
[Partition 0][Node 0] End of program

../../../bin/testrun  ./pgm +p2  
Charmrun> scalable start enabled. 
Charmrun> started all node programs in 1.428 seconds.
Charm++> Running in SMP mode: numNodes 2,  1 worker threads per process
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: d56bd536
Charm++> scheduler running in netpoll mode.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (8-way SMP).
Charm++> cpu topology info is gathered in 0.080 seconds.
Pingpong with payload: 100 iterations: 1000
Roundtrip time for 1D Arrays is 67191.994905 us
Roundtrip time for 1D threaded Arrays is 66923.834801 us
Roundtrip time for 1D Arrays (zero copy message send api) is 67187.839985 us
Roundtrip time for 1D Arrays Marshalled is 66603.994131 us
Roundtrip time for 2D Arrays is 67123.995066 us
Roundtrip time for 3D Arrays is 66975.993872 us
Roundtrip time for Fancy Arrays is 66655.992031 us
Roundtrip time for Chares (reuse msgs) is 66475.993872 us
Roundtrip time for Chares (new/del msgs) is 66047.847033 us
Roundtrip time for threaded Chares (reuse) is 66979.846001 us
Roundtrip time for Chares (zero copy message send api) is 66395.854950 us
Roundtrip time for Groups is 65707.989931 us
Roundtrip time for Groups (zero copy message send api) is 66723.841906 us
Roundtrip time for Groups (1 KB pipe, no memcpy, no allocs) is 70596.000195 us
Roundtrip time for Groups (1 KB pipe, no memcpy, w/ allocs) is 68871.999979 us
Roundtrip time for Groups (1 KB pipe, w/ memcpy, w/ allocs) is 67676.001072 us
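For reference, roughly what I have in mind, run against the same binary as above; the +pemap/+commap values are illustrative only and assume cores 0-3 of the 8-way node are free:

# Let idle scheduler threads sleep instead of busy-waiting
../../../bin/testrun ./pgm +p2 +CmiSleepOnIdle

# Or pin the worker and comm threads to distinct cores to avoid contention
../../../bin/testrun ./pgm +p2 +pemap 0,1 +commap 2,3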

History

#1 Updated by Sam White over 1 year ago

I think the netlrts-linux-x86_64-smp autobuild target should have +CmiSleepOnIdle added to its TESTOPTS as well. netlrts-linux-smp already has that flag and passes its tests, but the 64-bit target takes forever in pingpong with +p2.
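By hand, that change would look roughly like the sketch below; the build-tree path is the standard one for this target, but how the autobuild wrapper actually passes TESTOPTS is an assumption here:

# Hypothetical manual equivalent of the autobuild's pingpong test step
cd netlrts-linux-x86_64-smp/tests/charm++/pingpong
make test TESTOPTS="+CmiSleepOnIdle"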

#2 Updated by Sam White over 1 year ago

  • Tracker changed from Bug to Support

#3 Updated by Sam White over 1 year ago

On my lab machine (beauty), adding +CmiSleepOnIdle to TESTOPTS cuts the time for SMP pingpong with +p2 by about 55% when the threads are co-located on the same PE.

#4 Updated by Sam White over 1 year ago

Actually it is a much more drastic improvement in some cases: from 64511.965036 us to 36.540985 us for 1D array pingpong.

#5 Updated by Sam White over 1 year ago

  • Status changed from New to Closed

Adding +CmiSleepOnIdle seems to have fixed the issue in the Jenkins SMP builds. I also increased the build parallelism from -j4 to -j8.
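Roughly, the build step now looks like the following; only the -j8 change comes from this ticket, the target name matches the one discussed above, and --with-production is just an illustrative option:

# Build with increased parallelism (-j8 instead of -j4); options are illustrative
./build charm++ netlrts-linux-x86_64 smp --with-production -j8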
