Bug #1183

megatest and megacon should work for large node counts

Added by Jim Phillips over 2 years ago. Updated over 2 years ago.

Status: New
Priority: Normal
Assignee:
Category: Build & Test Automation
Target version: -
Start date: 08/24/2016
Due date:
% Done: 0%


Description

Some test runtimes appear to scale as O(P) or worse, which makes the tests useless on large machines. The total work of a test should scale as O(P^2) only for small P and then be limited to O(P), so that runtime stays constant as P increases. If this is not possible, skip the test for large P.
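One way to read this requirement is sketched below. It is not taken from megatest: the names kPeerCap, peersPerPe, totalMessages and shouldSkip are hypothetical, and the cap of 64 is an arbitrary placeholder. The sketch assumes a test in which every PE exchanges messages with some number of peers, so that capping the peer count turns O(P^2) total work into O(P) once P exceeds the cap.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical threshold: the PE count beyond which per-PE work stops growing.
constexpr std::int64_t kPeerCap = 64;

// Number of peers each PE talks to: all of them for small P, at most `cap` otherwise.
inline std::int64_t peersPerPe(std::int64_t numPes, std::int64_t cap = kPeerCap) {
  return std::min(numPes, cap);
}

// Total messages over all PEs: P^2 while P <= cap, then P * cap, which is O(P).
inline std::int64_t totalMessages(std::int64_t numPes, std::int64_t cap = kPeerCap) {
  return numPes * peersPerPe(numPes, cap);
}

// Fallback policy: a test whose work cannot be capped is skipped for large P.
inline bool shouldSkip(bool canCapWork, std::int64_t numPes, std::int64_t cap = kPeerCap) {
  return !canCapWork && numPes > cap;
}
```

With these assumptions, per-PE work is constant above the cap, so wall-clock time should stay roughly flat as P grows, and a test that cannot be restructured this way is skipped rather than allowed to time out.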

History

#1 Updated by Jim Phillips over 2 years ago

There are some extremely slow tests even for 64 nodes with +ppn 60:

/home/jphillip/charm/gni-crayxc-persistent-smp-knl-debug/tests/charm++/megatest/pgm
Charm++> Running on Gemini (GNI) with 64 processes
Charm++> static SMSG
Charm++> SMSG memory: 316.0KB
Charm++> memory pool init block size: 8MB, total memory pool limit 0MB (0 means no limit)
Charm++> memory pool registered memory limit: 200000MB, send limit: 100000MB
Charm++> only comm thread send/recv messages
Charm++> Cray TLB page size: 8192K
Charm++> Running in SMP mode: numNodes 64,  60 worker threads per process
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: v6.7.0-296-g8ce70e0
Warning> using Isomalloc in SMP mode, you may need to run with '+isomalloc_sync'.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> cpu affinity enabled.
Charm++> cpuaffinity PE-core map : 1-63:16.15+64+128+192
Charm++> set comm 0 on node 0 to core #0
Charm++> Running on 16 unique compute nodes (256-way SMP).
Megatest is running on 64 nodes 3840 processors.
test 0: initiated [groupring (milind)]
test 0: completed (5.53 sec)
...
test 7: initiated [groupsectiontest (ebohm)]
test 7: completed (52.06 sec)
test 8: initiated [multisectiontest (ebohm)]
test 8: completed (19.97 sec)
...
test 16: initiated [migration (jackie)]
test 16: completed (481.20 sec)
...
test 26: initiated [immediatering (gengbin)]
test 26: completed (2.79 sec)
...
test 30: completed (5.71 sec)
test 31: initiated [multi nodering (milind)]
...
test 37: initiated [multi groupsectiontest (ebohm)]
test 37: completed (513.88 sec)
test 38: initiated [multi multisectiontest (ebohm)]
test 38: completed (118.11 sec)
...
test 45: initiated [multi migration (jackie)]
...job times out...

#2 Updated by Eric Bohm over 2 years ago

  • Assignee set to Keshav Santhanam

#3 Updated by Sam White over 2 years ago

  • Assignee changed from Keshav Santhanam to Jaemin Choi
