megatest and megacon should work for large node counts
Some test runtimes appear to scale as O(P) or worse, which makes the tests useless for exercising large machines. A test's total work should scale at most as O(P^2) for small P and then be capped at O(P), so that the work per PE, and therefore the runtime, stays roughly constant as P increases. If this is not possible, skip the test for large P.
#1 Updated by Jim Phillips almost 3 years ago
There are some extremely slow tests even for 64 nodes with +ppn 60:
/home/jphillip/charm/gni-crayxc-persistent-smp-knl-debug/tests/charm++/megatest/pgm
Charm++> Running on Gemini (GNI) with 64 processes
Charm++> static SMSG
Charm++> SMSG memory: 316.0KB
Charm++> memory pool init block size: 8MB, total memory pool limit 0MB (0 means no limit)
Charm++> memory pool registered memory limit: 200000MB, send limit: 100000MB
Charm++> only comm thread send/recv messages
Charm++> Cray TLB page size: 8192K
Charm++> Running in SMP mode: numNodes 64, 60 worker threads per process
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: v6.7.0-296-g8ce70e0
Warning> using Isomalloc in SMP mode, you may need to run with '+isomalloc_sync'.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> cpu affinity enabled.
Charm++> cpuaffinity PE-core map : 1-63:16.15+64+128+192
Charm++> set comm 0 on node 0 to core #0
Charm++> Running on 16 unique compute nodes (256-way SMP).
Megatest is running on 64 nodes 3840 processors.
test 0: initiated [groupring (milind)]
test 0: completed (5.53 sec)
...
test 7: initiated [groupsectiontest (ebohm)]
test 7: completed (52.06 sec)
test 8: initiated [multisectiontest (ebohm)]
test 8: completed (19.97 sec)
...
test 16: initiated [migration (jackie)]
test 16: completed (481.20 sec)
...
test 26: initiated [immediatering (gengbin)]
test 26: completed (2.79 sec)
...
test 30: completed (5.71 sec)
test 31: initiated [multi nodering (milind)]
...
test 37: initiated [multi groupsectiontest (ebohm)]
test 37: completed (513.88 sec)
test 38: initiated [multi multisectiontest (ebohm)]
test 38: completed (118.11 sec)
...
test 45: initiated [multi migration (jackie)]
...job times out...