Bug #1470

Investigate broken load balancers in mini-apps

Added by Michael Robson 4 months ago. Updated 4 months ago.

Status: Closed
Priority: Normal
Category: Load Balancing
Target version:
Start date: 03/16/2017
Due date:
% Done: 0%


Description

Excerpt from an external email sent by Debashis Ganguly:

I am able to run the leanmd mini-app with 5 different load balancers in SMP mode. However, I am unable to run any other mini-app with any load balancer.

On the Charm++ website, it is mentioned that AMR has an automatic feature to support load balancers. However, when I try to run AMR, it aborts with the error "Cannot insert array element twice!". The same happens with jacobi2d under the AMR directory within examples. This led me to believe it has something to do with the AMR library under ck-libs. In contrast, jacobi2D under the examples folder throws a memory corruption error when run.

I also downloaded lulesh from the LLNL website. Unfortunately, it is compatible with an earlier version of Charm++. It doesn't print any debug messages when run with the +LBDebug option, so there is no way to tell whether the load balancer is working. Moreover, the performance is the same with and without a load balancer.

I have also tried running wave2d; it gives a segmentation fault after running for a while.

History

#1 Updated by Michael Robson 4 months ago

From Kavitha:

For the amr/jacobi2d example, I get the same error as Debashis on the current charm branch. It looks like it could be an old error: https://lists.cs.illinois.edu/lists/arc/charm/2010-05/msg00034.html .

#2 Updated by Sam White 4 months ago

I think we agreed the AMR library could be removed from mainline charm entirely. If someone wants to do LB with AMR they should check out the AMR mini-app, not the library.

LULESH and wave2d should both be investigated and fixed.

#3 Updated by Sam White 4 months ago

  • Assignee set to Kavitha Chandrasekar
  • Category set to Load Balancing

#4 Updated by Kavitha Chandrasekar 4 months ago

Lulesh can be run with load balancing after a few minor changes, such as updating uses of CmiTrue and atomic. The AtSync() calls are commented out by default, so AtSync() calls need to be added to invoke the load balancer.

For wave2d, it might be good to invoke load balancing in AtSync mode instead of using Periodical LB. It would also be good to investigate the load imbalance in the example.

#5 Updated by Phil Miller 4 months ago

  • Target version set to 6.8.0

#6 Updated by Kavitha Chandrasekar 4 months ago

Since the mini-apps would work with minimal changes, should we follow up on the email with suggestions?

#7 Updated by Sam White 4 months ago

  • Status changed from New to In Progress

I don't think this needs to be targeted to 6.8.0 if the fixes are to mini-apps that are hosted in external repos. Are there fixes to examples/tests in the charm repo that need to be done for this?

#8 Updated by Kavitha Chandrasekar 4 months ago

No, there are no fixes for examples/tests, only for external repos.

#9 Updated by Kavitha Chandrasekar 4 months ago

  • Status changed from In Progress to Closed

Closing this issue since the mini-apps work okay. The status is as follows:

1. The amr mini-app in SMP mode is fixed.
2. Lulesh (the Charm++ version on the lulesh website) works with LB when AtSync calls are added at iteration boundaries.
3. To run wave2d with load balancing, adding a pup method and then either calling PeriodLB or making AtSync calls works okay. The example might not have much load imbalance, though.
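
For readers unfamiliar with the AtSync pattern discussed in this thread, here is a rough toy model of load balancing at iteration boundaries. It is plain C++ and makes no Charm++ calls: the names WorkUnit, maxProcLoad, greedyRebalance and simulate are hypothetical, and the greedy strategy is only a stand-in for whatever balancer is actually selected. In real Charm++ the runtime does the measuring and migration once every element has called AtSync(); this sketch only imitates that idea under those assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Toy model only: no Charm++ API is used. All names are hypothetical.
struct WorkUnit {
    double load;  // cost of one iteration for this unit
    int proc;     // processor the unit currently lives on
};

// Largest per-processor load; in this model it is the time of one iteration.
double maxProcLoad(const std::vector<WorkUnit>& units, int numProcs) {
    assert(numProcs > 0);
    std::vector<double> total(numProcs, 0.0);
    for (const WorkUnit& u : units) {
        assert(u.proc >= 0 && u.proc < numProcs);
        total[u.proc] += u.load;
    }
    return *std::max_element(total.begin(), total.end());
}

// Greedy stand-in strategy: place the heaviest unit first, always onto the
// processor that is least loaded so far.
void greedyRebalance(std::vector<WorkUnit>& units, int numProcs) {
    assert(numProcs > 0);
    std::vector<std::size_t> order(units.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return units[a].load > units[b].load;
    });
    std::vector<double> total(numProcs, 0.0);
    for (std::size_t i : order) {
        int target = static_cast<int>(
            std::min_element(total.begin(), total.end()) - total.begin());
        units[i].proc = target;  // "migrate" the unit
        total[target] += units[i].load;
    }
}

// Runs `iterations` steps and returns the summed iteration times.
// Every `syncPeriod` iterations all units reach a sync point and the
// balancer runs; syncPeriod == 0 means no sync points are ever reached.
double simulate(std::vector<WorkUnit> units, int numProcs,
                int iterations, int syncPeriod) {
    double elapsed = 0.0;
    for (int it = 1; it <= iterations; ++it) {
        elapsed += maxProcLoad(units, numProcs);
        if (syncPeriod > 0 && it % syncPeriod == 0) {
            greedyRebalance(units, numProcs);
        }
    }
    return elapsed;
}
```

In this model, four units with loads 4, 3, 2 and 1 that all start on processor 0 of 2 take a total of 100 over 10 iterations with no sync points, and 75 with a sync point every 5 iterations. If the units start out evenly spread, the sync points change nothing, which matches the observation that an example with little load imbalance shows no difference with or without a load balancer.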
