Bug #1442

CkLoop fixed tree limits helper recruitment

Added by Jim Phillips about 2 months ago. Updated 3 days ago.

Target version:
Start date:
Due date:
% Done:


Spent time:


CkLoop uses a tree of branching 4 when CkMyNodeSize() >= 8. This means that if you have ppn=9 and rank 0 calls CkLoop, then ranks 5-8 are only recruited if rank 1 is available. If ranks 1-4 are busy then no helpers are activated. This is unexpected and performance-limiting for the use case where CkLoop work is urgent but PEs are generally busy with lower-priority tasks of small grainsize.

A more dynamic approach would be to use messages in the node queue to push notify messages for the PEs, one message per X PEs on the node (probably large X to amortize overhead, but some limit to allow scaling to large thead counts).

ckloop_node.patch View (2.36 KB) Jim Phillips, 02/22/2017 01:59 PM


#1 Updated by Jim Phillips about 2 months ago

Proof-of-concept patch attached. With this change I see the expected behavior, with all PEs able to steal work.
Issues with this patch:
1) Had to declare CmiPushNode() in CkLoop.C, should be added to converse.h and made non-static in all machine layers.
3) Modifies srcRank in notifyMsg, which I think might be enqueued from previous entries on other PEs. This would be unsafe if int write was non-atomic, or if CkLoop is sensitive to extra notify messages being received. It is done because there is no way to tell if message came from node queue or pe queue.
4) Only modified USE_CONVERSE_NOTIFICATION branch. Do we need the other side?

Other possible improvements:
1) For numChunks << nodesize it would be better to put numChunks/numChunks-1 messages in the node queue rather than messaging every PE.
2) If no work left don't send the notifies.
3) The implementation of tree broadcast notification broke tracing by over-writing srcRank to -1.

#2 Updated by Eric Bohm about 2 months ago

  • Assignee set to Seonmyeong Bak

#3 Updated by Phil Miller 3 days ago

  • Target version changed from 6.8.0 to 6.8.1

I'm inclined to defer this as not release-critical, since CkLoop could do better, but it's not something that's gotten worse, and the issue doesn't prevent its usage or break other things in the system.

Also available in: Atom PDF