Reduction starting messages - performance-optimal solution
To support migratability, the current chare array reduction mechanism triggers partition-wide broadcasts. This is problematic for several reasons.
1. For arrays which do not migrate, this is pure overhead.
2. For arrays which do not span the entire partition and will not migrate off their subset of cores, the message traffic to unused cores is overhead.
3. This happens even when arrays are constructed with flags that disable anytime migration and anytime insertion.
The task is to implement a version of the chare array reduction which performs better in the very common case of no anytime migration and no anytime insertion. Notionally, which cores are empty and how many elements each core holds are both quantities that can be determined once doneInserting completes. A possible extension would be to allow these quantities to be reset in dynamic situations, either by doneInserting or by some other mechanism.
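A minimal standalone sketch of the bookkeeping described above, for illustration only. It does not use the Charm++ runtime or any of its APIs: ReductionPlan, buildPlan, parentOf, and elementHomes are hypothetical names, and the binary tree over non-empty cores is just one assumed way to route contributions. The idea is that once insertion has finished, a one-time pass can record how many elements each core hosts and which cores are non-empty, so that later reductions only involve those cores.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical snapshot taken once insertion has finished.
// None of these names come from the Charm++ runtime.
struct ReductionPlan {
    std::vector<int> activeCores;      // cores hosting at least one element, ascending
    std::vector<int> elementsPerCore;  // expected local contributions, indexed by core id
};

// elementHomes[i] is the core that hosts array element i (assumed input).
ReductionPlan buildPlan(const std::vector<int>& elementHomes, int numCores) {
    assert(numCores > 0);
    ReductionPlan plan;
    plan.elementsPerCore.assign(numCores, 0);

    // Count the elements on each core.
    for (int core : elementHomes) {
        assert(core >= 0 && core < numCores);
        ++plan.elementsPerCore[core];
    }

    // Only cores with at least one element take part in reductions.
    for (int core = 0; core < numCores; ++core) {
        if (plan.elementsPerCore[core] > 0) {
            plan.activeCores.push_back(core);
        }
    }
    return plan;
}

// Parent of the core at position `rank` in activeCores, using a k-ary tree
// built over the active cores only. Returns -1 for the root.
int parentOf(const ReductionPlan& plan, std::size_t rank, int branching = 2) {
    assert(branching > 0);
    assert(rank < plan.activeCores.size());
    if (rank == 0) {
        return -1;
    }
    return plan.activeCores[(rank - 1) / static_cast<std::size_t>(branching)];
}
```

For example, with 8 cores and elements placed on cores 2, 5, and 7 only, buildPlan yields three active cores, and the other five cores never appear in the tree. Under this assumed scheme, a reset in a dynamic situation would amount to calling buildPlan again.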