Reduction clause doesn't work in the OpenMP integration with ICC and Clang
The reduction clause doesn't work in the OpenMP integration when the code is compiled with ICC or Clang, but it works with GCC.
The reason it only works with GCC is that GCC-compiled code does not call the runtime library's internal APIs for reduction, while code compiled by ICC and Clang does.
GCC uses a very simple reduction algorithm:
1. Each thread stores its local result in the corresponding element of a shared array.
2. The master thread iterates over this array and combines the local results.
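The two steps above can be sketched in C with OpenMP as follows. This is a minimal illustration of the array-based scheme, not the code GCC actually emits; `array_reduce_sum` is a hypothetical name. Without `-fopenmp` the pragmas are ignored and the function runs on a single thread.

```c
#include <assert.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical sketch of the array-based reduction (not GCC's real codegen). */
double array_reduce_sum(const double *a, long n)
{
    int nthreads = 1;
#ifdef _OPENMP
    nthreads = omp_get_max_threads();
#endif
    /* Shared array: one zero-initialised slot per potential thread.
       If the runtime starts fewer threads, the extra slots stay 0. */
    double *partial = calloc((size_t)nthreads, sizeof *partial);
    assert(partial != NULL);

    #pragma omp parallel num_threads(nthreads)
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        double local = 0.0;          /* private per-thread accumulator */

        /* Step 1: each thread reduces its share of the input privately... */
        #pragma omp for
        for (long i = 0; i < n; i++)
            local += a[i];

        /* ...and stores the result in its own slot, so there is no race. */
        partial[tid] = local;
    }   /* implicit barrier: every slot is written past this point */

    /* Step 2: the master thread iterates over the array of local results. */
    double sum = 0.0;
    for (int t = 0; t < nthreads; t++)
        sum += partial[t];

    free(partial);
    return sum;
}
```

The appeal of this scheme is that it needs no atomics and no runtime entry points: the only synchronization is the barrier at the end of the parallel region.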
ICC and Clang use a more complex algorithm:
1. If the team performing the reduction is smaller than a heuristically predetermined size, the runtime uses atomics or locks to reduce into a single shared variable.
2. If the team is larger than that size, it uses a tree.
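The size-based choice above can be sketched as follows. This is a hypothetical illustration, not the runtime's real implementation: `reduce_method`, `choose_method`, `heuristic_reduce_sum`, and the `cutoff` parameter are made-up names, and the real threshold is internal to the runtime. The small-team path uses `#pragma omp atomic`; for operators with no atomic form, a lock (for example `#pragma omp critical`) plays the same role. For simplicity the tree path uses a full barrier per level, whereas a real tree reduction makes each parent wait only for its own children.

```c
#include <assert.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical names; the real selection logic lives inside the runtime. */
typedef enum { METHOD_ATOMIC, METHOD_TREE } reduce_method;

static reduce_method choose_method(int team_size, int cutoff)
{
    /* Small team: atomics on one shared variable. Large team: tree. */
    return team_size < cutoff ? METHOD_ATOMIC : METHOD_TREE;
}

double heuristic_reduce_sum(const double *a, long n, int cutoff)
{
    int nthreads = 1;
#ifdef _OPENMP
    nthreads = omp_get_max_threads();
#endif
    double shared_sum = 0.0;                 /* target of the atomic path */
    double *partial = calloc((size_t)nthreads, sizeof *partial); /* tree nodes */
    assert(partial != NULL);
    reduce_method chosen = METHOD_ATOMIC;    /* written by thread 0 only */

    #pragma omp parallel num_threads(nthreads)
    {
        int tid = 0, team = 1;
#ifdef _OPENMP
        tid = omp_get_thread_num();
        team = omp_get_num_threads();
#endif
        double local = 0.0;
        #pragma omp for
        for (long i = 0; i < n; i++)
            local += a[i];

        /* Every thread computes the same method, so all take the same branch. */
        reduce_method m = choose_method(team, cutoff);
        if (tid == 0)
            chosen = m;

        if (m == METHOD_ATOMIC) {
            /* Each thread folds its result into the single shared variable. */
            #pragma omp atomic
            shared_sum += local;
        } else {
            /* Tree: at each level, thread tid absorbs its partner tid + stride.
               The barrier makes a parent wait until its children are done. */
            partial[tid] = local;
            for (int stride = 1; stride < team; stride *= 2) {
                #pragma omp barrier
                if (tid % (2 * stride) == 0 && tid + stride < team)
                    partial[tid] += partial[tid + stride];
            }
        }
    }   /* implicit barrier at the end of the parallel region */

    double result = (chosen == METHOD_ATOMIC) ? shared_sum : partial[0];
    free(partial);
    return result;
}
```

Passing a large `cutoff` forces the atomic path and a `cutoff` of 1 forces the tree path, which is a convenient way to compare the two on the same input.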
I have modified the integration library so that it uses atomics or locks even when the runtime would otherwise choose a tree.
This is because a tree reduction leaves some PEs in Converse waiting until the other PEs that share the same parent reach the reduction point.
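The change to the method selection can be pictured as a small policy override. This is a hypothetical sketch, not the integration library's actual code: `reduce_method`, `heuristic_method`, `patched_method`, and the `cutoff` and `atomic_ok` parameters are illustrative names only.

```c
#include <assert.h>

/* Hypothetical method names for illustration only. */
typedef enum { METHOD_CRITICAL, METHOD_ATOMIC, METHOD_TREE } reduce_method;

/* The heuristic as described: small teams use atomics (or a lock when the
   operator has no atomic form); large teams use a tree. */
static reduce_method heuristic_method(int team_size, int cutoff, int atomic_ok)
{
    if (team_size >= cutoff)
        return METHOD_TREE;
    return atomic_ok ? METHOD_ATOMIC : METHOD_CRITICAL;
}

/* Modified policy: never hand back METHOD_TREE, so no PE has to wait
   for its siblings under a tree parent. */
reduce_method patched_method(int team_size, int cutoff, int atomic_ok)
{
    assert(team_size > 0 && cutoff > 0);
    reduce_method m = heuristic_method(team_size, cutoff, atomic_ok);
    if (m == METHOD_TREE)
        m = atomic_ok ? METHOD_ATOMIC : METHOD_CRITICAL;
    return m;
}
```

With this policy, a team of 64 threads with a cutoff of 8, which the heuristic would send down the tree path, is instead reduced with an atomic update, or with a critical section when the operator has no atomic form.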
As a result, with ICC and Clang the integration now always uses the atomic or critical-section method for reductions. If this turns out to incur significant overhead, I will reconsider other reduction strategies.