Optimizing the performance of parallel applications on a 5D torus via task mapping
Abhinav Bhatele, Nikhil Jain, Katherine Isaacs, Ronak Buch, Todd Gamblin, Steven Langer, Laxmikant Kale
IEEE International Conference on High Performance Computing (HiPC) 2014
Publication Type: Paper
As of 2014, six of the ten fastest supercomputers in the world use a torus interconnection network for message passing between compute nodes. Torus networks provide high-bandwidth links to near-neighbors and low latencies over multiple hops on the network. However, the large diameters of such networks necessitate careful placement of parallel tasks on the compute nodes to minimize network congestion. This paper presents a methodological study of optimizing application performance on a five-dimensional torus network via topology-aware task mapping. Task mapping refers to the placement of processes on compute nodes while carefully considering the network topology between the nodes and the communication behavior of the application. We focus on the IBM Blue Gene/Q machine and two production applications: pF3D, a laser-plasma interaction code, and MILC, a lattice QCD application. Optimizations presented in the paper improve the communication performance of pF3D by 90% and that of MILC by up to 47%.
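To make the idea of topology-aware task mapping concrete, the following is a minimal sketch and not the method used in the paper. It assumes a 4x4x4x4x2 torus (512 nodes), a hypothetical ring communication pattern among 512 tasks, and hop-bytes (bytes sent multiplied by hops traveled) as a rough proxy for network load. None of these choices come from the abstract, and the function names are illustrative. It compares a plain rank-order placement with a "snake" placement that keeps consecutive tasks on adjacent nodes.

```python
from itertools import product

def torus_hops(a, b, dims):
    """Minimum hop count between coordinates a and b, using wraparound links."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))

def hop_bytes(mapping, traffic, dims):
    """Sum of bytes * hops over all communicating task pairs."""
    return sum(nbytes * torus_hops(mapping[src], mapping[dst], dims)
               for (src, dst), nbytes in traffic.items())

def snake_order(dims):
    """Visit every node so that consecutive nodes are exactly one hop apart.

    The traversal of the remaining dimensions is reversed on every odd step
    of the leading dimension (a boustrophedon, or reflected, ordering).
    """
    if not dims:
        return [()]
    rest = snake_order(dims[1:])
    order = []
    for i in range(dims[0]):
        sub = rest if i % 2 == 0 else rest[::-1]
        order.extend((i,) + coord for coord in sub)
    return order

# Assumed torus shape for illustration only.
dims = (4, 4, 4, 4, 2)
num_tasks = 4 * 4 * 4 * 4 * 2

# Hypothetical traffic: each task sends 1024 bytes to the next task in a ring.
traffic = {(t, (t + 1) % num_tasks): 1024 for t in range(num_tasks)}

# Rank-order placement: task t goes to the t-th node in lexicographic order.
lex_nodes = list(product(*(range(d) for d in dims)))
lex_mapping = {t: lex_nodes[t] for t in range(num_tasks)}

# Topology-aware placement: ring neighbors land on adjacent nodes.
snake_nodes = snake_order(dims)
snake_mapping = {t: snake_nodes[t] for t in range(num_tasks)}

lex_cost = hop_bytes(lex_mapping, traffic, dims)
snake_cost = hop_bytes(snake_mapping, traffic, dims)
print("rank-order hop-bytes:", lex_cost)
print("snake hop-bytes:     ", snake_cost)
```

In this toy setup the snake placement sends every ring message over a single hop, whereas the rank-order placement pays extra hops whenever a coordinate carries over. Real applications such as pF3D and MILC have multi-dimensional communication patterns, so the mappings studied in the paper are necessarily more involved than this sketch.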
Research Areas