Scaling Hierarchical N-Body Simulations on GPU Clusters
International Conference for High Performance Computing, Networking, Storage and Analysis (SC) 2010
Publication Type: Paper
Repository URL: 2009ChaNGaGPU
Abstract
This paper focuses on the use of clusters of general-purpose graphics processors as offload devices for tree-based N-body simulations. Whereas the behavior of these hierarchical methods has been studied in the past on CPU-based architectures, we investigate key performance issues in the context of clusters of GPUs. These include kernel organization and efficiency, the balance between tree traversal and force computation work, grain size selection through the tuning of offloaded work request sizes, and the reduction of sequential bottlenecks. The effects of various application parameters are studied and experiments are carried out to quantify gains in performance. Our studies are carried out in the context of a production-quality parallel cosmological simulator called ChaNGa. We highlight the re-engineering of the application to make it more suitable for GPU-based environments. Finally, we present scaling performance results from experiments on NCSA's Lincoln GPU cluster.
TextRef
Pritish Jetley, Lukasz Wesolowski, Filippo Gioachin, Laxmikant V. Kalé and Thomas R. Quinn, "Scaling Hierarchical N-Body Simulations on GPU Clusters", Proceedings of the ACM/IEEE Supercomputing Conference 2010.