Optimizing a Parallel Runtime System for Multicore Clusters: A Case Study
TeraGrid 2010
Publication Type: Paper
Repository URL: 201003_CharmSMPOptimization
Abstract
Clusters of multicore nodes have become the most popular option for
new HPC systems due to their scalability and performance/cost
ratio. The complexity of programming multicore systems underscores
the need for powerful and efficient runtime systems that manage
resources such as threads and communication sub-systems on behalf
of the applications. In this paper, we study several multicore
performance issues on clusters using Intel, AMD and IBM processors
in the context of the Charm++ runtime system. We then present the
optimization techniques that overcome these performance issues. The
techniques presented are general enough to apply to other runtime
systems as well. We demonstrate the benefits of these optimizations
through both synthetic benchmarks and production-quality
applications, including NAMD and ChaNGa, on several popular multicore
platforms. The optimizations improve the performance of NAMD and
ChaNGa by about 20% and 10%, respectively.
TextRef
Chao Mei, Gengbin Zheng, Filippo Gioachin, and Laxmikant V. Kale, "Optimizing a Parallel Runtime System for Multicore Clusters: A Case Study", to appear in Proceedings of TeraGrid'10