Parallel Programming Laboratory

Optimizing a Parallel Runtime System for Multicore Clusters: A Case Study

Chao Mei | Gengbin Zheng | Filippo Gioachin | Laxmikant Kale

TeraGrid 2010

Publication Type: Paper

Repository URL: 201003_CharmSMPOptimization


Abstract

Clusters of multicore nodes have become the most popular option for new HPC systems because of their scalability and performance/cost ratio. The complexity of programming multicore systems underscores the need for powerful and efficient runtime systems that manage resources, such as threads and communication subsystems, on behalf of applications. In this paper, we study several multicore performance issues on clusters built from Intel, AMD, and IBM processors in the context of the Charm++ runtime system. We then present optimization techniques that overcome these issues; the techniques are general enough to apply to other runtime systems as well. We demonstrate the benefits of these optimizations on several popular multicore platforms using both synthetic benchmarks and production-quality applications, including NAMD and ChaNGa, whose performance improves by about 20% and 10%, respectively.

TextRef

Chao Mei, Gengbin Zheng, Filippo Gioachin, and Laxmikant V. Kale, "Optimizing a Parallel Runtime System for Multicore Clusters: A Case Study", to appear in Proceedings of TeraGrid'10

Research Areas

Charm++