
Runtime Systems and Tools:
BigSim - Simulating PetaFLOPS Supercomputers


PetaFLOPS-class computers were deployed in 2008, and even larger computers are being planned (such as Blue Waters and Blue Gene/Q). The BigSim project aims to develop tools that let programmers and scientists develop, debug, tune, and scale applications, and predict their performance, before such machines are available, so that the applications are ready when a machine first becomes operational. It also makes it easier to experiment "offline" with parallel performance tuning strategies, without using the full parallel computer. For machine architects, BigSim provides a method for modeling the impact of architectural choices (including the communication network) on actual, full-scale applications. BigSim is currently being used to predict the performance of applications on the upcoming Blue Waters system. The BigSim simulation system consists of an emulator and a simulator.

BigSim Emulator

The BigSim Emulator can take any Charm++ or AMPI program and "run" it on a specified number of target processors (P) using the processors (Q) available to the emulator. For example, one can run an MPI program meant for P=100,000 processors using only Q=2,000 available processors. If the application needs more memory than is available on the Q processors, the emulator employs an integrated out-of-core execution scheme that uses the file system to store a target processor's data while that processor is not being executed. The emulator can be used to test and debug an application, especially for scaling bugs (such as a data structure of size P*P, where P is the number of processors). One can monitor memory usage, data values and output; debug for correctness; address algorithmic scaling issues such as convergence of numerical schemes; and obtain operation counts at full scale. The emulator can also generate traces, which the parallel discrete event simulator, called the BigSim Simulator, uses for coarse timing predictions and for identifying performance bottlenecks.
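To make the P-versus-Q relationship and the P*P scaling bug concrete, here is a minimal sketch in Python. It is not BigSim code and does not use any BigSim API: the function names, the even block mapping of target processors onto host processors, and the 8-byte entry size are all illustrative assumptions.

```python
def targets_per_host(P, Q):
    """Count of target processors emulated by each of the Q host processors.

    Assumes a hypothetical even block mapping, so counts differ by at most 1.
    """
    base, extra = divmod(P, Q)
    return [base + 1 if h < extra else base for h in range(Q)]


def host_of(target, P, Q):
    """Host processor that emulates `target` under the same assumed mapping."""
    base, extra = divmod(P, Q)
    boundary = extra * (base + 1)  # first target owned by a "small" host
    if target < boundary:
        return target // (base + 1)
    return extra + (target - boundary) // base


def pxp_table_bytes(P, bytes_per_entry=8):
    """Memory needed by a P*P table (assumed 8 bytes per entry)."""
    return P * P * bytes_per_entry


if __name__ == "__main__":
    P, Q = 100_000, 2_000
    print(max(targets_per_host(P, Q)))   # 50 target processors per host
    print(pxp_table_bytes(P) / 1e9)      # 80.0 (GB for one P*P table)
```

With the numbers from the example above, each host processor would carry 50 target processors, and a single P*P table of 8-byte entries would need about 80 GB. A structure like that passes unnoticed in small test runs and only fails at full scale, which is the kind of bug emulation is meant to expose.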

BigSim Simulator

The BigSim Simulator is a trace-driven parallel discrete event simulator that models architectural parameters of the target machine, including (optionally) a detailed model of the communication network. It can be used to identify potential performance bottlenecks for the simulated application, such as load imbalances, communication contention and long critical paths. It generates performance traces just as a real program running on the target machine would, allowing one to carry out normal performance visualization and analysis. For predicting the performance of sequential code segments, the simulator supports a variable-resolution model, ranging from simple scale factors to interpolation based on performance counters (and possibly cycle-accurate simulators). For analyzing the performance of communication networks, one can plug in either a very simple latency model or a detailed model of the entire communication fabric. Because the simulator is itself parallel, it can simulate very large networks.
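The two simplest modeling choices mentioned above, a scale factor for sequential blocks and a simple latency model for messages, can be illustrated with a toy trace replay. This is a sequential sketch under stated assumptions, not the BigSim Simulator or its trace format: the `Block` record, the `replay` function, and the latency-plus-size-over-bandwidth message cost are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """One sequential execution block from a (hypothetical) trace."""
    proc: int                 # target processor that ran the block
    measured_s: float         # duration measured on the host machine
    deps: list = field(default_factory=list)  # (source block index, message bytes)


def replay(blocks, scale_factor, latency_s, bandwidth_bytes_per_s):
    """Predict the finish time of every block on the target machine.

    Assumes `blocks` is ordered so that each block appears after the
    blocks it depends on. Compute time is the measured time multiplied
    by a scale factor; message time is latency + bytes / bandwidth.
    """
    proc_free = {}   # time at which each target processor becomes idle
    finish = []      # predicted finish time per block
    for b in blocks:
        ready = proc_free.get(b.proc, 0.0)
        for src, nbytes in b.deps:
            arrival = finish[src] + latency_s + nbytes / bandwidth_bytes_per_s
            ready = max(ready, arrival)  # wait for the latest message
        end = ready + b.measured_s * scale_factor
        finish.append(end)
        proc_free[b.proc] = end
    return finish


if __name__ == "__main__":
    trace = [
        Block(proc=0, measured_s=1.0),
        Block(proc=1, measured_s=2.0, deps=[(0, 1000)]),
    ]
    print(replay(trace, scale_factor=0.5, latency_s=0.5,
                 bandwidth_bytes_per_s=1000.0))  # [0.5, 3.0]
```

In this sketch the second block cannot start until the message from the first block arrives, so its predicted finish time reflects both the scaled compute time and the modeled network delay. Replacing the message-cost expression with a contention-aware network model is where a detailed simulation would differ.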

The research component of BigSim has been supported by NSF awards NGS-0103645 and CSR-SMA-0720827, whereas the BigSim deployment for Blue Waters is being funded by NSF via the Blue Waters project, under grant OCI-0725070.
Evaluating HPC Networks via Simulation of Parallel Workloads [SC 2016]
[PhD Thesis]
Optimization of Communication Intensive Applications on HPC Networks [Thesis 2016]
Preliminary Evaluation of a Parallel Trace Replay Tool for HPC Network Simulations [PADABS, EURO-PAR 2015]
Bilge Acun, Nikhil Jain, Abhinav Bhatele, Misbah Mubarak, Christopher Carothers, Laxmikant Kale
Avoiding Hot-Spots on Two-Level Direct Networks [SC 2011]
Simulation-based Performance Analysis and Tuning for a Two-level Directly Connected System [ICPADS 2011]
Simulating Large Scale Parallel Applications using Statistical Models for Sequential Execution Blocks [ICPADS 2010]
[MS Thesis]
A Preliminary Investigation of Emulating Applications That Use Petabytes of Memory on Petascale Machines [Thesis 2007]
Parallel VHDL Simulation [PPL Poster 2005]
Performance Modeling and Programming Environments for Petaflops Computers and the Blue Gene Machine [NSFNGS 2004]
BlueGene Emulator [IPDPS 2001]
Emulating Petaflops Machines and Blue Gene [IPDPS 2001]