ACM SRC: Fast Profiling-based Performance Modeling of Distributed GPU Applications
    
    International Conference for High Performance Computing, Networking, Storage and Analysis  (SC) 2019
    Publication Type: Poster
    Repository URL: https://sc19.supercomputing.org/presentation/?id=spostg126&sess=sess240
    
        An increasing number of applications use GPUs to accelerate computation, with MPI handling communication in distributed environments. Existing performance models focus on either GPU kernels or MPI communication; the few that model the entire application tend to be specialized to a single application and require extensive input from the programmer.
To quickly model different types of distributed GPU applications, we propose a profiling-based methodology for creating performance models. We build on the roofline performance model for GPU kernels and on analytical models for MPI communication, while significantly reducing profiling time. We also develop a benchmark to model the 3D halo exchange that occurs in many scientific applications. Our proposed model for the main iteration loops of MiniFE achieves 6-7% prediction error on LLNL Lassen and 1-2% error on PSC Bridges, with minimal code inspection required to model MPI communication.
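The poster's exact model is not reproduced here. As a minimal Python sketch under our own assumptions, a roofline bound for GPU kernels and an analytical communication model for a 3D halo exchange can be combined into a per-iteration time estimate. The Hockney latency-bandwidth (alpha-beta) cost per message, the exchange of six faces only (no edge or corner messages), serialized messages, no overlap of computation with communication, and all names (`GPU`, `Network`, `roofline_time`, `halo_exchange_time`, `iteration_time`, `prediction_error`) and hardware numbers are hypothetical illustrations. They are not the authors' implementation, nor values measured on Lassen, Bridges, or MiniFE.

```python
"""Hypothetical sketch: roofline kernel time + alpha-beta halo-exchange time."""
from dataclasses import dataclass


@dataclass
class GPU:
    peak_flops: float      # FLOP/s (illustrative value, not from the poster)
    peak_bandwidth: float  # device-memory bytes/s (illustrative)


@dataclass
class Network:
    latency: float    # alpha: seconds per message (assumed Hockney model)
    bandwidth: float  # 1/beta: bytes/s (assumed Hockney model)


def roofline_time(flops: float, bytes_moved: float, gpu: GPU) -> float:
    """Kernel runtime implied by the roofline model.

    Attainable FLOP/s = min(peak_flops, AI * peak_bandwidth), where the
    arithmetic intensity AI = flops / bytes_moved.
    """
    intensity = flops / bytes_moved
    attainable = min(gpu.peak_flops, intensity * gpu.peak_bandwidth)
    return flops / attainable


def halo_exchange_time(nx: int, ny: int, nz: int,
                       elem_bytes: int, net: Network) -> float:
    """Alpha-beta cost of exchanging the six faces of an nx*ny*nz subdomain.

    Assumes an interior rank (six neighbors), halo width 1, no edge/corner
    messages, and messages sent one after another.
    """
    face_elems = [ny * nz, ny * nz, nx * nz, nx * nz, nx * ny, nx * ny]
    return sum(net.latency + n * elem_bytes / net.bandwidth for n in face_elems)


def iteration_time(kernels, gpu: GPU, halo_dims, elem_bytes: int,
                   net: Network) -> float:
    """Predicted time of one iteration: kernel time plus communication time.

    `kernels` is a list of (flops, bytes_moved) pairs, e.g. from profiling.
    No compute/communication overlap is assumed.
    """
    compute = sum(roofline_time(f, b, gpu) for f, b in kernels)
    return compute + halo_exchange_time(*halo_dims, elem_bytes, net)


def prediction_error(predicted: float, measured: float) -> float:
    """Relative error of a prediction against a measured runtime."""
    return abs(predicted - measured) / measured


if __name__ == "__main__":
    gpu = GPU(peak_flops=1e12, peak_bandwidth=1e11)
    net = Network(latency=1e-6, bandwidth=1e10)
    # One memory-bound kernel (AI ~ 0.17 FLOP/byte) on a 100^3 subdomain of doubles.
    t = iteration_time([(2e9, 1.2e10)], gpu, (100, 100, 100), 8, net)
    print(f"predicted iteration time: {t:.6f} s")
```

With these illustrative numbers the kernel is memory-bound (its arithmetic intensity times the bandwidth is below the compute peak), so its predicted time is set by bytes moved. The 100^3 halo exchange adds only tens of microseconds. Summing the two terms is the simplest composition; a model for a code that overlaps communication with computation would need a different combination rule.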
      