High Performance Computing with Charm++

18th Annual Workshop on
Charm++ and Its Applications

October 20-21, 2020

Virtual Event



Thank you for your participation! Slides and videos will be made available soon.



About

The Charm++ Ecosystem

Charm++ is a C++-based parallel programming system built on an introspective adaptive runtime system, with many features suited to upcoming extreme-scale as well as mid-scale challenges. It underpins multiple highly scalable parallel applications such as NAMD, ChaNGa, and OpenAtom.

Our group's goal is to develop technology that improves performance of parallel applications while also improving programmer productivity. We aim to reach a point where, with our freely distributed software base, complex irregular and dynamic applications can (a) be developed quickly and (b) perform scalably on machines with thousands of processors.


The Charm++ Workshop

The workshop is broadly focused on adaptivity in highly scalable parallel computing. It also takes stock of recent results in adaptive runtime techniques in Charm++ and the collaborative interdisciplinary research projects developed using it.

Important Dates


Category: Dates

Abstracts due: September 15, 2020; September 30, 2020

Author notification: October 1, 2020; October 7, 2020 (rolling basis for early submissions)

Workshop: October 20-21, 2020

Keynote Speakers


Jeffrey S. Vetter

Oak Ridge National Laboratory

Group Leader, Future Technologies Group, Computer Science and Mathematics Division

Jeffrey Vetter, Ph.D., is a Corporate Fellow at Oak Ridge National Laboratory (ORNL). At ORNL, he is currently the Section Head for Advanced Computer Systems Research and the founding director of the Experimental Computing Laboratory (ExCL). Previously, Vetter was the founding group leader of the Future Technologies Group in the Computer Science and Mathematics Division from 2003 until 2020. Vetter earned his Ph.D. in Computer Science from the Georgia Institute of Technology. Vetter is a Fellow of the IEEE, and a Distinguished Scientist Member of the ACM. In 2010, Vetter, as part of an interdisciplinary team from Georgia Tech, NYU, and ORNL, was awarded the ACM Gordon Bell Prize. In 2015, Vetter served as the SC15 Technical Program Chair. His recent books, entitled "Contemporary High Performance Computing: From Petascale toward Exascale (Vols. 1 and 2)," survey the international landscape of HPC. More information is available at https://ft.ornl.gov/~vetter/.



Preparing for Extreme Heterogeneity in High Performance Computing


While computing technologies have remained relatively stable for nearly two decades, new architectural features, such as heterogeneous cores, deep memory hierarchies, non-volatile memory (NVM), and near-memory processing, have emerged as possible solutions to address the concerns of energy-efficiency and cost. However, we expect this 'golden age' of architectural change to lead to extreme heterogeneity, with a major impact on software systems and applications. Software will need to be redesigned to exploit these new capabilities and provide some level of performance portability across these diverse architectures. In this talk, I will survey these emerging technologies, discuss their architectural and software implications, and describe several new approaches (e.g., domain specific languages, intelligent runtime systems) to address these challenges.

Hartmut Kaiser

Louisiana State University

Adjunct Assistant Professor, Department of Computer Science
Senior Research Scientist, Center for Computation and Technology


Hartmut is a member of the faculty at the CS department at Louisiana State University (LSU) and a senior research scientist at LSU's Center for Computation and Technology (CCT). He received his doctorate from the Technical University of Chemnitz (Germany). He is probably best known through his involvement in open source software projects, mainly as the author of several C++ libraries he has contributed to Boost, which are in use by thousands of C++ developers worldwide. He is a voting member of the ISO C++ Standardization Committee. His current research is focused on leading the STE||AR group at CCT working on the practical design and implementation of future execution models and programming abstractions. His research interests are focused on the complex interaction of compiler technologies, runtime systems, active libraries, and modern system architectures. His goal is to enable the creation of a new generation of scientific applications in powerful, though complex environments, such as high performance computing, distributed computing, many task runtime systems, and compiler technologies.



Asynchronous Programming in Modern C++


With the advent of modern computer architectures characterized by many-core nodes, deep and complex memory hierarchies, heterogeneous subsystems, and power-aware components, it is becoming increasingly difficult to achieve the best possible application scalability and satisfactory parallel efficiency. The community is experimenting with new programming models that rely on finer-grain parallelism and flexible, lightweight synchronization, combined with work-queue-based, message-driven computation. The recently growing interest in the C++ programming language increases the demand for libraries implementing those programming models for the language. We present a new asynchronous C++ parallel programming model that is built around lightweight tasks and mechanisms to orchestrate massively parallel and distributed execution. This model uses the concept of Futures to make data dependencies explicit, employs explicit and implicit asynchrony to hide latencies and to improve utilization, and manages finer-grain parallelism with a work-stealing scheduling system enabling automatic load balancing of tasks. We have developed and implemented such a model as a C++ library exposing a higher-level parallelism API that is fully conforming to the existing C++11/14/17 standards and is aligned with the ongoing standardization work. This API and programming model have been shown to enable writing highly efficient parallel applications for heterogeneous resources with excellent performance and scaling characteristics.

John Mellor-Crummey

Rice University

Professor of Computer Science and of Electrical and Computer Engineering

John Mellor-Crummey is a Professor of Computer Science at Rice University in Houston, TX. His research focuses on software technology for high performance parallel computing. His current research includes tools for measurement and analysis of application performance, tools for dynamic data race detection, and techniques for network performance analysis and optimization. He leads the research and development of the HPCToolkit Performance Tools, principally supported by the DOE Exascale Computing Project. His past work has included development of compilers and runtime systems for parallel computing, scalable software synchronization algorithms for shared-memory multiprocessors, and techniques for execution replay of parallel programs. Mellor-Crummey has co-led development of the OMPT tools interface for OpenMP 5. He is a co-recipient of the 2006 Dijkstra Prize in Distributed Computing and a Fellow of the ACM.



Towards Performance Tools for Emerging GPU-Accelerated Exascale Supercomputers


To tune applications for emerging exascale supercomputers, application developers need tools that measure the performance of applications on GPU-accelerated platforms and attribute application performance back to program source code. This talk will describe work in progress developing extensions to Rice University's HPCToolkit performance tools to support measurement and analysis of GPU-accelerated applications on current supercomputers based on NVIDIA GPUs and forthcoming exascale systems based on AMD and Intel GPUs. At present, HPCToolkit's support for NVIDIA's GPUs is the most mature. To help developers understand the performance of accelerated applications as a whole, HPCToolkit's measurement and analysis tools attribute metrics to calling contexts that span both CPUs and GPUs. HPCToolkit measures both profiles and traces of GPU execution. To measure GPU-accelerated applications efficiently, HPCToolkit employs novel wait-free data structures to coordinate monitoring and attribution of GPU performance metrics. To help developers understand the performance of complex GPU code generated from high-level template-based programming models, HPCToolkit's hpcprof constructs sophisticated approximations of call path profiles for GPU computations. To support fine-grain analysis and tuning, HPCToolkit uses platform-dependent hardware and software measurement capabilities to attribute GPU performance metrics to source lines and loops. We illustrate HPCToolkit's emerging capabilities for analyzing GPU-accelerated applications with several case studies.
