We're pleased to announce a beta release of Charm++ in advance of the upcoming version 6.8.0. We ask that users take this opportunity to test the latest code with their applications and report any issues encountered.
The code for this release can be obtained by:
    git clone https://charm.cs.illinois.edu/gerrit/charm.git
    cd charm
    git checkout v6.8.0-beta2
(Beta 1 was not announced due to bugs found in internal testing.)
Among over 700 commits made since the release of version 6.7.1, some of the larger and more exciting improvements in the system include:
- Calls to entry methods taking a single fixed-size parameter can now be aggregated automatically and routed through the TRAM library by marking them with the [aggregate] attribute.
- Calls to parameter-marshalled entry methods with large array arguments can request asynchronous zero-copy send behavior with an 'rdma' tag in the parameter's declaration. (Both attributes are illustrated in the first sketch after this list.)
- The runtime system now integrates an OpenMP runtime library, so code using OpenMP parallelism will dispatch work to idle worker threads within the Charm++ process (see the OpenMP sketch after this list).
- Applications can ask the runtime system to perform automatic high-level end-of-run performance analysis by linking with the '-tracemode perfReport' option.
- Added a new dynamic remapping/load-balancing strategy, GreedyRefineLB, that offers high result quality with well-bounded execution time.
- Charm++ programs can now define their own main() function, rather than using a generated implementation from a mainmodule/mainchare combination. This extends the existing Charm++/MPI interoperation feature.
- The GPU Manager now creates one instance per OS process and scales the pre-allocated memory pool size according to the GPU memory size and the number of GPU Manager instances on a physical node.
- Several GPU Manager API changes including:
- Replaced references to global variables in the GPU manager API with calls to functions.
- The user is no longer required to specify a bufferID in the dataInfo struct.
- Replaced calls to kernelSelect with direct invocation of functions passed via the work request object (allows CUDA to be built with all programs).
- Added support for malleable jobs that can dynamically shrink and expand the set of compute nodes hosting Charm++ processes.
- Greatly expanded and improved reduction operations:
- Added built-in reductions for all logical and bitwise operations on integer and boolean input.
- Reductions over groups and chare arrays that apply commutative, associative operations (e.g. MIN, MAX, SUM, AND, OR, XOR) are now processed in a streaming fashion. This reduces the memory footprint of reductions. User-defined reductions can opt into this mode as well.
- Added a new 'Tuple' reducer that allows combining multiple reductions, over different input data and operations, from a common set of source objects into a single target callback (see the tuple-reducer sketch after this list).
- Added a new 'Summary Statistics' reducer that provides count, mean, and standard deviation using a numerically-stable streaming algorithm.
- Added a '++quiet' option to suppress charmrun and charm++ non-error messages at startup.
- Calls to chare array element entry methods with the [inline] tag now avoid copying their arguments when the called method takes its parameters by const&, offering a substantial reduction in overhead in those cases (sketched after this list).
- Synchronous entry methods that block until completion (marked with the [sync] attribute) can now return any type that defines a PUP method, rather than only message types (sketched after this list).
- Improved and expanded topology-aware spanning tree generation strategies, including support for runs on a torus with holes, such as Blue Waters and other Cray XE/XK systems.
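To make the first two items concrete, here is a minimal interface-file sketch; 'Worker', 'accumulate', 'receiveChunk', and the parameter names are invented for illustration:
    // In the .ci interface file:
    array [1D] Worker {
      entry Worker();
      // Single fixed-size parameter: calls may be batched and routed
      // through the TRAM aggregation library.
      entry [aggregate] void accumulate(int datum);
      // Large marshalled array tagged 'rdma': requests an asynchronous
      // zero-copy send of the buffer.
      entry void receiveChunk(int n, rdma double data[n]);
    };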
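The OpenMP integration requires no new syntax; a minimal sketch, assuming an invented Worker::compute method and a hypothetical work() function:
    #include <omp.h>
    void Worker::compute() {
      double sum = 0.0;
      // With the integrated OpenMP runtime, these iterations can be
      // picked up by otherwise-idle Charm++ worker threads within the
      // same process.
      #pragma omp parallel for reduction(+:sum)
      for (int i = 0; i < 1000000; ++i) {
        sum += work(i);  // hypothetical per-iteration kernel
      }
      // ... use sum ...
    }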
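A contribution-side sketch of the Tuple reducer; 'Main', 'report', and 'mainProxy' are invented names for this example, and the calls follow the tuple-reduction API described in the manual:
    // Inside a chare array element's method:
    int localCount = 42;     // this element's integer contribution
    double localMax = 3.14;  // this element's double contribution
    CkReduction::tupleElement elems[] = {
      CkReduction::tupleElement(sizeof(int), &localCount, CkReduction::sum_int),
      CkReduction::tupleElement(sizeof(double), &localMax, CkReduction::max_double)
    };
    CkReductionMsg* msg = CkReductionMsg::buildFromTuple(elems, 2);
    // Deliver both combined results to a single entry method taking a
    // CkReductionMsg*:
    msg->setCallback(CkCallback(CkIndex_Main::report(NULL), mainProxy));
    contribute(msg);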
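A sketch of the [inline] copy-avoidance case, again with invented names; the const& parameter form follows the description above (consult the manual for the precise declaration):
    // In the .ci file:
    //   entry [inline] void process(const std::vector<double>& data);
    // C++ side:
    void Worker::process(const std::vector<double>& data) {
      // When caller and callee share a process, the runtime can invoke
      // this method directly and pass 'data' without copying it; it
      // falls back to an ordinary message send across processes.
    }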
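And a sketch of a [sync] entry method returning a plain PUP-able type rather than a message; 'Result', 'getResult', and 'workers' are invented:
    // In the .ci file:
    //   entry [sync] Result getResult();
    // The return type only needs a pup routine (pup_stl.h covers
    // std::vector):
    struct Result {
      std::vector<double> values;
      void pup(PUP::er& p) { p | values; }
    };
    // Caller side, inside a [threaded] entry method, since [sync]
    // calls block until the result arrives:
    Result r = workers[i].getResult();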
Future portability/compatibility note:
Please be aware that all feature releases of the Charm++ system following the final 6.8 release will require full C++11 support from the compiler and standard library in use.
Aurora Early Science Program
NAMD is one of 10 computational science and engineering research projects selected for the ALCF Aurora Early Science Program. Aurora, a massively parallel, manycore Intel-Cray supercomputer, is expected to arrive in 2018.
The project "Free energy landscapes of membrane transport proteins" will use NAMD. It is led by Benoit Roux of The University of Chicago, in collaboration with the NIH Center for Macromolecular Modeling and Bioinformatics at the Beckman Institute, University of Illinois.
High Performance Computing Camp (Escuela de Computación de Alto Rendimiento) - Techniques and methodology for parallel programming - Module 4: Programming with parallel objects
The camp has been rescheduled and will now take place September 18-29, 2017, in Buenos Aires, Argentina.
Day 1: Parallel Objects Programming Fundamentals. Introduction to basic concepts: overdecomposition, asynchrony, migratability, and adaptivity. The parallel objects model and its advantages over traditional methods. Introduction to the Charm++ programming language. The Charm++ programming and execution model. Installation of Charm++ and associated libraries. Basic Charm++ code samples. Use and properties of chare arrays.
Day 2: Performance Analysis and Load Balancing. Introduction to Projections, a performance analysis tool. Visualizing executions and analyzing experimental results. Performance bottleneck detection. Introduction to load balancing. Object migration and PUP methods. Load balancing strategies in Charm++. Use of different load balancing strategies for particular problems.
Day 3: Advanced Programming with Charm++. Advanced programming mechanisms in Charm++. Use of multidimensional arrays and chare groups. Introduction to checkpointing and its applications.
Day 4: High-Level Programming with Charm++. Introduction to Structured Dagger (SDAG), a tool for high-level programming in Charm++. Survey of other high-level languages in the Charm++ ecosystem. Presentation of real applications using Charm++.
See the web article: "Power, Reliability, Performance: One System to Rule Them All" [IEEE Computer, October 2016]
Changes in this release are primarily bug fixes for 6.7.0. The major exception is AMPI, which has seen changes to its extension APIs and now complies with more of the MPI standard. A brief list of changes follows:
- Startup and exit sequences are more robust
- Error and warning messages are generally more informative
- CkMulticast’s set and concat reducers work correctly
- AMPI’s extensions have been renamed to use the prefix AMPI_ instead of MPI_ and to generally follow MPI’s naming conventions
- AMPI_Migrate(MPI_Info) is now used for dynamic load balancing and all fault tolerance schemes (see the AMPI manual and the sketch after this list)
- AMPI officially supports MPI-2.2, and also implements the non-blocking collectives and neighborhood collectives from MPI-3.1
- The Cray regularpages build target has been fixed
- A Clang compiler target for Blue Gene/Q systems has been added
- Communication thread tracing for SMP mode has been added
- AMPI’s compiler wrappers are easier to use with autoconf and cmake
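As a rough sketch of the renamed extension API, a load-balancing call might look like the following; the 'ampi_load_balance' info key follows the scheme described in the AMPI manual, and exact keys and values may vary by version:
    MPI_Info hints;
    MPI_Info_create(&hints);
    MPI_Info_set(hints, "ampi_load_balance", "sync");  /* request dynamic LB */
    AMPI_Migrate(hints);  /* collective call; ranks may migrate here */
    MPI_Info_free(&hints);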
Jonathan Lifflander, Esteban Meneses, Harshitha Menon, Phil Miller, Sriram Krishnamoorthy, and Laxmikant V. Kale have won the best student paper award at CLUSTER'14 in Madrid, Spain!
This was awarded for their fault-tolerance paper, which describes a new theoretical model of dependencies that reduces the amount of data required to perform deterministic replay. Using the algorithm presented, they demonstrate 2x better performance and scalability up to 128K cores of the BG/P machine 'Intrepid'. The paper is titled "Scalable Replay with Partial-Order Dependencies for Message-Logging Fault Tolerance."
“I am honored to receive this award.” said Harshitha. “It is a great opportunity to publicize my research work within the HPC community.”
Harshitha's research focuses on developing scalable load balancing algorithms and adaptive runtime techniques to improve the performance of large-scale dynamic applications. Her work includes performance optimization of ChaNGa, a cosmology simulation application developed as a collaborative research project between PPL and astrophysicists at the University of Washington.
Also this year, Harshitha received the 2014 Google Anita Borg Memorial Scholarship and in 2012 she was selected as a Siebel Scholar.
“This award will be another prestigious feather in Harshitha’s cap!” said Prof. Laxmikant Kalé, PPL director. “Just a few months ago she won the Google Anita Borg scholarship. She has been doing excellent work in parallel computing and I’m especially proud of her efforts in scaling ChaNGa, our computational cosmology application, up to 512K cores.”
This is the third year in a row that a PPL student has been recognized with the George Michael Memorial HPC Fellowship:
- 2014: Harshitha Menon (PhD candidate)
- 2013: Jonathan Lifflander (PhD candidate) and Edgar Solomonik (PPL alum)
- 2012: Honorable Mention: Yanhua Sun (PhD candidate)
- 2009: Abhinav Bhatele (PhD Graduate)
See announcement reprint at cs.illinois.edu.
Large checkpoints pose a challenge as HPC applications scale to hundreds of thousands of processors because of the space they consume and the time required to transfer them to stable storage. To address this problem, the poster proposes the use of lossy compression to reduce checkpoint size and studies the trade-off between loss of precision and compression ratio. As a proof of concept, the authors show that for ChaNGa (a cosmology code built on Charm++), moderate lossy compression reduces checkpoint size by 3-5x while maintaining correctness.
This poster by Xiang Ni, a PPLer interning at LLNL, was judged one of the best posters at Lawrence Livermore National Laboratory's annual student poster symposium, which hosted approximately 100 posters.
- Parallel Programming with Migratable Objects: Charm++ in Practice, by Bilge Acun, Abhishek Gupta, Nikhil Jain, Akhil Langer, Harshitha Menon, Eric Mikida, Xiang Ni, Michael Robson, Yanhua Sun, Ehsan Totoni, Lukasz Wesolowski, and Laxmikant Kale
- Maximizing Throughput of Overprovisioned HPC Data Centers Under a Strict Power Budget, by Osman Sarood, Akhil Langer, Abhishek Gupta, and Laxmikant Kale
- Mapping to Irregular Torus Topologies and Other Techniques for Petascale Biomolecular Simulation, by James Phillips, Yanhua Sun, Nikhil Jain, Eric Bohm, and Laxmikant Kale
- Using an Adaptive HPC Runtime System to Reconfigure the Cache Hierarchy, by Ehsan Totoni, Josep Torrellas, and Laxmikant Kale
- Maximizing Network Throughput on the Dragonfly Interconnect, by Nikhil Jain, Abhinav Bhatele, Xiang Ni, Nicholas Wright, and Laxmikant Kale
- Optimizing Data Locality for Fork/Join Programs Using Constrained Work Stealing, by Jonathan Lifflander, Sriram Krishnamoorthy, and Laxmikant Kale
Presentation slots (day | time | room):
- Wednesday, November 19th | 4:30PM - 5:00PM | 391-92
- Thursday, November 20th | 10:30AM - 11:00AM | 393-94-95
- Tuesday, November 18th | 11:00AM - 11:30AM | 393-94-95
- Thursday, November 20th | 4:00PM - 4:30PM | 393-94-95
- Tuesday, November 18th | 4:00PM - 4:30PM | 393-94-95
- Thursday, November 20th | 2:00PM - 2:30PM | 388-89-90
We are looking forward to a strong presence at SC14 in New Orleans.