Charm++ power, resilience work featured in IEEE Computer Oct ’16 issue!
Over the years, PPL has carried out research on multiple aspects of optimizing power, energy, and temperature without sacrificing performance. Automatic runtime adaptation through the Charm++ runtime system has been a key foundation of all the approaches explored. Resilience-related solutions are also enabled by the same runtime approach. A broad summary of our research and its connection with the adaptive runtime was published recently in IEEE Computer.

Link to the web article: Power, Reliability, Performance: One System to Rule Them All [IEEE Computer October 2016]
Charm++ Release 6.7.1

Changes in this release are primarily bug fixes for 6.7.0. The major exception is AMPI, which has seen changes to its extension APIs and now complies with more of the MPI standard. A brief list of changes follows:

Charm++ Bug Fixes

  • Startup and exit sequences are more robust
  • Error and warning messages are generally more informative
  • CkMulticast’s set and concat reducers work correctly

AMPI Features

  • AMPI’s extensions have been renamed to use the prefix AMPI_ instead of MPI_ and to generally follow MPI’s naming conventions
  • AMPI_Migrate(MPI_Info) is now used for dynamic load balancing and all fault tolerance schemes (see the AMPI manual)
  • AMPI officially supports MPI-2.2, and also implements the non-blocking collectives and neighborhood collectives from MPI-3.1

Platforms and Portability

  • Cray regularpages build target has been fixed
  • Clang compiler target for BlueGene/Q systems added
  • Comm. thread tracing for SMP mode added
  • AMPI’s compiler wrappers are easier to use with autoconf and cmake
Jonathan Lifflander defends his dissertation
Jonathan Lifflander successfully defended his dissertation entitled "Optimizing Work Stealing Algorithms with Scheduling Constraints". His thesis examines methodologies to improve the efficiency of fork-join programming models in conjunction with work stealing schedulers by exploiting persistency in iterative scientific benchmarks. His thesis demonstrates a highly scalable implementation of distributed-memory work stealing using a novel tracing framework to record task execution locations in the presence of random steals, while incurring very low overheads. This same tracing framework is used to optimize work stealing on NUMA architectures. Finally, by introducing data effect annotations to fork-join models in conjunction with runtime tracing, his work enables fork-join schedulers to execute ahead of syncs to accrue cache locality benefits.
Nikhil defends his dissertation
PPLer Nikhil Jain has successfully defended his dissertation titled "Optimization of Communication Intensive Applications on HPC Networks". In an hour-long public presentation given to his thesis committee, which consists of Illinois Professors Kale, Gropp, Torrellas, and OSU Prof. Panda, Nikhil described the importance of communication in HPC applications and presented his two-step approach for optimizing it on HPC networks. The first step uses machine learning to perform diagnostic studies that help identify important metrics. The second step uses parallel discrete event simulation tools, developed based on what was learned in the first step, to mimic communication flow on HPC networks. The thesis presents a few example use cases of these tools by comparing HPC networks with different topologies and by predicting the impact of changes in network parameters. In addition to this methodology, the thesis also contains work on topology-aware mapping, job placement, and communication algorithms. More details on Nikhil’s research and his thesis can be found at his personal home page.
Job Opportunity: Vst. Research Programmer Position - closed Feb 22 (extended)
Charm++ and AMPI BoF at SC15
PPL at SC15
Charm++ Tutorial at SBAC-PAD 2015
Celso, Laércio and Esteban will present the Charm++ tutorial at the 27th annual SBAC-PAD on October 21st in Santa Catarina, Brazil. Link to info
Akhil Langer receives Kenichi Miura Award 2015
PPLer Akhil Langer has received the 2015 Kenichi Miura Award. This award honors a graduate student for outstanding accomplishments in High Performance Computing. Akhil works with Prof. Laxmikant Kale and Prof. Udatta Palekar on several aspects of high performance computing, including power and energy optimizations, stochastic optimization, load balancing, and adaptive mesh refinement. Akhil's thesis work provides a computational engine for many real-time and dynamic problems faced by the US Air Mobility Command. It is expected that this work will provide the springboard for more robust problem solving with HPC in many logistics and planning problems.
Charm++ tutorial in Brazil
Laércio Lima Pilla is leading a Charm++ tutorial as part of a regional gathering on HPC, April 22-24, 2015. Laércio is a former student of Prof. Navaux (Federal University of Rio Grande do Sul) and Prof. Mehaut (University of Grenoble). He is now an associate professor at the University of Santa Catarina in Brazil.
The Coding Illini team with PPLer Phil Miller win 2014 PUCC
PPL @ SC'14
Lifflander et al. Win Best Student Paper at CLUSTER'14

Jonathan Lifflander, Esteban Meneses, Harshitha Menon, Phil Miller, Sriram Krishnamoorthy, and Laxmikant V. Kale have won the best student paper award at CLUSTER'14 in Madrid, Spain!

This was awarded for their fault-tolerance paper that describes a new theoretical model for dependencies that reduces the amount of data required to perform deterministic replay. Using the algorithm presented, we demonstrate 2x better performance and scalability up to 128k cores of BG/P 'Intrepid'. The paper is entitled: Scalable Replay with Partial-Order Dependencies for Message-Logging Fault Tolerance.

Harshitha to receive George Michael Memorial HPC Fellowship at SC'14
Harshitha Menon, PhD candidate advised by Prof. Laxmikant Kale, is a recipient of the 2014 ACM/IEEE-CS George Michael Memorial High Performance Computing Fellowship. This prestigious fellowship honors exceptional PhD students around the world whose research focus is on high performance computing, networking, storage, and large-scale data analysis. Fellowship winners are selected based on overall potential for research excellence and academic progress. This fellowship provides a $5000 honorarium and the award will be presented at the SC’14 Awards Ceremony.

“I am honored to receive this award,” said Harshitha. “It is a great opportunity to publicize my research work within the HPC community.”

Harshitha's research focuses on developing scalable load balancing algorithms and adaptive runtime techniques to improve the performance of large-scale dynamic applications. Her research covers performance optimizations of a cosmology simulation application called ChaNGa, which is a collaborative research project between PPL and astrophysicists at the University of Washington.

Also this year, Harshitha received the 2014 Google Anita Borg Memorial Scholarship and in 2012 she was selected as a Siebel Scholar.

“This award will be another prestigious feather in Harshitha’s cap!” said Prof. Laxmikant Kalé, PPL director. “Just a few months ago she won the Google Anita Borg scholarship. She has been doing excellent work in parallel computing and I’m especially proud of her efforts in scaling ChaNGa, our computational cosmology application, up to 512K cores.”

This is the third year in a row that a PPL student is acknowledged for the George Michael Memorial HPC Fellowship award.

See announcement reprint at
Ehsan receives Andrew and Shana Laursen Fellowship at Illinois
Ehsan has been selected for the Andrew and Shana Laursen Fellowship for fall 2014. About the fellowship: the Andrew and Shana Laursen Fellowship was established in 2001 to provide meaningful assistance in the recruitment and support of top graduate students to the Department of Computer Science, and to improve the quality of education and research at the University of Illinois.
Xiang wins Best Poster Award at LLNL Student Poster Symposium
Title: Lossy Compression for Checkpointing: Fallible or Feasible?

Large checkpoints pose a challenge as HPC applications scale to hundreds of thousands of processors because of the space they consume and the time required to transfer them to stable storage. To address this problem, this poster proposes use of lossy compression to reduce checkpoint size and studies the trade-off between the loss of precision and the compression ratio. As a proof of concept, for ChaNGa (a cosmology code developed over Charm++), we show that use of moderate lossy compression reduces checkpoint size by 3-5x while maintaining correctness.
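
The trade-off between precision loss and compression ratio described above can be illustrated with a small sketch. This is not the poster's method: it assumes a simple stand-in scheme (zeroing low-order mantissa bits of each double, then compressing with zlib) and a synthetic array in place of real ChaNGa state. The function names `truncate_mantissa`, `checkpoint_bytes`, and `evaluate` are hypothetical.

```python
import math
import struct
import zlib


def truncate_mantissa(values, keep_bits):
    """Zero all but the top `keep_bits` of each float64's 52-bit mantissa."""
    if not 0 <= keep_bits <= 52:
        raise ValueError("keep_bits must be between 0 and 52")
    drop = 52 - keep_bits
    mask = (0xFFFFFFFFFFFFFFFF >> drop) << drop
    out = []
    for v in values:
        (bits,) = struct.unpack("<Q", struct.pack("<d", v))
        (truncated,) = struct.unpack("<d", struct.pack("<Q", bits & mask))
        out.append(truncated)
    return out


def checkpoint_bytes(values):
    """Serialize the values as little-endian doubles and compress losslessly."""
    raw = struct.pack(f"<{len(values)}d", *values)
    return zlib.compress(raw, 9)


def evaluate(values, keep_bits):
    """Return (size reduction vs. lossless-only checkpoint, max relative error)."""
    lossy = truncate_mantissa(values, keep_bits)
    ratio = len(checkpoint_bytes(values)) / len(checkpoint_bytes(lossy))
    max_rel_err = max(
        (abs(a - b) / abs(a) for a, b in zip(values, lossy) if a != 0.0),
        default=0.0,
    )
    return ratio, max_rel_err


# Synthetic stand-in for application state (assumption: smooth, nonzero data).
state = [math.sin(0.001 * i) * 1000.0 + 2000.0 for i in range(20000)]

for keep in (52, 32, 16):
    ratio, err = evaluate(state, keep)
    print(f"keep_bits={keep:2d}  size reduction={ratio:.2f}x  max rel. error={err:.2e}")
```

Keeping fewer mantissa bits makes the serialized data more regular, so the lossless stage shrinks it further, while the relative error of each value stays below 2^-keep_bits. Whether a given error level preserves correctness depends on the application, which is the question the poster studies.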

This poster by Xiang Ni, a PPLer interning at LLNL, was judged one of the best posters at Lawrence Livermore National Laboratory's annual Student Poster Symposium, which hosted approximately 100 posters.
6 Papers Accepted at Supercomputing'2014
PPL has six papers accepted in the technical program of the prestigious Supercomputing 2014 conference! This is a record for us, although PPL had 4 papers in some previous years (2013, 2011). The 6 papers are:

We are looking forward to a strong presence at SC14 in New Orleans.
Abhishek Gupta defends his PhD thesis on HPC in cloud
PPLer Abhishek Gupta has successfully defended his PhD thesis on effective High Performance Computing (HPC) in the Cloud. Here is his thesis abstract: The advantages of the pay-as-you-go model, elasticity, and the flexibility and customization offered by virtualization make cloud computing an attractive option for meeting the needs of some HPC users. However, there is a mismatch between cloud environments and HPC requirements. The poor interconnect and I/O performance in cloud, HPC-agnostic cloud schedulers, and the inherent heterogeneity and multi-tenancy in cloud are some bottlenecks for HPC in cloud. This thesis goes beyond the research question: "what is the performance of HPC in cloud?" and explores "how can we perform effective and efficient HPC in cloud?" To this end, we adopt the complementary approach of making clouds HPC-aware, and HPC runtime systems cloud-aware. Through intelligent application-to-platform mapping, HPC-aware VM placement, interference-aware VM consolidation, cloud-aware HPC load balancing, and malleable jobs, we demonstrate significant benefits for both users and cloud providers in terms of cost (up to 60%), performance (up to 45%), and throughput (up to 32%).
Ehsan receives 3rd place award for ACM Student Research Competition Grand Finals 2014
Ehsan placed 3rd in the graduate category of the ACM Student Research Competition (SRC) Grand Finals 2014. ACM SRC is held every year at several major conferences in different computer science areas, and the winners compete in the Grand Finals round (more info here). Ehsan won ACM SRC at SC'13, which qualified him to compete in the ACM SRC Grand Finals round for 2014. His research is entitled "Structure-Adaptive Parallel Solution of Sparse Triangular Linear Systems." He will receive his award at the ACM Awards Banquet.
Harshitha Menon receives the Anita Borg Scholarship for 2014
Harshitha is one of the recipients of the Google Anita Borg Memorial Scholarship for 2014. More information about the scholarship can be found here
Nikhil receives IBM PhD Fellowship Award
IBM announced PPLer Nikhil Jain as one of the recipients of the IBM PhD Fellowship Award for the academic year 2014-2015. More information on the fellowship can be obtained here. Nikhil's research page is here.
Lukasz receives ORNL distinguished software award
Developers of the Scalable Heterogeneous Computing (SHOC) Benchmark Suite, including Lukasz Wesolowski of PPL, have received an award from Oak Ridge National Laboratory for the most distinguished software released in the last five fiscal years in the Computer Science and Mathematics Division. SHOC, developed by a team led by Jeff Vetter of ORNL, is a collection of CUDA/OpenCL/MPI benchmarks to test performance and stability of modern heterogeneous computing systems and clusters comprising GPUs and Intel Xeon Phi accelerators.
Akhil et al win Best Paper Award at HiPC 2013
Akhil et al.'s work on Parallelization of Stochastic Integer Optimization has won the best paper award at the 20th IEEE International Conference on High Performance Computing, HiPC 2013 (details here). This work was done in collaboration with Prof. Udatta Palekar from the Department of Business at UIUC and is supported by MITRE Corp and AMC. The talk on the paper can be found here.
Akhil et al win Two Best Poster Awards at HiPC 2013
Akhil et al. have won two SRS best poster awards at HiPC 2013 for their work on Scalable and Asynchronous Algorithms for Structured Adaptive Mesh Refinement (an earlier version of the corresponding paper can be found here). The talk on the poster can be found here.
Seeking a Visiting Research Programmer
Gupta et al. win Best Paper Award at IEEE CloudCom '13
The team of researchers led by PPLer Abhishek Gupta has received the Best Paper Award for their work "The Who, What, Why, and How of High Performance Computing in the Cloud" presented at the 5th IEEE International Conference on Cloud Computing Technology and Science (CloudCom 2013) held in Bristol, UK. The paper was selected from among the sixty papers accepted to the conference, which had an 18% acceptance rate.

This work was the outcome of a successful collaboration between the University of Illinois at Urbana-Champaign and HP Labs. This research is motivated by the recent emergence of cloud computing as an alternative to supercomputers for some high-performance computing (HPC) applications that do not require a fully dedicated machine. With cloud as an additional deployment option, HPC users are faced with the challenges of dealing with highly heterogeneous resources, where the variability spans a wide range of processor configurations, interconnections, virtualization environments, and pricing rates and models.

This work takes a holistic viewpoint to answer the question: why and who should choose cloud for HPC, for what applications, and how should cloud be used for HPC? To this end, the paper presents a comprehensive performance evaluation and analysis of a set of benchmarks and complex HPC applications on a range of platforms, varying from supercomputers to clouds. Further, the paper demonstrates HPC performance improvements in cloud using alternative lightweight virtualization mechanisms (thin VMs and OS-level containers) and hypervisor- and application-level CPU affinity. Next, it analyzes the economic aspects and business models for HPC in clouds. The team believes that this is an important area that has not been sufficiently addressed by past research. Overall results indicate that current public clouds are cost-effective only at small scale for the chosen HPC applications, when considered in isolation, but can complement supercomputers using business models such as cloud burst and application-aware mapping.

Here is a link to the paper. This work has also received some good media recognition.
Ehsan and Nikhil win ACM SRC at SC'13
PPLers Ehsan Totoni and Nikhil Jain won Gold and Silver awards respectively in the ACM Student Research Competition at Supercomputing'2013. Ehsan presented his work on Structure-Aware Parallel Algorithm for Solution of Sparse Triangular Linear Systems (details here). He will also get a chance to compete in the ACM SRC Grand Finals. Nikhil presented his work on Fast Prediction of Network Performance: k-packet Simulation (details here). More information on ACM SRC is available here.
PPL Events @ SC'13
Charm++ BoF at Supercomputing 13
Jonathan Lifflander wins ACM/IEEE George Michael HPC Fellowship