Runtime Systems and Tools: Power and Energy Efficiency

Power and energy efficiency are important challenges for the High Performance Computing (HPC) community. Excessive power consumption is a major obstacle to further scaling of HPC systems, and researchers believe that current technology trends will not provide Exascale performance within a reasonable power budget in the near future. Hardware innovations such as the proposed Exascale architectures and Near Threshold Computing are expected to improve power efficiency significantly, but more innovation is required in this domain to make Exascale possible.

To help shrink the power efficiency gap, we argue that adaptive runtime systems can be exploited. The runtime system (RTS) can save significant power because it is aware of both the hardware properties and the application behavior.

Adaptive Runtime Systems. We use application-centric analysis of different architectures to design automatic adaptive RTS techniques that save significant power in different system components, with only minor hardware support. In a nutshell, we analyze different modern architectures and common applications and show that some system components, such as caches and network links, consume power disproportionate to their benefit for common HPC applications. We demonstrate how a large fraction of the power consumed in caches and networks can be saved automatically using our approach. In these cases, the only hardware support the RTS needs is the ability to turn off ways of set-associative caches and network links.
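As a minimal sketch of the cache side of this idea, the fragment below sizes the number of active cache ways to each application phase's working set and powers down the rest. The cache geometry, per-way static power, and the control interface are illustrative assumptions, not the actual runtime's parameters:

```python
import math

# Assumed hardware parameters for illustration only.
TOTAL_WAYS = 16          # ways in the set-associative last-level cache
CACHE_KB = 2048          # total cache capacity
POWER_PER_WAY_W = 0.5    # static power per active way

def ways_needed(working_set_kb, cache_kb=CACHE_KB, ways=TOTAL_WAYS):
    """Smallest number of ways that holds the phase's working set."""
    way_kb = cache_kb / ways                     # capacity of one way
    return max(1, min(ways, math.ceil(working_set_kb / way_kb)))

def plan_phases(phase_working_sets_kb):
    """For each phase, return (active_ways, static_power_w).
    An RTS would apply each entry before the phase starts,
    turning the remaining ways off."""
    plan = []
    for ws in phase_working_sets_kb:
        active = ways_needed(ws)
        plan.append((active, active * POWER_PER_WAY_W))
    return plan
```

For example, a phase with a 600 KB working set needs only 5 of the 16 ways under these assumptions, so the RTS could power down the other 11 for that phase. The same shape of control loop applies to network links: observe the communication graph of a phase, then switch off links that carry no traffic.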

Optimizing Energy Consumption. Furthermore, cooling energy needs to be considered for large-scale systems. Most research to date has focused on reducing machine energy consumption, leaving behind the energy spent on cooling, which accounts for about 40% of a datacenter's total energy consumption. Our focus is to extend energy optimization beyond machine energy saving to also reduce cooling energy. Most datacenters overcool in order to avoid hotspots (areas in the machine room at a much higher temperature than the rest of the room). We are working on a runtime system that uses Dynamic Voltage and Frequency Scaling (DVFS) to minimize the occurrence of hotspots by keeping core temperatures in check. One of our schemes reduces the timing penalty of using DVFS alone by migrating chares to rebalance the application load. Our results show that this temperature-aware load balancing saves considerable cooling energy. Our recent research explores placing 'less-frequency-sensitive' chares on hotter cores to further reduce DVFS-induced slowdown.
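The control loop described above can be sketched as follows. The temperature threshold, frequency steps, and the one-unit migration policy are simplifying assumptions for illustration; the real scheme uses the Charm++ load balancing framework:

```python
# Available DVFS states and the hotspot threshold (assumed values).
FREQ_STEPS_GHZ = [1.2, 1.6, 2.0, 2.4]
T_MAX_C = 60.0

def dvfs_step(temp_c, freq_idx):
    """Lower the frequency of a hot core; restore it once the core
    has cooled well below the threshold (5 C hysteresis)."""
    if temp_c > T_MAX_C and freq_idx > 0:
        return freq_idx - 1
    if temp_c < T_MAX_C - 5.0 and freq_idx < len(FREQ_STEPS_GHZ) - 1:
        return freq_idx + 1
    return freq_idx

def rebalance(loads, temps_c):
    """Move one unit of load from the hottest core to the coolest,
    offsetting the timing penalty that DVFS alone would cause
    (chare migration in the real system)."""
    hot = max(range(len(temps_c)), key=temps_c.__getitem__)
    cool = min(range(len(temps_c)), key=temps_c.__getitem__)
    if loads[hot] > 0 and hot != cool:
        loads[hot] -= 1
        loads[cool] += 1
    return loads
```

In each control period the runtime would read core temperatures, apply `dvfs_step` per core, and then call `rebalance` so that slowed-down cores are not left holding the same amount of work.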

Performance Optimization Under Power Budget. Recent advances in processor and memory hardware design have made it possible to control the power consumption of the CPU and memory through software; e.g., the power consumption of Intel's Sandy Bridge family of processors can be user-controlled through the Running Average Power Limit (RAPL) interface. It has been shown that increasing the power allotted to the processor (and/or memory) does not yield a proportional increase in application performance. As a result, for a given power budget, it can be better to run an application on a larger number of nodes with each node capped at lower power than on fewer nodes each running at its Thermal Design Power (TDP). This approach is called overprovisioning. The optimal resource configuration for an application can be determined by profiling its performance for varying numbers of nodes, CPU power caps, and memory power caps, then selecting the best-performing configuration within the given power budget. In recent work, we propose a performance modeling scheme that estimates the essential power characteristics of a job at any scale. Our online resource manager uses these characteristics to make scheduling and resource allocation decisions that maximize the job throughput of the supercomputer under a given power budget. With a power budget of 4.75 MW, we obtain up to 5.2X improvement in job throughput compared with SLURM's power-unaware scheduling policy. In real experiments on a relatively small cluster, we obtained a 1.7X improvement. An adaptive runtime system allows further improvement by letting already-running jobs shrink and expand for optimal resource allocation.
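The configuration-selection step reduces to a simple search over profiled configurations. A minimal sketch, with made-up profile data, shows how an overprovisioned configuration can win under a budget:

```python
def best_config(profiles, budget_w):
    """profiles: list of (nodes, cpu_cap_w, mem_cap_w, performance).
    Return the highest-performance configuration whose total power,
    nodes * (cpu_cap + mem_cap), fits within budget_w."""
    feasible = [p for p in profiles
                if p[0] * (p[1] + p[2]) <= budget_w]
    return max(feasible, key=lambda p: p[3], default=None)

# Hypothetical profiling results (performance in arbitrary units).
profiles = [
    (4, 95, 30, 1.0),   # few nodes, each at TDP
    (6, 60, 25, 1.4),   # more nodes, each capped lower
    (8, 50, 20, 1.3),   # even more nodes, capped further
]
```

With a 520 W budget, the 6-node capped configuration is selected over the 4-node TDP one, even though it uses more nodes, because performance does not scale proportionally with per-node power. Our resource manager performs the same selection using modeled rather than exhaustively profiled characteristics.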

People
Papers/Talks
14-27 (2014) [Paper] Power Management of Extreme-scale Networks with On/Off Links in Runtime Systems [TOPC 2014]
14-23 (2014) [Paper] Using an Adaptive HPC Runtime System to Reconfigure the Cache Hierarchy [SC 2014]
14-19 (2014) [Paper] Position Paper: Power-aware and Temperature Restrain Modeling for Maximizing Performance and Reliability [MODSIM 2014]
14-15 (2014) [Paper] Maximizing Throughput of Overprovisioned HPC Data Centers Under a Strict Power Budget [SC 2014]
14-02 (2014) [Paper] Energy Profile of Rollback-Recovery Strategies in High Performance Computing [ParCo 2014]
13-56 (2013) [Paper] Easy, Fast and Energy Efficient Object Detection on Heterogeneous On-Chip Architectures [ACM TACO 2013] (Ehsan Totoni, Mert Dikmen, Maria Garzaran)
13-50 (2013) [Talk] A ‘Cool’ Way of Improving the Reliability of HPC Machines [SC 2013]
13-33 (2013) [Paper] Thermal Aware Automated Load Balancing for HPC Applications [Cluster 2013] (Harshitha Menon, Bilge Acun, Simon Garcia De Gonzalo, Osman Sarood, Laxmikant Kale)
13-25 (2013) [Paper] A ‘Cool’ Way of Improving the Reliability of HPC Machines [SC 2013]
13-20 (2013) [Paper] Optimizing Power Allocation to CPU and Memory Subsystems in Overprovisioned HPC Systems [Cluster 2013] (Osman Sarood, Akhil Langer, Laxmikant Kale, Barry Rountree, Bronis Supinski)
13-10 (2013) [Talk] Toward Runtime Power Management of Exascale Networks by On/Off Control of Links [HPPAC 2013]
13-09 (2013) [Paper] Toward Runtime Power Management of Exascale Networks by On/Off Control of Links [HPPAC 2013]
12-43 (2012) [Talk] Assessing Energy Efficiency of Fault Tolerance Protocols for HPC Systems [SBAC-PAD 2012]
12-37 (2012) [Paper] Assessing Energy Efficiency of Fault Tolerance Protocols for HPC Systems [SBAC-PAD 2012]
12-28 (2012) [Paper] Efficient ‘Cool Down’ of Parallel Applications [PASA 2012]
12-27 (2012) [Paper] Cloud Friendly Load Balancing for HPC Applications: Preliminary Work [CloudTech-HPC 2012]
12-20 (2012) [Paper] ‘Cool’ Load Balancing for High Performance Computing Data Centers [IEEE TC 2012]
12-10 (2012) [Talk] Comparing the Power and Performance of Intel’s SCC to State-of-the-Art CPUs and GPUs [ISPASS 2012]
11-18 (2011) [Paper] A ‘Cool’ Load Balancer for Parallel Applications [SC 2011]
11-10 (2011) [Paper] Temperature Aware Load Balancing for Parallel Applications: Preliminary Work [HPPAC 2011]