Power and energy efficiency are major challenges for the High Performance Computing (HPC) community. Excessive power consumption is a primary limitation on further scaling of HPC systems, and researchers believe that current technology trends will not deliver Exascale performance within a reasonable power budget in the near future. Hardware innovations such as the proposed Exascale architectures and Near-Threshold Computing are expected to improve power efficiency significantly, but more innovation is required in this domain to make Exascale possible.
To help shrink this power efficiency gap, we argue that adaptive runtime systems can be exploited. The runtime system (RTS) can save significant power because it is aware of both the hardware properties and the application behavior.
Adaptive Runtime Systems. We use application-centric analysis of different architectures to design automatic, adaptive RTS techniques that save significant power in different system components with only minor hardware support. In a nutshell, we analyze modern architectures and common HPC applications and show that some system components, such as caches and network links, consume a disproportionately large share of power for common HPC applications. We demonstrate how a large fraction of the power consumed in caches and networks can be saved automatically using our approach. In these cases, the only hardware support the RTS needs is the ability to turn off individual ways of set-associative caches and individual network links.
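As a minimal sketch of the cache-side idea, the snippet below shows how an RTS might pick the number of cache ways to keep powered from a per-application miss-rate profile. The function name, the tolerance, and all profile numbers are illustrative assumptions, not values from our actual system.

```python
# Hypothetical sketch: an adaptive RTS choosing how many ways of a
# set-associative cache to keep powered, based on profiled miss rates.
# All names and numbers are illustrative, not from a real system.

def choose_active_ways(miss_rate_by_ways, tolerance=0.02):
    """Pick the smallest number of active cache ways whose miss rate
    stays within `tolerance` of the fully-enabled configuration."""
    full_ways = max(miss_rate_by_ways)
    baseline = miss_rate_by_ways[full_ways]
    for ways in sorted(miss_rate_by_ways):
        if miss_rate_by_ways[ways] - baseline <= tolerance:
            return ways  # remaining ways can be power-gated
    return full_ways

# Profiled miss rates (fraction of accesses) for an application whose
# working set fits in a few ways:
profile = {1: 0.150, 2: 0.060, 4: 0.031, 8: 0.030}
print(choose_active_ways(profile))  # -> 4: ways beyond 4 can be turned off
```

The key design point is that the decision is driven by the application's measured behavior, so a streaming kernel with little reuse and a cache-friendly solver can each get the cheapest configuration that preserves their performance.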
Optimizing Energy Consumption. Furthermore, cooling energy must be considered for large-scale systems. To date, most research has focused on reducing machine energy consumption, leaving aside the energy spent on cooling, which accounts for about 40% of a datacenter's total energy consumption. Our focus is to extend energy optimization beyond machine energy savings so that we also reduce cooling energy. Most datacenters overcool in order to avoid hotspots (areas in the machine room at a much higher temperature than the rest of the room). We are working on a runtime system that uses Dynamic Voltage and Frequency Scaling (DVFS) to minimize the occurrence of hotspots by keeping core temperatures in check. One of our schemes reduces the timing penalty of using DVFS alone by migrating chares to load balance the application. Our results show that this temperature-aware load balancing can save considerable cooling energy. Part of our recent research explores placing 'less-frequency-sensitive' chares on hotter cores so that we can further reduce the DVFS-induced slowdown.
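The control loop above can be sketched in two steps: cap the frequency of cores above a temperature threshold, then rebalance work toward the cores left running fast. This is an illustrative simplification, not the actual Charm++ implementation; the threshold, frequency levels, and greedy balancer are all assumptions.

```python
# Illustrative sketch of temperature-aware DVFS plus load balancing.
# All thresholds, frequency levels, and loads are made-up examples.

THRESHOLD_C = 70.0                 # assumed target core temperature
FREQS_GHZ = [1.2, 1.6, 2.0, 2.4]   # assumed available DVFS levels

def apply_dvfs(core_temps, core_freqs):
    """Step frequency down on hot cores and back up on cool ones."""
    new_freqs = []
    for temp, freq in zip(core_temps, core_freqs):
        i = FREQS_GHZ.index(freq)
        if temp > THRESHOLD_C and i > 0:
            i -= 1                 # hot core: reduce frequency
        elif temp < THRESHOLD_C - 5 and i < len(FREQS_GHZ) - 1:
            i += 1                 # cool core: restore frequency
        new_freqs.append(FREQS_GHZ[i])
    return new_freqs

def rebalance(chare_loads, core_freqs):
    """Greedy balance: place the heaviest chares where the predicted
    finish time (accumulated load / frequency) stays smallest."""
    assignment = {c: [] for c in range(len(core_freqs))}
    used = [0.0] * len(core_freqs)
    for chare, load in sorted(chare_loads.items(), key=lambda x: -x[1]):
        core = min(range(len(core_freqs)),
                   key=lambda c: (used[c] + load) / core_freqs[c])
        assignment[core].append(chare)
        used[core] += load
    return assignment

freqs = apply_dvfs([75.0, 62.0], [2.4, 2.4])   # core 0 is hot
print(freqs)                                    # -> [2.0, 2.4]
print(rebalance({"A": 4.0, "B": 2.0, "C": 1.0}, freqs))
```

The rebalancing step is what offsets the timing penalty: without it, the slowed-down core would hold back the whole application at each synchronization point.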
Performance Optimization Under a Power Budget. Recent advances in processor and memory hardware have made it possible to control the power consumption of the CPU and memory through software; for example, the power consumption of Intel's Sandy Bridge family of processors can be controlled through the Running Average Power Limit (RAPL) interface. It has been shown that an increase in the power allotted to the processor (and/or memory) does not yield a proportional increase in application performance. As a result, for a given power budget, it can be better to run an application on a larger number of nodes with each node capped at lower power than on fewer nodes each running at its TDP. This is known as overprovisioning. The optimal resource configuration for an application can be determined by profiling its performance for varying numbers of nodes, CPU power caps, and memory power caps, and then selecting the best-performing configuration within the given power budget. In our recent work, we propose a performance modeling scheme that estimates the essential power characteristics of a job at any scale. Our online resource manager uses these characteristics to make scheduling and resource allocation decisions that maximize the job throughput of the supercomputer under a given power budget. With a power budget of 4.75 MW, we obtain up to a 5.2X improvement in job throughput compared with SLURM's power-unaware scheduling policy. In real experiments on a relatively small cluster, we obtained a 1.7X improvement. An adaptive runtime system allows further improvement by letting already-running jobs shrink and expand for optimal resource allocation.
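The configuration-selection step described above can be sketched as a simple search over a profile table. The profile numbers below are invented for illustration; they only encode the qualitative trend that moderate overprovisioning helps until power caps become too aggressive.

```python
# Sketch of overprovisioned configuration selection: given profiled
# execution times for (nodes, per-node CPU power cap) pairs, pick the
# fastest configuration whose total power fits the budget.

def best_config(profile, power_budget_w):
    """profile: {(nodes, cap_w): exec_time_s}. Return the (nodes, cap)
    pair minimizing time subject to nodes * cap <= power_budget_w."""
    feasible = {cfg: t for cfg, t in profile.items()
                if cfg[0] * cfg[1] <= power_budget_w}
    return min(feasible, key=feasible.get)

# Illustrative profile of one application:
profile = {
    (8, 115): 100.0,   # few nodes, each at TDP
    (10, 90):  88.0,   # overprovisioned: more nodes, capped power
    (12, 75):  84.0,   # still better
    (14, 65):  86.0,   # too capped: performance drops again
}
print(best_config(profile, power_budget_w=920))  # -> (12, 75)
```

In practice exhaustively profiling every configuration is expensive, which is exactly why the performance-modeling scheme mentioned above estimates these characteristics instead of measuring them all.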
Several of our new online software tools and methods, such as the Power Aware Resource Manager [14-15] and the Variation Aware Scheduler [15-01], use linear/integer programming to obtain superior solutions compared with those produced by suboptimal heuristics.
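To make the optimization concrete, the toy below states the kind of 0/1 selection problem such a scheduler solves: choose which queued jobs to run so total value (a throughput proxy) is maximized under the machine power budget. A real resource manager would hand this to an ILP solver; brute-force enumeration is used here only because it is self-contained, and all job data is invented.

```python
# Toy 0/1 job-selection problem of the kind a power-aware resource
# manager formulates as an integer program. Brute force for clarity.
from itertools import combinations

def schedule(jobs, power_budget):
    """jobs: {name: (power_w, value)}. Find the subset maximizing
    total value with total power <= power_budget."""
    best, best_value = set(), 0.0
    names = list(jobs)
    for r in range(len(names) + 1):
        for subset in combinations(names, r):
            power = sum(jobs[j][0] for j in subset)
            value = sum(jobs[j][1] for j in subset)
            if power <= power_budget and value > best_value:
                best, best_value = set(subset), value
    return best

jobs = {"A": (300, 5.0), "B": (250, 4.0), "C": (400, 6.0), "D": (150, 3.0)}
print(schedule(jobs, power_budget=700))  # -> {'A', 'B', 'D'}
```

The contrast with a heuristic is easy to see here: a greedy pass by value alone would grab job C first (power 400) and then be unable to fit both A and B, ending below the optimum the exact formulation finds.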