Runtime Systems and Tools: Energy Aware Computing
Keeping up with the energy requirements of present-day supercomputers is fast becoming a major challenge. The major consumers of energy in an HPC datacenter are the machines, the interconnect, and cooling. To date, most research has focused on reducing machine energy consumption and has neglected the energy spent on cooling, which accounts for about 40% of a datacenter's total energy consumption. Our focus is to extend energy optimization beyond machine energy so that we also reduce cooling energy and the energy used by the interconnect. Most datacenters cool excessively in order to avoid hotspots (areas of the machine room that are at a much higher temperature than the rest of the room). We are working on a runtime system that uses Dynamic Voltage and Frequency Scaling (DVFS) to minimize the occurrence of hotspots by keeping core temperatures in check. Lowering a core's frequency slows the work placed on it, so one of our schemes reduces the timing penalty of using DVFS alone by migrating chares to load balance the application. Our results show that this temperature-aware load balancing can save considerable cooling energy. Part of our recent research explores placing chares that are less sensitive to frequency on hotter cores, so that we can further reduce DVFS-induced slowdown.
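The combination of DVFS and chare migration described above can be illustrated with a minimal sketch. This is not the runtime system's actual implementation: the frequency levels, the function names `adjust_frequency` and `rebalance`, the one-step-per-interval control rule, and the greedy placement heuristic are all assumptions chosen for illustration. The sketch first steps a core's frequency down when it exceeds a temperature threshold, then reassigns chares so that slower (hotter) cores receive proportionally less work.

```python
# Hypothetical DVFS frequency steps in GHz (assumed, not from any real machine).
FREQ_LEVELS_GHZ = [1.0, 1.5, 2.0, 2.5]


def adjust_frequency(level, temp_c, max_temp_c):
    """Return the new DVFS level index for one core.

    Assumed control rule: step one level down if the core is above the
    temperature threshold, otherwise step one level up (clamped to the
    available levels).
    """
    if temp_c > max_temp_c:
        return max(level - 1, 0)
    return min(level + 1, len(FREQ_LEVELS_GHZ) - 1)


def rebalance(chare_work, core_freqs_ghz):
    """Greedily place chares on cores, accounting for per-core frequency.

    chare_work: dict mapping chare id -> work (in units of GHz * seconds)
    core_freqs_ghz: list with the current frequency of each core
    Returns (placement, busy) where placement maps chare id -> core index
    and busy[c] is the estimated execution time assigned to core c.
    """
    busy = [0.0] * len(core_freqs_ghz)
    placement = {}
    # Heaviest chares first; each goes to the core where it would finish earliest.
    for chare, work in sorted(chare_work.items(), key=lambda kv: kv[1], reverse=True):
        best = min(
            range(len(busy)),
            key=lambda c: busy[c] + work / core_freqs_ghz[c],
        )
        busy[best] += work / core_freqs_ghz[best]
        placement[chare] = best
    return placement, busy


if __name__ == "__main__":
    # Core 1 is hot, so it is stepped down and ends up with less work.
    levels = [adjust_frequency(3, 45.0, 50.0), adjust_frequency(3, 58.0, 50.0)]
    freqs = [FREQ_LEVELS_GHZ[lv] for lv in levels]
    placement, busy = rebalance({"a": 4.0, "b": 2.0, "c": 2.0, "d": 2.0}, freqs)
    print(levels, placement, busy)
```

In this sketch the load balancer only sees the effect of DVFS through the per-core frequency, so a throttled core is treated as a slower processor. A frequency-sensitivity-aware variant, as mentioned above, would additionally need a per-chare estimate of how much each chare slows down at lower frequency, which is not modeled here.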
Papers/Talks
14-19 | 2014 | [Paper] | Position Paper: Power-aware and Temperature Restrain Modeling for Maximizing Performance and Reliability [MODSIM 2014]
14-15 | 2014 | [Paper] | Maximizing Throughput of Overprovisioned HPC Data Centers Under a Strict Power Budget [SC 2014]
14-02 | 2014 | [Paper] | Energy Profile of Rollback-Recovery Strategies in High Performance Computing [ParCo 2014]
13-50 | 2013 | [Talk] | A ‘Cool’ Way of Improving the Reliability of HPC Machines [SC 2013]
13-25 | 2013 | [Paper] | A ‘Cool’ Way of Improving the Reliability of HPC Machines [SC 2013]
13-20 | 2013 | [Paper] | Optimizing Power Allocation to CPU and Memory Subsystems in Overprovisioned HPC Systems [Cluster 2013]
    Osman Sarood | Akhil Langer | Laxmikant Kale | Barry Rountree | Bronis Supinski
13-10 | 2013 | [Talk] | Toward Runtime Power Management of Exascale Networks by On/Off Control of Links [HPPAC 2013]
13-09 | 2013 | [Paper] | Toward Runtime Power Management of Exascale Networks by On/Off Control of Links [HPPAC 2013]
12-43 | 2012 | [Talk] | Assessing Energy Efficiency of Fault Tolerance Protocols for HPC Systems [SBAC-PAD 2012]
12-37 | 2012 | [Paper] | Assessing Energy Efficiency of Fault Tolerance Protocols for HPC Systems [SBAC-PAD 2012]
12-28 | 2012 | [Paper] | Efficient ‘Cool Down’ of Parallel Applications [PASA 2012]
12-27 | 2012 | [Paper] | Cloud Friendly Load Balancing for HPC Applications: Preliminary Work [CloudTech-HPC 2012]
12-20 | 2012 | [Paper] | ‘Cool’ Load Balancing for High Performance Computing Data Centers [IEEE TC 2012]
12-10 | 2012 | [Talk] | Comparing the Power and Performance of Intel’s SCC to State-of-the-Art CPUs and GPUs [ISPASS 2012]
11-18 | 2011 | [Paper] | A ‘Cool’ Load Balancer for Parallel Applications [SC 2011]
11-10 | 2011 | [Paper] | Temperature Aware Load Balancing for Parallel Applications: Preliminary Work [HPPAC 2011]