Techniques for Efficient High Performance Computing in the Cloud
Thesis 2014
Publication Type: PhD Thesis
Repository URL:
Abstract
The advantages of the pay-as-you-go model, elasticity, and the flexibility and customization offered by virtualization make cloud computing an attractive option for meeting the needs of some High Performance Computing (HPC) users, especially those with emerging or sporadic demands. The Computing- or Infrastructure-as-a-Service model in the cloud has tremendous potential to extend the reach of HPC to a wider scientific and industrial community. We hypothesize that current clouds are suitable for some, but not all, HPC applications, and that for those applications clouds can be more cost-effective than typical dedicated HPC platforms when applications are intelligently scheduled onto platforms in the cloud. Through comprehensive performance evaluation and analysis, we find that there are gaps between the characteristic traits of many HPC applications and existing cloud environments. Poor interconnect and I/O performance, network virtualization overhead, HPC-agnostic cloud schedulers, and the inherent heterogeneity and multi-tenancy of the cloud are some of the bottlenecks for efficient HPC in the cloud. Our philosophy for bridging the divide between HPC and clouds is to a) use a complementary approach of making clouds HPC-aware and HPC cloud-aware, b) consider, besides the challenges posed by clouds, the unique opportunities they offer for HPC, such as virtual machine (VM) consolidation and elasticity, and c) consider the views of both HPC users and cloud providers, who sometimes have conflicting objectives: users must see tangible benefits (in cost or performance), while cloud providers must be able to run a profitable business. With this philosophy, the techniques presented in this thesis, viz. HPC-aware cloud scheduling and VM placement, cloud-aware load balancing for HPC applications, and a parallel runtime that enables parallel jobs to shrink or expand dynamically, significantly improve HPC performance and cloud resource utilization for HPC in the cloud.
We believe that our research will help users gain confidence in the capabilities of cloud for HPC, and enable cloud providers to run a more profitable business.